bigwhoop / sentence-breaker

Sentence boundary disambiguation (SBD) - or sentence breaking - library written in PHP.
MIT License

Failed to find end of quote. Reached end of input. Read: 'er in. #5

Open 4n70w4 opened 4 years ago

4n70w4 commented 4 years ago
Bigwhoop\SentenceBreaker\Lexing\States\StateException  : Failed to find end of quote. Reached end of input. Read: 'er in.

The glass will make a difference too. For some high-browed beer-snobbery about cleaning your beer glasses: http://google.com/

Text:

For more head, well... dump 'er in.\n
\n
The glass will make a difference too. For some high-browed beer-snobbery about cleaning your beer glasses: http://google.com/
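
For context, a rough sketch of why this input fails, assuming the lexer treats every `'` as an opening quote and scans forward for a matching one. `findClosingQuote` is a hypothetical helper for illustration, not part of sentence-breaker:

```php
<?php
// Hypothetical helper, not part of sentence-breaker: returns the offset of the
// quote character that closes the one at $start, or null if input ends first.
function findClosingQuote(string $text, int $start): ?int
{
    $quote = $text[$start];
    $pos = strpos($text, $quote, $start + 1);

    return $pos === false ? null : $pos;
}

$text = "For more head, well... dump 'er in.\n\nThe glass will make a difference too.";
$open = strpos($text, "'");

// The apostrophe in 'er is never closed, so the scan reaches end of input.
var_dump(findClosingQuote($text, $open)); // NULL
```

Here the apostrophe in `'er` marks an elision rather than a quotation, so there is nothing for the scan to match.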

4n70w4 commented 4 years ago

Workaround:

Index: debt/vendor/bigwhoop/sentence-breaker/src/Lexing/States/QuotedStringState.php
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- debt/vendor/bigwhoop/sentence-breaker/src/Lexing/States/QuotedStringState.php   (date 1568577024276)
+++ debt/vendor/bigwhoop/sentence-breaker/src/Lexing/States/QuotedStringState.php   (date 1568577024276)
@@ -12,6 +12,7 @@

 use Bigwhoop\SentenceBreaker\Lexing\Lexer;
 use Bigwhoop\SentenceBreaker\Lexing\Tokens\QuotedStringToken;
+use Bigwhoop\SentenceBreaker\Lexing\Tokens\WordToken;

 class QuotedStringState extends State
 {
@@ -23,11 +24,16 @@
     protected function call(Lexer $lexer)
     {
         $start = $lexer->next();
+        $pos = $lexer->pos();

         while (true) {
             $next = $lexer->next();

             if ($next === null) {
+                $lexer->peek($lexer->pos() - $pos);
+                $lexer->emit(new WordToken());
+
+                return new WordState();
                 throw new StateException('Failed to find end of quote. Reached end of input. Read: '.$lexer->getTokenValue());
             }
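
The idea behind the patch, as far as it can be read from the diff, is to fall back to a plain word when no closing quote turns up. A standalone sketch of that fallback follows; `readQuotedOrWord` and its array return shape are hypothetical and do not mirror the library's Lexer or token classes:

```php
<?php
// Hypothetical, simplified tokenizer step; names are illustrative only
// and are not the sentence-breaker API.
function readQuotedOrWord(string $text, int $start): array
{
    $quote = $text[$start];
    $end = strpos($text, $quote, $start + 1);

    if ($end !== false) {
        // Closing quote found: emit the whole quoted span, quotes included.
        return ['type' => 'quoted', 'value' => substr($text, $start, $end - $start + 1)];
    }

    // No closing quote: rewind to the opening character and read up to the
    // next whitespace, treating the quote as part of an ordinary word.
    $length = strcspn($text, " \t\r\n", $start);

    return ['type' => 'word', 'value' => substr($text, $start, $length)];
}

print_r(readQuotedOrWord("dump 'er in.", 5));  // word token with value 'er
print_r(readQuotedOrWord("say 'hi' now", 4));  // quoted token with value 'hi'
```

Whether a word is the right fallback for every unterminated quote is an open question; this sketch only covers the apostrophe case reported above.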

jamesgraham commented 4 years ago

Is it worth creating a PR, @4n70w4?

4n70w4 commented 4 years ago

@jamesgraham I have only tested it on my own cases. I may not have covered every case, so something might break.