dlwh / epic

**Archived** Epic is a high performance statistical parser written in Scala, along with a framework for building complex structured prediction models.
http://scalanlp.org/
Apache License 2.0
469 stars, 82 forks

Remove whitespace-only sentences in NewLineSentenceSegmenter #59

Open hiroshinoji opened 7 years ago

hiroshinoji commented 7 years ago

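NewLineSentenceSegmenter did not trim the sentences it produced, so a segment containing only whitespace reached the parser as an empty sentence (`Vector()`). For example, the following command always printed an error: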

```
$ echo I live in Osaka . | java -Xmx4g -cp assembly.jar epic.parser.ParseText --model parsers/SpanModel-300.parser --sentences newline --tokens whitespace
(TOP (S (NP (PRP He) ) (VP (VBZ lives)  (PP (IN in)  (NP (NNP Osaka) )))))
### Could not tag Vector(), because No parse for Vector(): infinite partition... epic.parser.projections.ChartProjector$class.project(ChartProjector.scala:36);epic.parser.projections.AnchoredRuleMarginalProjector.project(EnumeratedAnchoring.scala:78)
```
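A minimal sketch of how a whitespace-only segment can become the empty sentence `Vector()` from the log. The helpers `naiveSegment` and `whitespaceTokenize` are hypothetical stand-ins, not Epic's real segmenter or tokenizer. The input, which contains a line of only spaces, is an assumed example and not the exact input from the command above.

```scala
// Hypothetical helpers for illustration only; these are NOT Epic's APIs.
object EmptySentenceDemo {
  // Naive newline segmentation: split on '\n' without trimming the pieces.
  // (Java's String.split drops trailing empty strings, but a spaces-only piece survives.)
  def naiveSegment(text: String): Seq[String] = text.split("\n").toSeq

  // Whitespace tokenization: split on runs of whitespace and drop empty strings.
  def whitespaceTokenize(sentence: String): Vector[String] =
    sentence.split("\\s+").filter(_.nonEmpty).toVector

  def main(args: Array[String]): Unit = {
    // Assumed input: one real sentence followed by a line containing only spaces.
    val input = "I live in Osaka .\n   \n"
    // The spaces-only segment tokenizes to an empty Vector.
    naiveSegment(input).map(whitespaceTokenize).foreach(println)
  }
}
```

Running `main` prints `Vector(I, live, in, Osaka, .)` followed by `Vector()`; that second, empty sentence is the kind of input the parser rejects with "No parse for Vector()".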

I added a filter for empty sentences, following MLSentenceSegmenter, which avoids this problem by trimming every sentence. With this change, the error no longer appears.
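A minimal sketch of the trim-and-filter idea, assuming newline-delimited input. `TrimmedNewlineSegmenter.segment` is a hypothetical name; this is not the code in this pull request or Epic's actual segmenter API.

```scala
// Hypothetical sketch, not Epic's actual NewLineSentenceSegmenter implementation.
object TrimmedNewlineSegmenter {
  // Split on newlines, trim each segment, and drop segments that are empty after trimming,
  // so whitespace-only lines never reach the tokenizer or parser.
  def segment(text: String): Seq[String] =
    text.split("\n").toSeq.map(_.trim).filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    // Assumed input: padded sentence, then a spaces-only line.
    segment("  I live in Osaka .  \n   \n").foreach(println)
  }
}
```

This prints only `I live in Osaka .`. Because `String.trim` strips every character up to U+0020, the same filter also removes a trailing `\r` left behind by CRLF line endings.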