Open GoogleCodeExporter opened 8 years ago
The problem is that some documents (such as web documents) have no
sentence-ending punctuation at the end, so the tokens after the last sentence
boundary are not included in any sentence. A simple fix is to add the
following immediately after the for loop in the sentence segmenter:
if (prevIdx < document.size) // only when tokens remain after the last sentence boundary
  new Sentence(document, prevIdx, document.size - prevIdx)
pallika.
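To illustrate the fix, here is a small self-contained sketch. The Token, Sentence, and SentenceSegmenter types are hypothetical stand-ins, not the real segmenter's classes, and the set of sentence-ending punctuation is an assumption. Only the final flush after the loop corresponds to the proposed line.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical minimal stand-ins; not the actual library classes.
final case class Token(string: String)
final case class Sentence(tokens: IndexedSeq[Token], start: Int, length: Int) {
  // The token strings covered by this sentence.
  def words: Seq[String] = tokens.slice(start, start + length).map(_.string)
}

object SentenceSegmenter {
  // Assumed set of sentence-ending punctuation tokens.
  private val enders = Set(".", "!", "?")

  def segment(document: IndexedSeq[Token]): Seq[Sentence] = {
    val sentences = ArrayBuffer.empty[Sentence]
    var prevIdx = 0 // start index of the sentence currently being accumulated
    for (i <- document.indices) {
      if (enders.contains(document(i).string)) {
        sentences += Sentence(document, prevIdx, i + 1 - prevIdx)
        prevIdx = i + 1
      }
    }
    // The fix: keep trailing tokens that have no sentence-ending punctuation.
    // The guard avoids a zero-length sentence when the document ends with ".".
    if (prevIdx < document.size)
      sentences += Sentence(document, prevIdx, document.size - prevIdx)
    sentences.toSeq
  }
}
```

For a token sequence like "Hello world . Click here for more", the loop alone yields only "Hello world ."; the flush after the loop adds "Click here for more" as a second sentence.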
Original comment by pall...@gmail.com
on 18 Apr 2012 at 2:55
Original issue reported on code.google.com by
pall...@gmail.com
on 17 Apr 2012 at 8:29