goete111 / factorie

Automatically exported from code.google.com/p/factorie

cc.factorie.app.nlp.segment.SentenceSegmenter omits the last few sentences in a document #24

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Run cc.factorie.app.nlp.segment.SentenceSegmenter.process on a document.

What is the expected output? What do you see instead?

On some documents, the last few sentences are omitted from document.sentences. 

What version of the product are you using? On what operating system?
I checked out factorie on 04/05/12

Please provide any additional information below.

I'm attaching a sample file that shows this behavior. 

Original issue reported on code.google.com by pall...@gmail.com on 17 Apr 2012 at 8:29

GoogleCodeExporter commented 8 years ago
The problem is that some documents (such as web documents) have no 
sentence-ending punctuation towards the end of the document, so the tokens 
after the last detected sentence boundary are never included in any sentence. 
A simple fix is to add the following at the end of the for loop in the 
sentence segmenter:

new Sentence(document, prevIdx, document.size - prevIdx) 
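To illustrate the idea outside of factorie, here is a minimal standalone sketch. It is not the actual SentenceSegmenter source: `Span` is a hypothetical stand-in for `Sentence(document, start, length)`, the `terminators` set is an assumed simplification of the real boundary detection, and the `prevIdx < tokens.size` guard is an extra assumption added so that no empty sentence is created when the document does end with punctuation.

```scala
object TrailingSentenceSketch {
  // Hypothetical stand-in for factorie's Sentence(document, start, length).
  final case class Span(start: Int, length: Int)

  // Assumed sentence-ending tokens, for illustration only.
  val terminators: Set[String] = Set(".", "!", "?")

  def segment(tokens: IndexedSeq[String]): Vector[Span] = {
    val spans = Vector.newBuilder[Span]
    var prevIdx = 0 // index of the first token not yet assigned to a sentence
    for (i <- tokens.indices) {
      if (terminators.contains(tokens(i))) {
        // Close the current sentence, including the terminator itself.
        spans += Span(prevIdx, i + 1 - prevIdx)
        prevIdx = i + 1
      }
    }
    // The fix: flush any tokens left over after the last terminator.
    // Without this step the trailing, unpunctuated tokens are dropped.
    if (prevIdx < tokens.size) {
      spans += Span(prevIdx, tokens.size - prevIdx)
    }
    spans.result()
  }
}
```

For the seven tokens `Hello world . Click here for more`, `segment` returns two spans: one covering the first three tokens and one covering the four trailing tokens that have no closing punctuation.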

pallika.

Original comment by pall...@gmail.com on 18 Apr 2012 at 2:55