Closed keien closed 10 years ago
What exactly do you mean?
It technically shouldn't happen because we split sentences beforehand, so it's probably NLTK and CoreNLP splitting sentences differently.
What input text does it happen with?
First one is: "I am looking specifically for american guys, love the accent, etc..."
Second one is: "WE COULD DO DADDY/DAUGHTER, STEP DAUGHTER, NEICE, ETC.."
Yes, NLTK treats each of those as a single sentence, whereas CoreNLP splits each into two. We could simply concatenate the CoreNLP sentences whenever CoreNLP returns several where NLTK returns one, unless we would rather stick to the CoreNLP sentence division.
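For reference, here is a rough sketch of the concatenation idea, assuming we already have both splits as plain lists of strings. It is not the code from any commit in this thread: `regroup_fragments` and `_squash` are hypothetical names, and the CoreNLP split shown for the first example is only a guess at where it breaks the sentence. The sketch compares text with whitespace removed, so differences in spacing between the two tools don't matter.

```python
def _squash(text):
    """Drop all whitespace so differently spaced text can be compared."""
    return "".join(text.split())


def regroup_fragments(reference_sentences, fragments):
    """Group `fragments` so that each group covers one reference sentence.

    `reference_sentences` would be the coarser split (NLTK here) and
    `fragments` the finer one (CoreNLP here). Raises ValueError when the
    two splits cannot be lined up.
    """
    groups = []
    position = 0
    for sentence in reference_sentences:
        target = _squash(sentence)
        group, covered = [], ""
        # Consume fragments until they cover the whole reference sentence.
        while len(covered) < len(target):
            if position == len(fragments):
                raise ValueError("ran out of fragments for: %r" % sentence)
            group.append(fragments[position])
            covered += _squash(fragments[position])
            position += 1
        if covered != target:
            raise ValueError("fragments do not align with: %r" % sentence)
        groups.append(group)
    if position != len(fragments):
        raise ValueError("unused fragments remain")
    return groups


# One sentence according to NLTK (text taken from this thread).
reference = [
    "I am looking specifically for american guys, love the accent, etc..."
]
# Assumed CoreNLP split, for illustration only.
fragments = [
    "I am looking specifically for american guys, love the accent, etc.",
    "..",
]
groups = regroup_fragments(reference, fragments)
print(groups)
```

Each inner list holds the CoreNLP sentences that belong to one NLTK sentence, so their parses can be merged or handled together. Raising on misalignment rather than guessing keeps any unexpected split visible.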
Fixed in 1db753c691d856e6b01ed47c471b06208249db0b, but we should add it to the error message screen in the UI.
That's what line 71 is doing.
Okay, then it'll be fine
The common factor seems to be an "etc." followed by one or more extra periods. Here are the anomalies: