LucidAi / nlcd

News Life Cycle Detector
MIT License

Bad sentence segmentation if text contains quotes. #6

zaycev closed this issue 10 years ago

zaycev commented 10 years ago

The issue occurs when the text contains quoted sentences, for example:

"\"I feel like this is a wonderful step toward a healthier world and I'm so glad Vermont is the first to take it.\" Maine and Connecticut have previously passed laws requiring labels on GMO foods, but their laws don't take effect unless neighboring states follow suit."

After extraction of the quoted segments we have:

["I feel like this is a wonderful step toward a healthier world and I'm so glad Vermont is the first to take it."]

After applying sentence segmentation to the original text, we extract the first sentence again, and the closing quote is attached to the start of the second sentence:


["\"I feel like this is a wonderful step toward a healthier world and I'm so glad Vermont is the first to take it.", "\" Maine and Connecticut have previously passed laws requiring labels on GMO foods, but their laws don't take effect unless neighboring states follow suit."]

Final sentences list:

["I feel like this is a wonderful step toward a healthier world and I'm so glad Vermont is the first to take it.", "\"I feel like this is a wonderful step toward a healthier world and I'm so glad Vermont is the first to take it.", "\" Maine and Connecticut have previously passed laws requiring labels on GMO foods, but their laws don't take effect unless neighboring states follow suit."]
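
The duplication can be reproduced with a short Python sketch. Everything in it is an assumption made for illustration: the helper names (`extract_quotes`, `naive_split`, `quote_aware_split`, `merge`) are hypothetical, and the regex splitter only stands in for whatever segmenter nlcd actually uses. The last two helpers show one possible workaround (keep a closing quote with its sentence, then deduplicate on quote-stripped text); this is not necessarily how the project resolved the problem.

```python
import re

TEXT = (
    "\"I feel like this is a wonderful step toward a healthier world "
    "and I'm so glad Vermont is the first to take it.\" "
    "Maine and Connecticut have previously passed laws requiring labels "
    "on GMO foods, but their laws don't take effect unless neighboring "
    "states follow suit."
)


def extract_quotes(text):
    """Return the contents of every double-quoted span (hypothetical helper)."""
    return re.findall(r'"([^"]+)"', text)


def naive_split(text):
    """Stand-in segmenter: cut right after '.', '!' or '?'.

    The closing quote is left at the start of the next sentence,
    which mirrors the output shown in this issue.
    """
    return re.findall(r'[^.!?]+[.!?]', text)


def quote_aware_split(text):
    """Same splitter, but a quote directly after the terminator stays put."""
    return [part.strip() for part in re.findall(r'[^.!?]+[.!?]"?', text)]


def merge(quotes, sentences):
    """Concatenate both lists, dropping entries equal after quote stripping."""
    seen = set()
    result = []
    for item in quotes + sentences:
        key = item.strip().strip('"').strip()
        if key and key not in seen:
            seen.add(key)
            result.append(key)
    return result


quotes = extract_quotes(TEXT)
buggy = quotes + naive_split(TEXT)                 # 3 items, quote duplicated
fixed = merge(quotes, quote_aware_split(TEXT))     # 2 items, no duplicate
print(len(buggy), len(fixed))
```

Deduplicating on the quote-stripped text keeps the comparison independent of where the segmenter happens to leave the quotation marks. Nested or unbalanced quotes are not handled by this sketch.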

zaycev commented 10 years ago

Duplicate of issue #5. Closed.