divyeshlad18 closed this issue 4 years ago.
Thanks for the kind words, Divyesh!
The problem with allowing splits on single line breaks is that it is a bit dangerous: in many cases, a single break can fall in the middle of a sentence, whereas if there is an empty line in between, it almost certainly marks a paragraph break.
May I suggest you simply add a post-processor that splits the output once more on all single newlines, too? I guess that would be a simple lambda, if that really is all you need.
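A minimal sketch of such a post-processor, assuming you have already turned the segmenter output into a flat list of sentence strings (for example, by joining each sentence's tokens). The function name `split_on_newlines` is hypothetical and not part of syntok. The input is the sentence list from the example in the quoted message.

```python
from typing import Iterable, List


def split_on_newlines(sentences: Iterable[str]) -> List[str]:
    """Hypothetical post-processor: split every already-segmented sentence
    again on each single newline, dropping empty or whitespace-only pieces."""
    result: List[str] = []
    for sentence in sentences:
        # str.splitlines() breaks on "\n" (and other line boundaries like "\r\n").
        for line in sentence.splitlines():
            line = line.strip()
            if line:
                result.append(line)
    return result


# Assumption: the segmenter output has already been joined into sentence strings.
segmented = [
    "1",
    "Dissonance",
    "Tuesday, February 2\nBoone Drake awoke before sunup with little "
    "recollection of the previous two days.",
]

for sentence in split_on_newlines(segmented):
    print(sentence)
```

Note that this splits unconditionally, so it will also break sentences that merely wrap across lines, which is exactly the risk described above. Only use it if your input reliably puts one sentence or heading per line.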
On Tue, Aug 18, 2020, at 13:58, Divyesh Lad wrote:
Hello Florian,
Thank you for developing such a powerful NLP library. I have to say, I have tried all the NLP libraries for sentence tokenization, and none of them even comes close to yours.
I was just wondering whether there is a feature or parameter in segmenter.analyze that splits sentences at every single "\n".
Example: 1\n\nDissonance\n\nTuesday, February 2\nBoone Drake awoke before sunup with little recollection of the previous two days.
Right now the output is:
1
Dissonance
Tuesday, February 2\nBoone Drake awoke before sunup with little recollection of the previous two days.
Is there a way to also split on a single "\n", so that the output looks like this:
1
Dissonance
Tuesday, February 2
Boone Drake awoke before sunup with little recollection of the previous two days.
View it on GitHub: https://github.com/fnl/syntok/issues/14
Thank you for the quick response, Florian.
It does make sense not to break a sentence in the middle just because of a single newline.
Also, thanks to your suggestion, I'm going to add a post-processing script that splits sentences wherever a "\n" occurs.