anoopkunchukuttan / indic_nlp_library

Resources and tools for Indian language Natural Language Processing
http://anoopkunchukuttan.github.io/indic_nlp_library/
MIT License

Wrong sentence tokenization of sentences with quotes #37

Open GokulNC opened 3 years ago

GokulNC commented 3 years ago

Example:

>>> from indicnlp.tokenize import sentence_tokenize
>>> sentence_tokenize.sentence_split('He said "Will you bring me some water?". She said "Sure!", and went away.', lang='en')
['He said "Will you bring me some water? ". She said "Sure!',
'", and went away.']

The expected output is:

['He said "Will you bring me some water?".',
'She said "Sure!", and went away.']
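
For illustration, here is a minimal sketch of one way to get the expected split: treat `.`, `?` and `!` as sentence boundaries only when they occur outside a pair of double quotes and are followed by whitespace or the end of the text. This is not the library's implementation. The function name `split_sentences_quote_aware` and its parameters are hypothetical, and the sketch assumes straight ASCII double quotes that are always balanced. It will not split a sentence that ends inside a quotation, and it knows nothing about abbreviations or Indic punctuation such as the danda.

```python
def split_sentences_quote_aware(text, terminators=".?!", quote='"'):
    """Split text into sentences, ignoring terminators inside double quotes.

    Hypothetical helper for illustration only; not part of indic_nlp_library.
    """
    sentences = []
    start = 0          # index where the current sentence begins
    in_quote = False   # True while scanning between an opening and closing quote

    for i, ch in enumerate(text):
        if ch == quote:
            # Toggle quote state; assumes quotes are balanced and not nested.
            in_quote = not in_quote
        elif ch in terminators and not in_quote:
            at_end = i + 1 == len(text)
            # Only split when the terminator is followed by whitespace or ends the text.
            if at_end or text[i + 1].isspace():
                sentences.append(text[start:i + 1].strip())
                start = i + 1

    # Keep any trailing text that has no terminator.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


example = 'He said "Will you bring me some water?". She said "Sure!", and went away.'
result = split_sentences_quote_aware(example)
print(result)
```

On the example from this issue, the `?` and `!` are skipped because they sit inside quotes, so the only boundaries are the period after the first closing quote and the final period.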