anoopkunchukuttan / indic_nlp_library

Resources and tools for Indian language Natural Language Processing
http://anoopkunchukuttan.github.io/indic_nlp_library/
MIT License

Sentence tokenizer fails to split when a sentence ends with a number #71

Open varunkatiyar819 opened 2 months ago

varunkatiyar819 commented 2 months ago

The sentence tokenizer does not split the text when a sentence ends with a number. For example, take this input: "India was declared a nation with its own constitution on 26 January 1950, while India gained independence on 14 August 1947. About 3 years went through the formation of the nation and the complete departure of the British." The tokenizer does not split after "14 August 1947.", so the whole input comes back as one sentence instead of two.
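To make the symptom concrete, here is a minimal, self-contained sketch using only the Python standard library. It is not the library's code: `naive_split` and `refined_split` are hypothetical helpers, and the rule "never split on a period that follows a digit" is only an assumed cause that happens to reproduce the behavior described above.

```python
import re

TEXT = (
    "India was declared a nation with its own constitution on 26 January 1950, "
    "while India gained independence on 14 August 1947. "
    "About 3 years went through the formation of the nation and "
    "the complete departure of the British."
)


def naive_split(text):
    """Hypothetical splitter (assumption, not the library's logic).

    Splits on whitespace that follows a period, but only when the
    character before the period is not a digit.
    """
    parts = re.split(r"(?<=[^\d]\.)\s+", text)
    return [p for p in parts if p]


def refined_split(text):
    """Hypothetical refinement (assumption).

    Splits on whitespace that follows any period, unless the next
    non-space character is a digit. Decimals such as "3.14" are left
    alone because no whitespace follows their period.
    """
    parts = re.split(r"(?<=\.)\s+(?=\S)(?!\d)", text)
    return [p for p in parts if p]


if __name__ == "__main__":
    print(len(naive_split(TEXT)))    # number of sentences with the digit rule
    for sentence in refined_split(TEXT):
        print(sentence)
```

With the digit rule, the example stays in one piece. The refined rule separates it after "14 August 1947.". The refinement is only a heuristic: it still splits after an enumeration marker such as "1." followed by a word, so it should be treated as a starting point for discussion and not as a proposed patch.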