anoopkunchukuttan / indic_nlp_library

Resources and tools for Indian language Natural Language Processing
http://anoopkunchukuttan.github.io/indic_nlp_library/
MIT License
546 stars 158 forks

Modified sentence_tokenize to handle tokenization of sentences that end with numerics. #72

Open varunkatiyar819 opened 2 months ago

varunkatiyar819 commented 2 months ago

Previously, the tokenizer failed to split at a sentence boundary when the sentence ended with a numeric character. I believe this was a logic issue in the check that avoids splitting on decimal numbers. I have changed the logic slightly so that the split is skipped only in that condition. For example, the following text is now handled correctly: "India was declared a nation with its own constitution on 26 January 1950, while India gained independence on 14 August 1947. About 3 years went through the formation of the nation and the complete departure of the British." I also tested a few edge cases.
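To illustrate the behaviour described above, here is a minimal, standalone sketch. It is not the library's code or the actual diff in this PR; `split_sentences` is a hypothetical helper, and the set of terminators (`.`, `?`, `!`, and the Devanagari danda) is an assumption. The idea is that a period is treated as a decimal point only when it has a digit on both sides, so a sentence ending in a number such as "1947." is still split.

```python
import re
from typing import List

# Assumed sentence terminators: period, question mark, exclamation mark,
# and the Devanagari danda (U+0964).
TERMINATORS = re.compile(r"[.?!\u0964]")


def split_sentences(text: str) -> List[str]:
    """Hypothetical splitter; not indic_nlp_library's implementation."""
    sentences = []
    start = 0
    for match in TERMINATORS.finditer(text):
        i = match.start()
        if text[i] == ".":
            prev_is_digit = i > 0 and text[i - 1].isdigit()
            next_is_digit = i + 1 < len(text) and text[i + 1].isdigit()
            # Digit on both sides: treat the period as a decimal point.
            if prev_is_digit and next_is_digit:
                continue
        sentence = text[start:i + 1].strip()
        if sentence:
            sentences.append(sentence)
        start = i + 1
    # Keep any trailing text that has no terminator.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = (
        "India gained independence on 14 August 1947. "
        "About 3 years went through the formation of the nation."
    )
    for s in split_sentences(sample):
        print(s)
```

With this rule, "1947." followed by a space ends a sentence, while a value such as "3.14" stays inside its sentence.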