anoopkunchukuttan / indic_nlp_library

Resources and tools for Indian language Natural Language Processing
http://anoopkunchukuttan.github.io/indic_nlp_library/
MIT License
546 stars 158 forks

Better tokenization of numbers needed #40: Resolved #69

Open prassr opened 5 months ago

prassr commented 5 months ago

Tokenizes Devanagari and Gujarati numbers that appear as comma-separated values in amounts, and handles dates written with `-`. It also handles numbers that appear at the beginning of the string. The earlier code split numbers and dates when they appeared at the beginning of a sentence on a new line.
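For illustration, the intended behaviour could look roughly like the sketch below. This is not the code from this PR or the library's tokenizer; `tokenize_numbers_sketch`, `DIGIT`, `NUMBER`, and `PUNCT` are hypothetical names, and the punctuation set is an assumption kept deliberately small. It only shows one way to keep comma-separated amounts and `-`-separated dates as single tokens, including at the start of a line.

```python
import re

# Assumed digit ranges: ASCII, Devanagari (U+0966-U+096F), Gujarati (U+0AE6-U+0AEF).
DIGIT = r"[0-9\u0966-\u096F\u0AE6-\u0AEF]"

# A number-like token: digit runs joined by ',', '.', '-' or '/'.
# A separator is consumed only when more digits follow it.
NUMBER = rf"{DIGIT}+(?:[,.\-/]{DIGIT}+)*"

# Hypothetical, minimal punctuation set (includes the danda, U+0964).
PUNCT = r",.\-/!?;:()\u0964"

# Try numbers first, then runs of non-space, non-punctuation characters,
# then single punctuation marks.
TOKEN_RE = re.compile(rf"{NUMBER}|[^\s{PUNCT}]+|[{PUNCT}]")


def tokenize_numbers_sketch(text):
    """Split text into tokens, keeping amounts and dates in one piece."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    # Amount at the start of a line that follows a newline.
    print(tokenize_numbers_sketch("पहला\n१,००० रुपये"))
    # Date written with '-'.
    print(tokenize_numbers_sketch("12-05-2023 को"))
```

Because the number pattern is tried before the general word pattern and does not depend on a preceding character, a number at position 0 or right after a newline is matched the same way as one in the middle of a sentence.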
