anoopkunchukuttan / indic_nlp_library

Resources and tools for Indian language Natural Language Processing
http://anoopkunchukuttan.github.io/indic_nlp_library/
MIT License
546 stars, 158 forks

Better tokenization of numbers needed #40

Open: anoopkunchukuttan opened this issue 3 years ago

anoopkunchukuttan commented 3 years ago

The number ४,३२,००० (4,32,000 in Devanagari digits with Indian digit grouping) gets tokenized as ४ , ३२ , ०००. It should stay a single token.
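A minimal sketch of the desired behavior, assuming a regex-based tokenizer. The `tokenize` function and the punctuation set are hypothetical and not part of indic_nlp_library's API. The idea is to match a digit run followed by further `,`- or `.`-separated digit runs as one token before any punctuation is split off. The sketch avoids `\w`, which in Python's `re` does not match combining vowel signs such as ि, so Devanagari words stay intact.

```python
import re

# Hypothetical punctuation set for this sketch; a real tokenizer would need a
# fuller list per language. '।' is the Devanagari danda (sentence end).
_PUNCT_CHARS = ".,!?;:'\"()[]।"
_PUNCT_CLASS = re.escape(_PUNCT_CHARS)

# A number: a run of decimal digits, then any number of ",digits" or
# ".digits" groups. In Python 3, \d matches any Unicode decimal digit, so
# Devanagari digits are covered as well as ASCII 0-9.
_NUMBER = r"\d+(?:[.,]\d+)*"
# A single punctuation mark.
_PUNCT = f"[{_PUNCT_CLASS}]"
# Any other run of non-space, non-digit, non-punctuation characters
# (this keeps vowel signs attached to their consonants).
_OTHER = rf"[^\s\d{_PUNCT_CLASS}]+"

# Order matters: try the number pattern first so its internal commas and
# periods are not split off as punctuation.
_TOKEN_RE = re.compile(f"{_NUMBER}|{_PUNCT}|{_OTHER}")


def tokenize(text):
    """Split text into tokens, keeping grouped numbers such as ४,३२,००० whole."""
    return _TOKEN_RE.findall(text)


print(tokenize("कीमत ४,३२,००० रुपये है।"))
# ['कीमत', '४,३२,०००', 'रुपये', 'है', '।']
```

Because a separator is only absorbed when a digit follows it immediately, a trailing comma or a sentence-final period (as in `४,३२,०००,` or `4,32,000.`) is still split off as punctuation.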

prassr commented 5 months ago

Colab link for testing