anoopkunchukuttan / indic_nlp_library

Resources and tools for Indian language Natural Language Processing
http://anoopkunchukuttan.github.io/indic_nlp_library/
MIT License
546 stars 158 forks

Integrated UrduHack and IndicNLP Resources directly into the module #68

Open VarunGumma opened 8 months ago

VarunGumma commented 8 months ago

Integrated UrduHack and indic_nlp_resources directly into the module. This removes the need to install the TensorFlow-based Urdu library, which was causing some conflicts. The resources are also bundled with the module, so we no longer need to clone them separately and set the path. This should make installation and packaging easier, especially for the IT2 HF tokenizer.

VarunGumma commented 1 week ago

Hi @anoopkunchukuttan, as discussed, I have opened a PR for the indicnlp version we have been using for IT2 and its tokenizer. This repo integrates UrduHack and indic_nlp_resources, and is debloated to support the primary requirements of IT2. Hope this can be added directly as another branch to the original repo.

anoopkunchukuttan commented 1 week ago

Thanks @VarunGumma, will review and get back in a couple of days.