Large twitter datasets for Telugu and Hindi

AI4Bharat / indicnlp_catalog

A collaborative catalog of NLP resources for Indic languages

https://ai4bharat.github.io/indicnlp_catalog


Large twitter datasets for Telugu and Hindi #4

Closed bedapudi6788 closed 5 years ago

bedapudi6788 commented 5 years ago

Hi, I have added large Twitter datasets for Telugu (7.9 million) and Hindi (17.6 million) to this repo: https://github.com/bedapudi6788/LOIT. The repo also includes fastText skip-gram and CBOW word vectors for the same datasets.
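
For anyone who wants to try word vectors like these, here is a minimal sketch of reading fastText's plain-text `.vec` format (a header line with the vocabulary size and dimension, then one word and its values per line). It assumes the vectors are distributed in that text format, which this thread does not confirm; the function names `load_vec` and `cosine` and the toy sample data are hypothetical and not taken from the LOIT repo.

```python
import io
import math


def load_vec(stream, limit=None):
    """Parse fastText text-format vectors from an open text stream.

    Returns (vectors, dim), where vectors maps each word to a list of floats.
    `limit` optionally caps the number of words read.
    """
    header = stream.readline().split()
    dim = int(header[1])  # header is "<vocab_size> <dim>"
    vectors = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) <= dim:
            continue  # skip blank or malformed lines
        # The last `dim` fields are the values; everything before is the word.
        word = " ".join(parts[:-dim])
        vectors[word] = [float(x) for x in parts[-dim:]]
        if limit is not None and len(vectors) >= limit:
            break
    return vectors, dim


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy data standing in for a real file; the values are made up.
sample = io.StringIO("3 2\na 1.0 0.0\nb 0.0 1.0\nc 2.0 0.0\n")
vectors, dim = load_vec(sample)
similarity = cosine(vectors["a"], vectors["c"])
```

With a real file, the stream would come from something like `open(path, encoding="utf-8")`, where `path` is wherever the downloaded vectors were saved.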

anoopkunchukuttan commented 5 years ago

Thanks Praneeth, I have added the record to the catalog under monolingual corpora.