SangitaNLP / sangita

A Natural Language Toolkit for Indian Languages
Apache License 2.0

Improve the Accuracy of the Gender Tagger #10

Closed: djokester closed this issue 3 years ago

djokester commented 6 years ago

The (word, gender) tuples are currently available here. In accordance with Issue #9, we will move this file to Sangita Data. We will also create one new repository for Hindi word vectors and another for machine learning models; these will be tracked in a separate issue.
Along with this, we will remove the dependency on scikit-learn and work only with Keras. The task list is given below.

djokester commented 6 years ago

Initially, we will train the word vectors on sentences from HDTB. Once @MansiBreja finishes the HindiMonoCorp extraction, we can add the HindiMonoCorp sentences and retrain the word vectors. We may also need to scrape additional sentences for this.