Closed by djokester 3 years ago.
For the word vectors, we will initially use sentences from HDTB. Once @MansiBreja finishes the HindiMonoCorp extraction, we can add the HindiMonoCorp sentences and retrain the word vectors. We may also need to scrape more sentences for this.
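Word-vector trainers usually take tokenized sentences as input (for example, gensim's `Word2Vec` accepts an iterable of token lists). Below is a minimal sketch of pulling those sentences out of HDTB. It assumes the treebank is in CoNLL-U format, as distributed in the Universal Dependencies release `UD_Hindi-HDTB`; the function name `read_conllu_sentences` is hypothetical.

```python
from typing import Iterable, Iterator, List


def read_conllu_sentences(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each sentence of a CoNLL-U stream as a list of word forms.

    Assumes CoNLL-U input (hypothetical choice for HDTB): tab-separated
    token lines, '#' comment lines, and blank lines between sentences.
    """
    tokens: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if tokens:
                yield tokens
                tokens = []
            continue
        if line.startswith("#"):
            # Comment lines (sent_id, text, ...) carry no tokens.
            continue
        fields = line.split("\t")
        token_id = fields[0]
        if "-" in token_id or "." in token_id:
            # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "8.1").
            continue
        tokens.append(fields[1])  # column 2 is FORM, the surface word
    if tokens:
        # The file may not end with a blank line.
        yield tokens
```

Usage would look like `list(read_conllu_sentences(open(path, encoding="utf-8")))`. Because it is a generator, sentences from HindiMonoCorp or scraped text can later be chained onto the same stream before retraining.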
The (word, gender) tuples are currently available here. In accordance with Issue #9, we will move this file to Sangita Data. We will also create a new repository for Hindi word vectors and another one for machine learning models. These will be tracked in a separate issue.
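To train on these tuples, each word has to be joined with its vector and each gender tag turned into an integer label. The sketch below assumes the tuples are plain `(str, str)` pairs and that vectors come as a dict from word to list of floats; the helper `build_training_set` and the tags `"m"`/`"f"` in the comments are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def build_training_set(
    pairs: Sequence[Tuple[str, str]],
    vectors: Dict[str, List[float]],
) -> Tuple[List[List[float]], List[int], Dict[str, int]]:
    """Turn (word, gender) tuples into feature rows and integer labels.

    Returns (features, labels, label_index), where label_index maps each
    gender tag (e.g. hypothetical "m"/"f") to its integer label.
    """
    # Give each distinct tag a stable index by sorting the tag names.
    label_index = {tag: i for i, tag in enumerate(sorted({g for _, g in pairs}))}
    features: List[List[float]] = []
    labels: List[int] = []
    for word, gender in pairs:
        vec = vectors.get(word)
        if vec is None:
            # Out-of-vocabulary words have no vector, so they are dropped.
            continue
        features.append(vec)
        labels.append(label_index[gender])
    return features, labels, label_index
```

Integer labels like these can go straight into a Keras classifier compiled with `sparse_categorical_crossentropy`, so no separate label encoder is needed. Counting the dropped words is also a quick way to check how well a given set of word vectors covers the gender list.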
Along with this, we will remove the scikit-learn dependency and work only with Keras. The task list is given below:
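The issue does not say which scikit-learn features the code uses. If it is only data utilities such as `sklearn.model_selection.train_test_split` (an assumption), a seeded shuffle-and-split in plain Python can replace them. The name `split_dataset` is hypothetical, and it deliberately does not mirror scikit-learn's signature.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    items: Sequence[T], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `items` with a fixed seed and return (train, test)."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(items)  # copy so the caller's sequence is not reordered
    random.Random(seed).shuffle(shuffled)  # local RNG; global state untouched
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

Splitting a list of `(vector, label)` pairs keeps features and labels aligned. Keras' `Model.fit` also accepts `validation_split`, but it takes the last fraction of the data as given, without shuffling first. That is why an explicit shuffle like this one still helps if the gender list is sorted.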
- [ ] Move `gender.py` to Sangita Data - Cakewalk.
- [ ] Create a fresh set of word vectors and store them in a new repository dedicated to word vectors - Pro.
- [ ] Train a gender classifier on the word vectors, using the gender tags as labels, and store the model in a separate repository - Intermediate.
- [ ] Refactor the code in this repository to accommodate these changes - Intermediate.
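Once the word vectors live in their own repository, the refactored code needs a way to load them. The sketch below assumes they are published in the word2vec text format: a header line with the vocabulary size and dimension, then one word per line followed by its values. gensim can write this format with `KeyedVectors.save_word2vec_format(..., binary=False)`. The loader name `load_word2vec_text` is hypothetical.

```python
from typing import Dict, Iterable, List


def load_word2vec_text(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse word2vec text-format vectors into a {word: vector} dict."""
    it = iter(lines)
    header = next(it, None)
    if header is None:
        raise ValueError("empty input: missing '<count> <dim>' header")
    count_str, dim_str = header.split()[:2]
    count, dim = int(count_str), int(dim_str)
    vectors: Dict[str, List[float]] = {}
    for line in it:
        if not line.strip():
            continue  # tolerate blank lines, e.g. at end of file
        # rstrip() also drops the trailing space some writers emit.
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim} values for {parts[0]!r}")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    if len(vectors) != count:
        raise ValueError(f"header promised {count} vectors, got {len(vectors)}")
    return vectors
```

A plain-text format keeps the vectors repository independent of any one training library, and the header checks catch truncated or mismatched files early.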