Instead of the vectors trained on the Google News corpus, we are now going to train our own model on LiveJournal (LJ) posts. Gene will give me (Dennis) the bag-of-words corpus of the posts, and I will then build the corresponding masterDict. The new masterDict consists of a DMat (word counts) and an IMat (word indexes); we estimate that the new dictionary will still contain ~900K words. Once the new dictionary exists, we can build a matrix of the indexes of words in sentences, with one dimension equal to the length of the new dictionary. From there, the process is the same as before.
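As a rough illustration of the dictionary-building step, here is a minimal Python sketch. It assumes the bag-of-words corpus arrives as tokenized sentences. `build_master_dict` is a hypothetical helper: a plain list of counts stands in for the DMat, and a word-to-index mapping stands in for the IMat of indexes. The real masterDict presumably uses BIDMat matrix types, which are not reproduced here.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def build_master_dict(
    sentences: Iterable[List[str]],
) -> Tuple[List[str], List[int], Dict[str, int]]:
    """Hypothetical helper: build a vocabulary sorted by descending count.

    Returns (words, counts, index) where words[i] occurs counts[i] times
    and index[word] == i. Plain Python lists/dicts stand in for DMat/IMat.
    """
    counter: Counter = Counter()
    for tokens in sentences:
        counter.update(tokens)
    # Assumption: rank by descending count, breaking ties alphabetically
    # so the resulting indexes are deterministic.
    ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    words = [w for w, _ in ranked]
    counts = [c for _, c in ranked]
    index = {w: i for i, w in enumerate(words)}
    return words, counts, index


# Toy input standing in for the LJ bag-of-words corpus.
sents = [["i", "feel", "stressed"], ["i", "feel", "fine"]]
words, counts, index = build_master_dict(sents)
print(words)   # ['feel', 'i', 'fine', 'stressed']
print(counts)  # [2, 2, 1, 1]
```

Ranking by frequency is an assumption of this sketch; it gives the most common words the smallest indexes, which is convenient if the ~900K-word dictionary ever needs to be truncated.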
Summary:
Make two new masterDicts from /var/local/destress/text_sents and /var/local/destress/text_sents_ids.
Make a new m × n matrix, where m = length of the masterDict and n = number of sentences.
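The m × n matrix from the summary is almost entirely zeros, since each sentence uses only a handful of the ~900K dictionary words, so a sparse layout is the natural fit. The sketch below assumes each entry (i, j) holds the count of dictionary word i in sentence j and stores the matrix in compressed sparse column (CSC) form, one column per sentence. `sentences_to_csc` and `to_dense` are hypothetical helpers, and the toy `index` stands in for the real masterDict.

```python
from collections import Counter
from typing import Dict, List, Tuple


def sentences_to_csc(
    sentences: List[List[str]], index: Dict[str, int]
) -> Tuple[List[int], List[int], List[int], Tuple[int, int]]:
    """Hypothetical helper: m x n word-by-sentence count matrix in CSC form.

    m = len(index) (dictionary size), n = len(sentences).
    Column j holds the count of each dictionary word in sentence j.
    Tokens missing from the dictionary are skipped.
    """
    col_ptr = [0]
    row_idx: List[int] = []
    values: List[int] = []
    for tokens in sentences:
        counts = Counter(index[t] for t in tokens if t in index)
        # CSC convention: row indices are sorted within each column.
        for row in sorted(counts):
            row_idx.append(row)
            values.append(counts[row])
        col_ptr.append(len(row_idx))
    return col_ptr, row_idx, values, (len(index), len(sentences))


def to_dense(col_ptr, row_idx, values, shape) -> List[List[int]]:
    """Expand to a dense list of rows; only sensible for tiny examples."""
    m, n = shape
    dense = [[0] * n for _ in range(m)]
    for j in range(n):
        for k in range(col_ptr[j], col_ptr[j + 1]):
            dense[row_idx[k]][j] = values[k]
    return dense


# Toy dictionary and sentences (assumed data, not from the LJ corpus).
index = {"feel": 0, "i": 1, "fine": 2, "stressed": 3}
sents = [["i", "feel", "stressed"], ["i", "feel", "fine", "fine"]]
csc = sentences_to_csc(sents, index)
for row in to_dense(*csc):
    print(row)
# [1, 1]
# [1, 1]
# [0, 2]
# [1, 0]
```

If SciPy is available, the returned arrays can be passed directly as `scipy.sparse.csc_matrix((values, row_idx, col_ptr), shape=shape)`. A 0/1 indicator matrix would work the same way by appending 1 instead of the count.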
Word2Vec: @geneyoo @peparedes