divilian opened this issue 3 years ago
Look specifically for sample code that uses word2vec in a text classification setting.
Is using pre-trained embeddings a good option for us? Pros and cons? Etc.
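For reference, here's a minimal sketch of the kind of pipeline I mean: train (or load) word2vec vectors with gensim, average each document's word vectors into a single feature vector, and feed those features to a scikit-learn classifier. The toy corpus, labels, and parameter values below are all placeholders, not our actual data:

```python
# Minimal sketch: word2vec features for text classification.
# Assumes gensim and scikit-learn are installed; the toy corpus and
# labels are placeholders, not real Reddit data.
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

docs = [["great", "movie", "loved", "it"],
        ["terrible", "plot", "hated", "it"],
        ["wonderful", "acting", "loved", "it"],
        ["awful", "boring", "hated", "it"]]
labels = [1, 0, 1, 0]  # toy sentiment labels

# Train word2vec on the corpus itself -- only viable with lots of data.
w2v = Word2Vec(sentences=docs, vector_size=50, window=3, min_count=1, epochs=50)

def doc_vector(tokens, model):
    """Average the word vectors of all in-vocabulary tokens."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.wv.vector_size)

X = np.array([doc_vector(d, w2v) for d in docs])
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```

Averaging word vectors is the simplest way to turn variable-length documents into fixed-length features; anything fancier (TF-IDF weighting, doc2vec, etc.) would be a separate discussion.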
From what I've been reading, it sounds like computing your own word embeddings (as opposed to using a pre-trained set) is really only viable if you have a great deal of training data. Since we don't (yet), I think we're going to have to use pre-trained vectors. So I want to slightly reframe this Issue (or we can create a new one if you'd rather) as: "google around for pre-trained word embedding data sets that are publicly available, and try to find one that seems appropriate for Reddit comments."
Link to where I found the ten word embedding data sets, on Google Dataset Search: https://datasetsearch.research.google.com/search?query=Word%20Embeddings&docid=L2cvMTFqOWMzeDFsMA%3D%3D
Link to the GloVe Twitter embeddings on Kaggle: https://www.kaggle.com/jdpaletto/glove-global-vectors-for-word-representation
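These same GloVe Twitter vectors are also mirrored in the gensim-data repository, so rather than downloading the Kaggle file by hand, a quick way to try them out might be something like the sketch below ("glove-twitter-25" is the smallest, 25-dimensional variant; 50-, 100-, and 200-dimensional ones also exist):

```python
# Minimal sketch: load pre-trained GloVe Twitter vectors through gensim's
# downloader (the first call fetches the data from gensim-data).
import gensim.downloader as api

glove = api.load("glove-twitter-25")      # returns a KeyedVectors object
print(glove.most_similar("lol", topn=5))  # nearest neighbors in the space
print(glove["lol"][:10])                  # first 10 dims of one word vector
```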
@vgcagle: add the Li paper to Zotero
@rockladyeagles: help @vgcagle get gensim installed
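For the gensim install, a quick sanity check after a plain `pip install gensim` (assuming a standard Python/pip setup):

```python
# Verify the install worked; this should print the installed gensim version.
import gensim
print(gensim.__version__)
```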