MediaUncovered / NewsAnalysis

use word embeddings to uncover bias in newspapers

Compare word embedding implementations and decide on one #9

Open · Tilana opened 7 years ago

Tilana commented 7 years ago

Based on the literature research on general word embeddings (#2), WordRank and word2vec are the interesting candidates to investigate and compare. Depending on which one we pick, the way of storing and reading the data (#4) might differ...

Tilana commented 7 years ago

Gensim Word2Vec: http://radimrehurek.com/gensim/models/word2vec.html

Loading data: Gensim only requires that the input provides sentences sequentially when iterated over. There is no need to keep everything in RAM: we can provide one sentence, process it, forget it, load another sentence… Source: https://rare-technologies.com/word2vec-tutorial/

See also, on data streaming in Python: https://rare-technologies.com/data-streaming-in-python-generators-iterators-iterables/
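
To make the streaming idea concrete, here is a minimal sketch of a restartable sentence iterable. The class name `SentenceStream`, the one-sentence-per-line file layout, and the lowercase/whitespace tokenization are all assumptions for illustration, not something decided in this issue. It uses only the standard library, so it runs without Gensim installed.

```python
import os
import tempfile


class SentenceStream:
    """Hypothetical helper: yields one tokenized sentence at a time from a directory.

    Assumes every file in the directory is UTF-8 text with one sentence per line.
    Because __iter__ reopens the files on each call, the object can be iterated
    several times, which matters for training that makes multiple passes.
    """

    def __init__(self, dirname):
        self.dirname = dirname

    def __iter__(self):
        # Sorted only to make the order deterministic.
        for fname in sorted(os.listdir(self.dirname)):
            path = os.path.join(self.dirname, fname)
            with open(path, encoding="utf-8") as handle:
                for line in handle:
                    tokens = line.lower().split()  # naive tokenization (assumption)
                    if tokens:  # skip blank lines
                        yield tokens


def demo():
    """Build a tiny throwaway corpus and read it twice to show restartability."""
    dirname = tempfile.mkdtemp()
    with open(os.path.join(dirname, "a.txt"), "w", encoding="utf-8") as out:
        out.write("Media bias is hard to measure\n\nEmbeddings may help\n")
    with open(os.path.join(dirname, "b.txt"), "w", encoding="utf-8") as out:
        out.write("Compare several newspapers\n")
    stream = SentenceStream(dirname)
    return list(stream), list(stream)
```

With Gensim installed, an object like this could presumably be passed as the `sentences` argument, e.g. `Word2Vec(sentences=SentenceStream("/path/to/corpus"))`; the path here is a placeholder. A plain generator would not be enough for that use, since it is exhausted after a single pass.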

Tilana commented 7 years ago

TensorFlow Word2Vec: https://www.tensorflow.org/tutorials/word2vec

No data streaming possible?

Tilana commented 7 years ago

DeepLearning4j Word2Vec: https://deeplearning4j.org/word2vec#just

Implementation for Java...

SentenceIterator/DocumentIterator: used to iterate over a dataset. A SentenceIterator returns strings, while a DocumentIterator works with input streams.

Tilana commented 7 years ago

Shihao Ji's WordRank: https://bitbucket.org/shihaoji/wordrank

With a wrapper for Gensim: https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/WordRank_wrapper_quickstart.ipynb

Tilana commented 6 years ago

http://multithreaded.stitchfix.com/blog/2017/10/18/stop-using-word2vec/

Tilana commented 6 years ago

http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.130.782