Jaccard Similarity
Different embeddings + K-means
Different embeddings + Cosine Similarity
Word2Vec + Smooth Inverse Frequency + Cosine Similarity
Different embeddings + LSI + Cosine Similarity
Different embeddings + LDA + Jensen-Shannon distance
Different embeddings + Word Mover's Distance
Different embeddings + Variational Autoencoder (VAE)
Different embeddings + Universal Sentence Encoder
Different embeddings + Siamese Manhattan LSTM
Knowledge-based Measures
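Most of these measures need embeddings or a trained model; the first one is purely set-based. A minimal Python sketch of Jaccard similarity over token sets (the sample sentences are made up):

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity = |intersection| / |union| of the two token sets."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

# 4 shared tokens out of 6 distinct tokens -> 0.666...
print(jaccard_similarity("the cat sat on the mat", "the cat lay on the mat"))
```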
[Incredible !!! Collection of NLP notebooks] : https://github.com/nlptown/nlp-notebooks
[Tutorial: WMD in a Jupyter notebook] : https://github.com/makcedward/nlp/blob/master/sample/nlp-word_mover_distance.ipynb
[Word Mover's Distance] : https://www.kaggle.com/ankitswarnkar/word-embedding-using-glove-vector
[lstm-gru-sentiment-analysis] : https://github.com/javaidnabi31/Word-Embeddding-Sentiment-Classification
[ELMo: Contextual language embedding] : https://towardsdatascience.com/elmo-contextual-language-embedding-335de2268604
[Learning Word Embedding (Mathematics)] : https://lilianweng.github.io/lil-log/2017/10/15/learning-word-embedding.html
[A Beginner’s Aha Moments for Word2Vec] : https://yidatao.github.io/2017-08-03/word2vec-aha/
[GloVe, Word2Vec, fastText classes] : https://github.com/makcedward/nlp/blob/master/sample/nlp-word_embedding.ipynb
[!!! Very nice tutorial about how word2vec works] : https://towardsdatascience.com/word2vec-made-easy-139a31a4b8ae
[!!! An implementation guide to Word2Vec using NumPy and Google Sheets] : https://towardsdatascience.com/an-implementation-guide-to-word2vec-using-numpy-and-google-sheets-13445eebd281
[WordRank embedding: “crowned” is most similar to “king”, not word2vec’s “Canute”] : https://rare-technologies.com/wordrank-embedding-crowned-is-most-similar-to-king-not-word2vecs-canute/
[Nice !!! SwiftNLC model notebooks] : https://github.com/JacopoMangiavacchi/SwiftNLC/tree/master/ModelNotebooks
[Tutorial from ENSAE] : http://www.xavierdupre.fr/app/papierstat/helpsphinx/notebooks/text_sentiment_wordvec.html#les-donnees
[CoreML with GloVe Word Embedding and Recursive Neural Network - nice tutorial] : https://medium.com/@JMangia/coreml-with-glove-word-embedding-and-recursive-neural-network-part-2-ab238ca90970
[Big Benchmark of sentence-similarity methods] : http://nlp.town/blog/sentence-similarity/
InferSent (INF) = a pre-trained sentence encoder developed by Facebook Research. It is a BiLSTM with max pooling, trained on the SNLI dataset (570k English sentence pairs, each labelled as entailment, contradiction, or neutral).
GSE (Google Sentence Encoder) = Google's answer to Facebook's InferSent. It comes in two forms: a Transformer-based encoder (more accurate, but heavier) and a Deep Averaging Network (DAN; faster, slightly less accurate).
====-> Both are typically evaluated with the Pearson correlation between predicted similarities and human judgments.
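A minimal sketch of scoring sentence similarity with the Universal Sentence Encoder loaded from TensorFlow Hub (the module URL is the publicly documented DAN variant; assumes `tensorflow` and `tensorflow_hub` are installed):

```python
import numpy as np
import tensorflow_hub as hub

# Load the DAN-based Universal Sentence Encoder from TF Hub.
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

sentences = ["How old are you?", "What is your age?"]
vectors = embed(sentences).numpy()  # shape: (2, 512)

# Cosine similarity between the two sentence vectors.
cos = np.dot(vectors[0], vectors[1]) / (
    np.linalg.norm(vectors[0]) * np.linalg.norm(vectors[1])
)
print(round(float(cos), 3))
```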
[How to predict Quora Question Pairs using Siamese Manhattan LSTM] : https://medium.com/mlreview/implementing-malstm-on-kaggles-quora-question-pairs-competition-8b31b0b16a07
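A hedged Keras sketch of the MaLSTM idea from the article above: a shared LSTM encodes both questions, and similarity is exp(-L1 distance) between the final hidden states. All sizes are illustrative; in practice the embedding layer would be initialized from pre-trained Word2Vec weights:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Illustrative sizes; real values depend on the vocabulary and padding length.
vocab_size, embed_dim, max_len, hidden = 20000, 300, 40, 50

left = layers.Input(shape=(max_len,), dtype="int32")
right = layers.Input(shape=(max_len,), dtype="int32")

# Shared embedding + shared LSTM = the "Siamese" part.
embedding = layers.Embedding(vocab_size, embed_dim)  # init from Word2Vec in practice
shared_lstm = layers.LSTM(hidden)
h_left = shared_lstm(embedding(left))
h_right = shared_lstm(embedding(right))

# MaLSTM similarity: exp(-||h_left - h_right||_1), which lies in (0, 1].
similarity = layers.Lambda(
    lambda t: tf.exp(-tf.reduce_sum(tf.abs(t[0] - t[1]), axis=1, keepdims=True))
)([h_left, h_right])

model = Model(inputs=[left, right], outputs=similarity)
model.compile(loss="mean_squared_error", optimizer="adam")
```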
[Latent Semantic Indexing (LSI) - An Example with mathematics] : http://www1.se.cuhk.edu.hk/~seem5680/lecture/LSI-Eg.pdf
[Finding similar documents with Word2Vec and WMD] : https://markroxor.github.io/gensim/static/notebooks/WMD_tutorial.html
[Cosine Similarity] : https://www.machinelearningplus.com/nlp/cosine-similarity/
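A short scikit-learn sketch of cosine similarity over TF-IDF vectors (the two sample documents are invented):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the cat sat on the mat", "the dog sat on the log"]
tfidf = TfidfVectorizer().fit_transform(docs)  # sparse (2, vocab) matrix
print(cosine_similarity(tfidf[0], tfidf[1]))   # value in [0, 1]
```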
[Tutorial on LSI] : http://poloclub.gatech.edu/cse6242/2018spring/slides/CSE6242-820-TextAlgorithms.pdf
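A compact sketch of the LSI workflow with gensim (the toy corpus and `num_topics=2` are illustrative):

```python
from gensim import corpora, models, similarities

docs = [["human", "computer", "interaction"],
        ["graph", "trees", "minors"],
        ["graph", "minors", "survey"]]

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# Project the bag-of-words corpus into a 2-dimensional latent semantic space.
lsi = models.LsiModel(corpus, id2word=dictionary, num_topics=2)

# Rank documents by cosine similarity to a query in LSI space.
index = similarities.MatrixSimilarity(lsi[corpus])
query = lsi[dictionary.doc2bow(["graph", "survey"])]
print(sorted(enumerate(index[query]), key=lambda x: -x[1]))
```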
[Earth Mover's Distance: flow example] : http://robotics.stanford.edu/~scohen/research/emdg/emdg.html#flow_eqw_notopt
[Earth Mover's Distance slides (Rubner)] : http://robotics.stanford.edu/~rubner/slides/sld014.htm
[Document Similarity with Word Mover's Distance] : http://jxieeducation.com/2016-06-13/Document-Similarity-With-Word-Movers-Distance/
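A minimal Word Mover's Distance sketch with gensim, essentially the Obama/president example from the WMD tutorial linked above (downloads pre-trained GloVe vectors on first run; gensim's `wmdistance` also needs an optimal-transport backend such as POT or pyemd, depending on the version):

```python
import gensim.downloader as api

# Load pre-trained word vectors (a sizeable download the first time).
vectors = api.load("glove-wiki-gigaword-50")

doc1 = "obama speaks to the media in illinois".split()
doc2 = "the president greets the press in chicago".split()

# WMD: minimum cumulative distance the words of doc1 must "travel"
# in embedding space to become the words of doc2.
print(vectors.wmdistance(doc1, doc2))
```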
[Beyond Cosine Similarity -> Jensen-Shannon + Hypothesis Test] : http://stefansavev.com/blog/beyond-cosine-similarity/
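A quick SciPy sketch of the Jensen-Shannon distance between two distributions (e.g. LDA topic distributions); note that `scipy.spatial.distance.jensenshannon` returns the square root of the JS divergence:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Two hypothetical topic distributions over 4 topics (each sums to 1).
p = np.array([0.7, 0.2, 0.05, 0.05])
q = np.array([0.1, 0.6, 0.2, 0.1])

print(jensenshannon(p, q, base=2))  # 0 = identical, 1 = maximally different (base 2)
```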
[Great resources with MANY MANY notebooks] : https://www.renom.jp/index.html?c=tutorial
[Optimal Transport, a Swiss Army Knife for Data Science] : https://weave.eu/le-transport-optimal-un-couteau-suisse-pour-la-data-science/
[BEST TUTORIAL variational-autoencoders] : https://www.jeremyjordan.me/variational-autoencoders/
[BEST TUTORIAL : Earth Mover's Distance !!!!!] : https://jeremykun.com/2018/03/05/earthmover-distance/
Problem: compute the distance between points with uncertain locations (given by samples, differing observations, or clusters).
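In one dimension SciPy ships this as `scipy.stats.wasserstein_distance`; a tiny worked example with made-up samples:

```python
from scipy.stats import wasserstein_distance

# Two sets of 1-D samples standing in for uncertain point locations.
u = [0.0, 1.0, 3.0]
v = [5.0, 6.0, 8.0]

# Minimum "work" to move the mass of u onto v; here every sample
# shifts by exactly 5.0, so the distance is 5.0.
print(wasserstein_distance(u, v))  # 5.0
```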
[Introduction to Wasserstein metric (earth mover’s distance) -> Mathematics]: https://yoo2080.wordpress.com/2015/04/09/introduction-to-wasserstein-metric-earth-movers-distance/
[Word Mover’s distance calculation between word pairs of two documents] : https://stats.stackexchange.com/questions/303050/word-movers-distance-calculation-between-word-pairs-of-two-documents
[WMD + Word2Vec] : https://github.com/stephenhky/PyWMD/blob/master/WordMoverDistanceDemo.ipynb
[Book on Optimal Transport: Computational Optimal Transport (Peyré & Cuturi)] : https://optimaltransport.github.io/pdf/ComputationalOT.pdf
[NICE !!!!!! How Autoencoders work - Understanding the math and implementation] : https://www.kaggle.com/shivamb/how-autoencoders-work-intro-and-usecases
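To make the autoencoder idea concrete, a bare-bones Keras sketch (the 784-unit input assumes flattened 28x28 images, e.g. MNIST):

```python
from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(784,))                      # flattened 28x28 image
code = layers.Dense(32, activation="relu")(inputs)       # encoder: 784 -> 32
outputs = layers.Dense(784, activation="sigmoid")(code)  # decoder: 32 -> 784

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)  # learn to reconstruct
```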
[Word2Vec to convert each question into a semantic vector, then stack a Siamese network on top to detect whether the pair is a duplicate] : http://www.erogol.com/duplicate-question-detection-deep-learning/
[Amazing !!!] : https://github.com/makcedward/nlp
Text Representation:
1. Traditional Method
2. Character Level
3. Word Level
4. Sentence Level