This repository contains Keras, PyTorch and NumPy implementations of some deep learning architectures for NLP. For a quick theoretical intro about Deep Learning for NLP, I encourage you to have a look at my notes.
https://github.com/Tixierae/deep_learning_NLP/tree/master/NMT
A compact, fully functional, and well-commented PyTorch implementation of the classical seq2seq model "Effective Approaches to Attention-based Neural Machine Translation" (Luong et al. 2015), with support for the three global attention mechanisms presented in subsection 3.1 of the paper: (1) dot, (2) general, and (3) concat, and also stacking vs non-stacking RNN encoder and decoder, and bidirectional vs unidirectional RNN encoder.
We experiment on a toy English -> French dataset from http://www.manythings.org/anki/, originally extracted from the Tatoeba project, with 136,521 sentence pairs for training and 34,130 pairs for testing.
https://github.com/Tixierae/deep_learning_NLP/blob/master/skipgram/sg_d2v_numpy.ipynb
In this notebook, we learn word and document vectors completely by hand on the IMDB movie review dataset, with just a for
loop and NumPy! We implement the following models:
We also:
A RAM-friendly implementation of the model introduced by Yang et al. (2016), with step-by-step explanations and links to relevant resources: https://github.com/Tixierae/deep_learning_NLP/blob/master/HAN/HAN_final.ipynb
In my experiments on the Amazon review dataset (3,650,000 documents, 5 classes), I reach 62.6% accuracy after 8 epochs, and 63.6% accuracy (the accuracy reported in the paper) after 42 epochs. Each epoch takes about 20 mins on my TitanX GPU. I deployed the model as a web app. As shown in the image below, you can paste your own review and visualize how the model pays attention to words and sentences.
The notebook makes use of the following concepts:
1There is more and more evidence that adaptive optimizers like Adam, Adagrad, etc. converge faster but generalize poorly compared to SGD-based approaches. For example: Wilson et al. (2018), this blogpost. Traditional SGD is very slow, but a cyclical learning rate schedule can bring a significant speedup, and even sometimes allow to reach better performance.
An implementation of (Kim 2014)'s 1D Convolutional Neural Network for short text classification: https://github.com/Tixierae/deep_learning_NLP/blob/master/CNN_IMDB/cnn_imdb.ipynb
Agreed, this is not for NLP. But an implementation can be found here https://github.com/Tixierae/deep_learning_NLP/blob/master/CNN_MNIST/mnist_cnn.py. I reach 99.45% accuracy on MNIST with it.
This notebook provides simple functions to clean and index documents, and to execute word and phrase queries. It also shows how to compute TF-IDF coefficients. https://github.com/Tixierae/deep_learning_NLP/blob/master/other/inverted_index_tfidf.ipynb
If you use some of the code in this repository in your work, please cite
@article{tixier2018notes,
title={Notes on Deep Learning for NLP},
author={Tixier, Antoine J.-P.},
journal={arXiv preprint arXiv:1808.09772},
year={2018}
}
Tixier, A. J. P. (2018). Notes on Deep Learning for NLP. arXiv preprint arXiv:1808.09772.