R-Net
Requirements
Many reported problems are caused by mismatched software versions. Please check your versions against the requirements below before opening an issue or emailing me.
General
- Python >= 3.4
- unzip, wget
Python Packages
- tensorflow-gpu >= 1.5.0
- spaCy >= 2.0.0
- tqdm
- ujson
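One way to check the installed versions before reporting a problem is a short script like the one below. It is only a sketch and not part of this project: the helper name `report_versions` is hypothetical, and it assumes that tensorflow-gpu is importable as `tensorflow` and spaCy as `spacy`.

```python
import importlib
import sys


def report_versions(packages=("tensorflow", "spacy", "tqdm", "ujson")):
    """Map each package name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Not installed, or installed but failing to import (e.g. missing CUDA libraries).
            versions[name] = None
        else:
            # Some packages do not expose __version__; report that explicitly.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name, version in report_versions().items():
        print(name, version if version is not None else "not importable")
```

Pasting the printed versions into an issue makes version mismatches easier to spot.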
Usage
To download and preprocess the data, run
# download SQuAD and Glove
sh download.sh
# preprocess the data
python config.py --mode prepro
Hyperparameters are stored in config.py. To debug, train, or test the model, run the following with --mode set to one of debug, train, or test:
python config.py --mode debug/train/test
To get the official score, run
python evaluate-v1.1.py ~/data/squad/dev-v1.1.json log/answer/answer.json
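For readers unfamiliar with the two reported metrics, the following is a simplified sketch of how exact match (EM) and F1 are usually computed for SQuAD v1.1 on a single prediction and a single reference answer. It is an illustration written for this README, not the official script; the official script additionally takes the best score over all reference answers for a question, so use it for any reported numbers.

```python
import re
import string
from collections import Counter

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, strip punctuation, drop English articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """True when the normalized strings are identical."""
    return normalize_answer(prediction) == normalize_answer(truth)


def f1_score(prediction, truth):
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    true_tokens = normalize_answer(truth).split()
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, the prediction "the tall Eiffel Tower" against the reference "Eiffel Tower" is not an exact match, but it shares two of its three normalized tokens with the reference and therefore still earns partial F1 credit.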
The default directory for TensorBoard log files is log/event.
See the release for the trained model.
Detailed Implementation
- The original paper uses additive attention, which consumes a lot of memory. This project adopts the scaled multiplicative attention presented in Attention Is All You Need.
- This project adopts the variational dropout presented in A Theoretically Grounded Application of Dropout in Recurrent Neural Networks.
- To solve the degradation problem in stacked RNNs, the outputs of each layer are concatenated to produce the final output.
- When the loss on the dev set increases over a certain period, the learning rate is halved.
- During prediction, the project adopts the search method presented in Machine Comprehension Using Match-LSTM and Answer Pointer.
- To improve efficiency, this implementation uses a bucketing method (contributed by xiongyifan) and CudnnGRU. Bucketing speeds up training, but lowers the F1 score by 0.3%.
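Three of the points above can be illustrated in plain Python. The sketch below is not this project's TensorFlow code: it leaves out learned projections, masking, and batching, and the names `scaled_dot_attention`, `best_span`, `update_learning_rate`, and the defaults `max_len=15` and `patience=3` are assumptions chosen for illustration.

```python
import math


def scaled_dot_attention(queries, keys, values):
    """Scaled multiplicative attention, softmax(q.k / sqrt(d)) applied to values, on nested lists."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        peak = max(scores)  # subtract the max before exponentiating, for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


def best_span(p_start, p_end, max_len=15):
    """Search for the (start, end) pair maximizing p_start[i] * p_end[j] with i <= j < i + max_len."""
    best, best_score = (0, 0), -1.0
    for i, ps in enumerate(p_start):
        for j in range(i, min(i + max_len, len(p_end))):
            score = ps * p_end[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


def update_learning_rate(lr, dev_loss, state, patience=3):
    """Halve lr after `patience` consecutive dev evaluations without a new best loss."""
    if dev_loss < state["best"]:
        state["best"], state["bad_checks"] = dev_loss, 0
    else:
        state["bad_checks"] += 1
        if state["bad_checks"] >= patience:
            lr, state["bad_checks"] = lr / 2.0, 0
    return lr
```

Multiplicative attention needs only one score per query-key pair, whereas additive attention builds a hidden vector for every pair before reducing it to a score, which is where the extra memory goes. The span search restricts candidates to spans that end at or after their start, which independent argmax over the two distributions does not guarantee.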
Performance
Score
|  | EM | F1 |
| --- | --- | --- |
| original paper | 71.1 | 79.5 |
| this project | 71.07 | 79.51 |
Training Time (s/it)
|  | Native | Native + Bucket | Cudnn | Cudnn + Bucket |
| --- | --- | --- | --- | --- |
| E5-2640 | 6.21 | 3.56 | - | - |
| TITAN X | 2.56 | 1.31 | 0.41 | 0.28 |
Extensions
These settings may increase the score but are not used in the model by default. You can turn them on in config.py.