Code for Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension.
We provide our components, which include a clean PyTorch implementation of Latent Predictor Networks, an NER answer chunker, and a procedure to finetune a BiDAF model on a combination of synthetic and real question-answer pairs.
We also provide pre-generated synthetic question-answer pairs and a pretrained SQuAD BiDAF model that can be finetuned on NewsQA data.
The question-generation network takes in a passage, extracts answer spans from the passage, and generates a question for each answer span. In our work, we use the question-generation network to finetune a reading comprehension model trained on SQuAD to answer questions on NewsQA.
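Conceptually, the two-stage pipeline looks like the sketch below. Here extract_answer_spans and generate_question are hypothetical stand-ins (a capitalization heuristic and a fixed template) for the repository's trained NER chunker and question generator:

def extract_answer_spans(tokens):
    # Stage 1: propose candidate answer spans. The real pipeline uses an
    # NER/IOB chunker; as a stand-in, take runs of capitalized tokens.
    spans, start = [], None
    for i, tok in enumerate(tokens + [""]):
        if tok[:1].isupper():
            start = i if start is None else start
        elif start is not None:
            spans.append((start, i))
            start = None
    return spans

def generate_question(tokens, span):
    # Stage 2: generate a question conditioned on passage + answer span.
    # Stand-in for the trained question generator network.
    return "who or what is " + " ".join(tokens[span[0]:span[1]]) + " ?"

def synthesize_qa_pairs(tokens):
    # Yield synthetic (question, answer) pairs for one passage.
    for span in extract_answer_spans(tokens):
        answer = " ".join(tokens[span[0]:span[1]])
        yield generate_question(tokens, span), answer

passage = "Walt Disney founded the studio in Burbank .".split()
for q, a in synthesize_qa_pairs(passage):
    print(q, "->", a)  # e.g. "who or what is Burbank ? -> Burbank"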
Finally, we also provide several logs from our experiments (single-model results, two-model results, and gold-answer finetuning) under logs/results.
Note: If git LFS doesn't work, please download the "git-lfsed" version of the repository here: https://drive.google.com/file/d/1OZrW7ZojGZS9P9J7G_dT-gsPazZlY9Sl/view?usp=sharing.
Requirements
- Git LFS
- Python 3.5+
- TensorFlow 0.12
- NumPy
- NLTK
- CUDA
- tqdm

For the NewsQA preprocessing tooling (Python 2.7):
- tqdm
- unidecode
- textblob
To download the required data and preprocess NewsQA, run
cd bidaf
./download.sh
python3 -m newsqa.prepro
To install necessary dependencies, please run
cd ../bidaf
./install.sh
To get remaining datasets, please run
git lfs pull origin master
For a preliminary example of how to extract answers (currently NER), generate questions, and then finetune a BiDAF model on the data, see ./scripts.sh, lines 1-70.
For an example of how to finetune a BiDAF model trained on SQuAD on NewsQA using our old logs, please follow the instructions in ./scripts.sh from line 70 onward, and from the bidaf directory run
# Now run training with squad and old dataset
python3 -m basic.cli \
--run_id 22 \
--shared_path out/basic/06/shared.json \
--load_path out/basic/06/save/basic-40000 \
--sup_unsup_ratio 5 \
--load_ema False --gpu_idx 0 \
--mode train --data_dir newsqa_unsupervised_old \
--len_opt --batch_size 24 --num_steps 14000 \
--eval_period 1000 --save_period 1000 \
--sent_size_th 800 --para_size_th 800
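# Evaluate each saved checkpoint (basic-41000 through basic-54000) on the NewsQA test set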
for i in 41 42 43 44 45 46 47 48 49 50 51 52 53 54;
do
python3 -m basic.cli \
--run_id 22 \
--shared_path out/basic/06/shared.json \
--load_path "out/basic/22/save/basic-"$i"000" \
--k 10 \
--use_special_token False \
--load_ema False --gpu_idx 3 \
--mode test --data_dir newsqa \
--len_opt --batch_size 15 --num_steps 40000 \
--eval_period 1000 --save_period 1000 \
--sent_size_th 2100 --para_size_th 2100
done
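# Ensemble the per-checkpoint predictions and score the result on the test set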
eargs=""
model_id=22
for num in 41 42 43 44 45 46 47 48 49 51; do
eval_path="out/basic/${model_id}/eval/test-0${num}000.pklz"
eargs="$eargs $eval_path"
done
python3 -m basic.ensemble --data_path newsqa/data_test.json --shared_path newsqa/shared_test.json -o new_results_30.json $eargs
python3 newsqa/evaluate.py newsqa/data_test.json new_results_30.json
(Running this command on our machine gives approximately 30.5 EM and 44.5 F1.)
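For reference, EM and F1 here are the standard SQuAD-style answer-overlap metrics. A minimal sketch of how they are computed, modeled on the official SQuAD evaluation script (newsqa/evaluate.py may differ in details):

import re
import string
from collections import Counter

def normalize_answer(s):
    # Lowercase, strip punctuation and articles, collapse whitespace.
    s = "".join(ch for ch in s.lower() if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, ground_truth):
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))

def f1_score(prediction, ground_truth):
    pred_tokens = normalize_answer(prediction).split()
    gt_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)

print(f1_score("the brown fox", "a brown fox"))  # 1.0 after normalization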
To reproduce results from several of our logs, please execute:
cd logs/results
bash script.sh
For an end-to-end example of how to train your own question generator network, run
$ python3 -m tests.language_model_trainer_test
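For intuition before reading models/language_model.py, here is a minimal PyTorch encoder-decoder sketch of question generation. It is an illustrative simplification, not the repository's Latent Predictor Network (which combines several predictors, e.g. copying words from the passage):

import torch
import torch.nn as nn

class ToyQuestionGenerator(nn.Module):
    # Encodes a passage (answer span assumed to be marked in the input
    # ids) and decodes a question token by token.
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, passage_ids, question_ids):
        # Encode the passage; its final state seeds the decoder.
        _, state = self.encoder(self.embed(passage_ids))
        dec_out, _ = self.decoder(self.embed(question_ids), state)
        return self.out(dec_out)  # (batch, question_len, vocab) logits

# One teacher-forced training step on dummy data.
model = ToyQuestionGenerator(vocab_size=1000)
passage = torch.randint(0, 1000, (2, 50))    # batch of 2 passages
question = torch.randint(0, 1000, (2, 12))   # gold questions
logits = model(passage, question[:, :-1])    # predict next tokens
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 1000), question[:, 1:].reshape(-1))
loss.backward()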
For an end-to-end example of how to train your own answer chunking model, run
$ python3 -m tests.iob_trainer_test
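The answer chunking model tags each passage token with an IOB label marking answer spans. A minimal sketch of that label encoding (the standard IOB scheme; the ANSWER tag name is our illustration, not necessarily the repo's):

def spans_to_iob(num_tokens, answer_spans):
    # B- begins a span, I- continues it, O is outside every span.
    tags = ["O"] * num_tokens
    for start, end in answer_spans:  # end is exclusive
        tags[start] = "B-ANSWER"
        for i in range(start + 1, end):
            tags[i] = "I-ANSWER"
    return tags

def iob_to_spans(tags):
    # Decode IOB tags back to (start, end) token spans.
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("B"):
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans

print(spans_to_iob(8, [(3, 6)]))
# ['O', 'O', 'O', 'B-ANSWER', 'I-ANSWER', 'I-ANSWER', 'O', 'O']
print(iob_to_spans(['O', 'B-ANSWER', 'I-ANSWER', 'O']))  # [(1, 3)]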
A pretrained BiDAF SQuAD model can be found at bidaf/out/basic/06/save/*. Synthetic question-answer pair datasets can be found at bidaf/newsqa_unsupervised_old (better performance) and bidaf/newsqa_unsupervised_old_verb_filtered (worse performance).
Question Generation
Please note: to use a question-generation network trained on SQuAD to generate questions on NewsQA, you must first create an inputs.txt file corresponding to the paragraphs in CNN/Daily Mail. For legal reasons we can't provide it as part of the repository. To create it, please run
cd bidaf && python3 -m tests.create_generation_dataset_unsupervised
cd ../
cp datasets/newsqa_unsupervised/train/inputs.txt datasets/{NEWSQA_DATASET_OF_YOUR_CHOICE}/train/inputs.txt
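If you instead want to build inputs.txt from your own corpus, a minimal sketch follows. The one-passage-per-line format and the my_newsqa directory name are assumptions, not part of the repo; check datasets/question_generation for the actual expected format:

import os

# Hypothetical dataset directory; mirror the layout of the sample
# datasets (train/inputs.txt, train/outputs.txt, vocab.txt, ...).
target = "datasets/my_newsqa/train"
os.makedirs(target, exist_ok=True)

passages = [
    "The quick brown fox jumped over the lazy dog .",
    "CNN reported the story on Monday .",
]
# Assumed format: one whitespace-tokenized passage per line.
with open(os.path.join(target, "inputs.txt"), "w") as f:
    f.write("\n".join(p.strip() for p in passages) + "\n")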
Code structure
- datasets: Sample datasets used to train the model (see datasets/question_generation). Each dataset needs a vocab.txt file, inputs.txt, outputs.txt, etc.
- data_loaders: Code to load a dataset from a directory into memory and to generate batches of examples for training/validating a model (a generic sketch of this pattern appears after this list).
- models: Core code for the question generator network (language_model.py), the IOB tagging model (iob/iob_model.py), and the trainer (language_trainer.py).
- tests: Unit tests for loading, training, and predicting with the network, as well as other components of the stack; newsqa_predictor/ contains tests for predicting on NewsQA, and squad_predictor/ contains tests for predicting on SQuAD.
- helpers: Various utilities for loading, saving, and other things that make PyTorch easier to work with.
- dnn_units: The core LSTM units for encoding/decoding.
- trainers: A trainer for training the answer chunker model.
- bidaf: The code needed to train a reading comprehension model. This code is heavily based on the Bi-directional Attention Flow for Machine Comprehension repository (thanks to the authors for releasing their code!).
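As a generic sketch of the batching pattern mentioned for data_loaders above (batch_iterator and the toy data are illustrative, not the repository's actual API):

import random

def batch_iterator(examples, batch_size, shuffle=True):
    # Yield fixed-size batches of (input, output) examples.
    indices = list(range(len(examples)))
    if shuffle:
        random.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [examples[i] for i in indices[start:start + batch_size]]

inputs = ["passage one .", "passage two ."]
outputs = ["what is one ?", "what is two ?"]
for batch in batch_iterator(list(zip(inputs, outputs)), batch_size=2):
    print(batch)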