Closed: kolk closed this issue 6 years ago.
Hi,
A couple of things: for CuratedTrec you need to pass the --regex flag to the evaluation, and for WikiMovies you need to pass --candidate-file data/datasets/WikiMovies-entities.txt.
I trained a model on SQuAD using the spaCy tokenizer. Here are my results:
scripts/reader/train.py --tune-partial 1000 --use-pos f --use-ner f --use-lemma f --train-file SQuAD-v1.1-train-processed-spacy.txt --dev-file SQuAD-v1.1-dev-processed-spacy.txt
This gets EM = 68.0 and F1 = 77.5 (trained with CoreNLP it gets 68.4/78.1).
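The EM and F1 numbers above follow the standard SQuAD evaluation: both the prediction and the gold answer are normalized (lowercased, with articles and punctuation stripped) before comparison, and F1 is token overlap. A minimal sketch of that metric, assuming the usual normalization steps:

```python
import re
import string
from collections import Counter

def normalize_answer(s):
    """Lowercase, drop articles and punctuation, collapse whitespace."""
    s = s.lower()
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    return " ".join(s.split())

def exact_match(prediction, gold):
    """EM: normalized strings must be identical."""
    return normalize_answer(prediction) == normalize_answer(gold)

def f1_score(prediction, gold):
    """Token-level F1 between normalized prediction and gold."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, exact_match("The Eiffel Tower", "eiffel tower.") is True after normalization, while f1_score("Barack Obama", "Obama") gives partial credit of 2/3.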
Running on CuratedTrec:
python scripts/pipeline/predict.py data/datasets/CuratedTrec-test.txt --reader-model /tmp/drqa-models/20180529-d3caf05f.mdl --embedding-file data/embeddings/glove.840B.300d.txt --tokenizer spacy
python scripts/pipeline/eval.py data/datasets/CuratedTrec-test.txt /tmp/CuratedTrec-test-20180529-d3caf05f-pipeline.preds --regex
--------------------------------------------------
Dataset: data/datasets/CuratedTrec-test.txt
Predictions: /tmp/CuratedTrec-test-20180529-d3caf05f-pipeline.preds
{'exact_match': 20.605187319884728}
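CuratedTrec gold answers are regular-expression patterns rather than literal strings, which is why eval.py needs the --regex flag for this dataset. A minimal sketch of what regex-based matching looks like (the exact compile flags are an assumption, not taken from this thread):

```python
import re

def regex_match(prediction, pattern):
    """Treat the gold answer as a regex pattern and test the
    prediction against it; an invalid pattern counts as no match."""
    try:
        compiled = re.compile(
            pattern, re.IGNORECASE | re.UNICODE | re.MULTILINE
        )
    except re.error:
        return False
    return compiled.match(prediction) is not None
```

With this kind of matching, a gold pattern like r"(Mt\.?|Mount) Everest" accepts several surface forms that plain string equality would reject.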
Running on WikiMovies:
python scripts/pipeline/predict.py data/datasets/WikiMovies-test.txt --reader-model /tmp/drqa-models/20180529-d3caf05f.mdl --embedding-file data/embeddings/glove.840B.300d.txt --candidate-file data/datasets/WikiMovies-entities.txt --tokenizer spacy
python scripts/pipeline/eval.py data/datasets/WikiMovies-test.txt /tmp/WikiMovies-test-20180529-d3caf05f-pipeline.preds
--------------------------------------------------
Dataset: data/datasets/WikiMovies-test.txt
Predictions: /tmp/WikiMovies-test-20180529-d3caf05f-pipeline.preds
{'exact_match': 24.035369774919616}
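The --candidate-file passed to predict.py restricts the reader's answers to a fixed entity list (one entity per line, as in WikiMovies-entities.txt). A hypothetical sketch of that filtering step; filter_predictions and its (text, score) span format are assumptions for illustration, not DrQA's actual API:

```python
def load_candidates(path):
    """Read an entity list with one candidate answer per line."""
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}

def filter_predictions(spans, candidates):
    """Keep only predicted (text, score) spans whose text appears
    in the candidate set, preserving the original score order."""
    return [(text, score) for text, score in spans
            if text.lower() in candidates]
```

Restricting to a closed candidate set prunes spurious spans, which is why the flag matters for an entity-centric dataset like WikiMovies.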
Thank you for the quick reply. The --regex parameter gave the expected results. With both --candidate-file and --regex, the exact_match score for CuratedTrec is 19.3083573487032 and for WikiMovies is 24.3870578778135.
Hi, I tried to reproduce the exact-match accuracy for CuratedTrec (19.7%) and WikiMovies (24.5%) listed in Table 6 of the paper. The accuracy I get with a single model trained on SQuAD is 5.04% on CuratedTrec and 6.37% on WikiMovies. The steps I followed are as follows:
python scripts/pipeline/predict.py data/datasets/CuratedTrec-test.txt --out-dir out_pipeline/ --reader-model models/squad/20180528-9275f860.mdl --retriever-model data/wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --doc-db data/wikipedia/docs.db --embedding-file data/embeddings/glove.840B.300d.txt --tokenizer spacy --batch-size 3
python scripts/pipeline/eval.py data/datasets/CuratedTrec-test.txt out_pipeline/CuratedTrec-test-20180528-9275f860-pipeline.preds
The pipeline module with the default batch size of 128 did not fit in 62 GB of RAM, so I used a batch size of 3. Is there a mistake in my understanding of the steps needed to reproduce the results? Please help.
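Lowering --batch-size as above bounds memory because only one chunk of questions (and their retrieved documents) is held in memory at a time. A generic sketch of that chunking, independent of DrQA's internals:

```python
def batches(examples, batch_size):
    """Yield successive fixed-size chunks of a list of examples,
    so peak memory scales with batch_size instead of dataset size."""
    for i in range(0, len(examples), batch_size):
        yield examples[i:i + batch_size]
```

Smaller batches trade throughput for a lower peak-memory footprint, which is the relevant knob when the default of 128 exhausts RAM.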