facebookresearch / DrQA

Reading Wikipedia to Answer Open-Domain Questions

Questions about file generation #262

Closed. donno2048 closed this issue 3 years ago.

donno2048 commented 3 years ago

Not really an issue; I'm just wondering where the .npz file originated and how it was generated.

newvicklee commented 3 years ago

Are you referring to data/datasets/wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz? If so, that file is generated via this script: python scripts/retriever/build_tfidf.py /path/to/doc/db /path/to/output/dir

There's more info on the tf-idf retriever here: https://github.com/facebookresearch/DrQA/tree/master/scripts/retriever#building-the-tf-idf-n-grams
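For reference, here is a minimal sketch of the full pipeline. The input paths are placeholders, and the --ngram/--hash-size/--tokenizer flags are my reading of the parameters encoded in that filename, so double-check them against the retriever README linked above:

# 1. Build the SQLite document database from preprocessed Wikipedia articles
python scripts/retriever/build_db.py /path/to/wikipedia/docs /path/to/docs.db

# 2. Build the TF-IDF matrix; ngram=2, hash-size=16777216 (2^24), and
#    tokenizer=simple match the values in the distributed .npz filename
python scripts/retriever/build_tfidf.py /path/to/docs.db /path/to/output/dir \
    --ngram 2 --hash-size 16777216 --tokenizer simple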

donno2048 commented 3 years ago

Thanks!

donno2048 commented 3 years ago

I'm reopening this issue to ask the same question about the single model and the multitask model, which are described as:

Model trained only on SQuAD, evaluated in the SQuAD setting

and

Model trained with distant supervision without NER/POS/lemma features, evaluated on multiple datasets (test sets, dev set for SQuAD) in the full Wikipedia setting

From the README, I couldn't figure out how these models were generated.

newvicklee commented 3 years ago

The models are created with this script: python scripts/reader/train.py

That script accepts many different parameters; you can read more in this README: https://github.com/facebookresearch/DrQA/tree/master/scripts/reader
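As a rough sketch, the two setups you quoted might look something like the commands below. The preprocessed filenames and flag values are assumptions on my part (the reader README documents preprocessing via scripts/reader/preprocess.py), so treat this as an illustration rather than the exact recipe used for the released models:

# Single model: trained only on SQuAD
# (filenames assume the preprocessing step described in the reader README)
python scripts/reader/train.py \
    --train-file SQuAD-v1.1-train-processed-spacy.txt \
    --dev-file SQuAD-v1.1-dev-processed-spacy.txt \
    --embedding-file glove.840B.300d.txt \
    --tokenizer spacy

# Multitask model: trained on distant-supervision data from several datasets,
# with the NER/POS/lemma features turned off, matching the description quoted above
python scripts/reader/train.py \
    --train-file /path/to/combined-distant-supervision-train.txt \
    --use-pos False --use-ner False --use-lemma False \
    --embedding-file glove.840B.300d.txt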

donno2048 commented 3 years ago

Thank you, sorry for the trouble...