HansiZeng opened 9 months ago
I think the DPR model is not performing as well as expected. Here's the command I used to train DPR on TriviaQA:
```shell
python ${PYTHONPATH}/dpr/biencoder_trainer.py \
--train_dir /data/KILT/qa/${DS}/dpr_training_data \
--output_dir /data/KILT/qa/${DS}/models/dpr_e3 \
--num_train_epochs 3 \
--num_instances 89273 \
--encoder_gpu_train_limit 16 \
--max_grad_norm 1.0 --learning_rate 5e-5 \
--full_train_batch_size 128
```
I think the only difference from the default configuration is training for 3 epochs.
When reproducing Re2G on the TriviaQA dataset, I couldn't reproduce the results of the second-stage generation model. In the second stage, the generation model only uses the passages retrieved by the trained DPR, very similar to the KGI paper. I used the provided command for training.
The retrieval metrics (R-Prec, Recall@5) are close to the KGI model, but the generation metrics (Accuracy, F1, KILT-AC, KILT-F1) are far worse than the KGI model's.
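For reference, here is roughly how I understand the Accuracy (exact match) and F1 numbers above are computed, following the SQuAD-style normalization that KILT-type evaluations use. This is my own minimal sketch, not the official KILT scorer, and the function names are mine:

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold_answers):
    """Accuracy: 1.0 if the normalized prediction matches any gold answer."""
    return float(any(normalize(prediction) == normalize(g) for g in gold_answers))

def f1_score(prediction, gold_answers):
    """Token-level F1, taking the max over the gold answer aliases."""
    def f1(pred, gold):
        pred_toks = normalize(pred).split()
        gold_toks = normalize(gold).split()
        common = Counter(pred_toks) & Counter(gold_toks)
        num_same = sum(common.values())
        if num_same == 0:
            return 0.0
        precision = num_same / len(pred_toks)
        recall = num_same / len(gold_toks)
        return 2 * precision * recall / (precision + recall)
    return max(f1(prediction, g) for g in gold_answers)
```

If my understanding is right, the KILT-AC/KILT-F1 variants additionally zero out credit for an instance unless the retrieval achieved R-Prec of 1 on it, which is why near-matching retrieval metrics with much worse generation metrics points at the generator rather than the scorer.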