IBM / multidoc2dial

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents

-nq model results and n_docs #16

Closed abbysticha closed 1 year ago

abbysticha commented 1 year ago

Hello, thank you very much for making this baseline code available. I have tried to reproduce the results for the -nq and -ft models from the paper for Task I and Task II, but I am getting lower results for F1, EM, and BLEU, especially for the -nq models. While reviewing the code for bugs in my own implementation, I came across two questions:

  1. For the -nq models, do you use dpr-question_encoder-single-nq-base and dpr-ctx_encoder-single-nq-base from Hugging Face? (See the loading sketch after this list for how I am using them.)
  2. Additionally, the uploaded code uses n_docs = 5 for fine-tuning and n_docs = 10 for retrieval. Is this what was used in the paper, or should I be using the same number of docs (5 or 10) for both?
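For context, here is roughly how I am loading the Hugging Face checkpoints in my reproduction attempt. The hub IDs are the ones named in question 1, and the sanity check at the end is just my own (the question and passage strings are made up):

```python
import torch
from transformers import (
    DPRContextEncoder,
    DPRContextEncoderTokenizerFast,
    DPRQuestionEncoder,
    DPRQuestionEncoderTokenizerFast,
)

# Hugging Face hub checkpoints trained on Natural Questions ("single-nq");
# it is unclear to me whether these match the checkpoints used for the -nq baseline.
Q_NAME = "facebook/dpr-question_encoder-single-nq-base"
CTX_NAME = "facebook/dpr-ctx_encoder-single-nq-base"

q_encoder = DPRQuestionEncoder.from_pretrained(Q_NAME)
q_tokenizer = DPRQuestionEncoderTokenizerFast.from_pretrained(Q_NAME)
ctx_encoder = DPRContextEncoder.from_pretrained(CTX_NAME)
ctx_tokenizer = DPRContextEncoderTokenizerFast.from_pretrained(CTX_NAME)

# Sanity check: embed a made-up question and passage and score them by inner
# product, which is how DPR ranks passages at retrieval time.
with torch.no_grad():
    q_emb = q_encoder(
        **q_tokenizer("How do I renew my license?", return_tensors="pt")
    ).pooler_output
    p_emb = ctx_encoder(
        **ctx_tokenizer("You can renew your license online or at a local office.",
                        return_tensors="pt")
    ).pooler_output
print((q_emb @ p_emb.T).item())
```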

Thank you again for your help!

sivasankalpp commented 1 year ago

Hi @abbysticha, thank you for your question.

  1. For the -nq models we used the Facebook DPR models, as mentioned here -- https://github.com/facebookresearch/DPR#new-march-2021-retrieval-model. I'm not sure whether the Hugging Face models are the same, so I can't comment on that. But I would use their script to fetch the models -- https://github.com/facebookresearch/DPR/blob/main/dpr/data/download_data.py
  2. For fine-tuning we used n_docs=5, and during evaluation we used n_docs=10 (see the sketch below).
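For concreteness, here is a rough sketch of how those two settings map onto the Hugging Face RAG classes that the baseline builds on. The facebook/rag-token-nq checkpoint and the dummy index below are placeholders, not the actual multidoc2dial checkpoint or document index, and the exact wiring in the released scripts may differ:

```python
from transformers import RagRetriever, RagTokenForGeneration, RagTokenizer

# Placeholder checkpoint and index: in practice these would be the fine-tuned
# multidoc2dial model and the project's own document index.
CKPT = "facebook/rag-token-nq"

tokenizer = RagTokenizer.from_pretrained(CKPT)
retriever = RagRetriever.from_pretrained(CKPT, index_name="exact", use_dummy_dataset=True)

# Fine-tuning setting: marginalize over n_docs=5 retrieved passages.
model = RagTokenForGeneration.from_pretrained(CKPT, retriever=retriever, n_docs=5)

# Evaluation setting: override n_docs at generation time to retrieve 10 passages.
inputs = tokenizer("How do I renew my driver's license?", return_tensors="pt")
generated = model.generate(input_ids=inputs["input_ids"], n_docs=10)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```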

Hope this helps!