IBM / multidoc2dial

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents
Apache License 2.0

Performance on BM25 retrieval baseline #3

Closed sutakori closed 2 years ago

sutakori commented 2 years ago

I am running run_eval_rag_re.sh with the BM25 baseline and seeing unexpectedly high retrieval results:

4168it [08:02,  9.60it/s]INFO:__main__:Using BM25 for retrieval
4176it [08:02,  9.88it/s]INFO:__main__:Using BM25 for retrieval
4184it [08:03,  8.91it/s]INFO:__main__:Using BM25 for retrieval
4192it [08:04, 10.06it/s]INFO:__main__:Using BM25 for retrieval
4201it [08:05,  8.65it/s]
INFO:__main__:Using BM25 for retrieval
INFO:__main__:Doc_Prec@1:  43.18
INFO:__main__:Doc_Prec@5:  67.20
INFO:__main__:Doc_Prec@10:  74.53
INFO:__main__:Pid_Prec@5:  19.45
INFO:__main__:Pid_Prec@5:  40.75
INFO:__main__:Pid_Prec@10:  48.56
INFO:__main__:all:  43.18 &  67.20 &  74.53  &  19.45 &  40.75 &  48.56 &
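For reference, the scores above increase with k, which is what a hit-rate style metric would do. The following is a minimal sketch of such a metric, assuming a query counts as a hit when its gold ID appears among the top-k retrieved IDs. That definition, the function name `hit_rate_at_k`, and the toy IDs are assumptions for illustration and are not taken from the repository's code.

```python
from typing import Sequence


def hit_rate_at_k(
    retrieved: Sequence[Sequence[str]], gold: Sequence[str], k: int
) -> float:
    """Percentage of queries whose gold ID is among the top-k retrieved IDs.

    Assumed definition for illustration only; the evaluation script may
    compute its Prec@k values differently.
    """
    if len(retrieved) != len(gold):
        raise ValueError("retrieved and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(1 for ids, g in zip(retrieved, gold) if g in ids[:k])
    return 100.0 * hits / len(gold)


# Toy data: three queries, each with a ranked list of retrieved document IDs.
retrieved = [["d1", "d2", "d3"], ["d4", "d1", "d2"], ["d3", "d4", "d1"]]
gold = ["d1", "d1", "d2"]

for k in (1, 2, 3):
    print(f"hit_rate@{k}: {hit_rate_at_k(retrieved, gold, k):6.2f}")
```

Under this definition the score can never decrease as k grows, since the top-k list for a larger k always contains the list for a smaller k.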

Settings: domain=all seg=token score=original task=grounding split=val

Additional parameters: --bm25 ../data/mdd_kb/mdd-$seg-$domain.csv
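To make the path concrete, here is a small sketch of how the `--bm25` argument expands under the settings listed above. It only mimics the shell's variable substitution with Python's `string.Template`; it does not call anything from the repository.

```python
from string import Template

# Settings from this run.
settings = {"seg": "token", "domain": "all"}

# Same placeholder syntax as the shell argument passed to --bm25.
bm25_arg = Template("../data/mdd_kb/mdd-$seg-$domain.csv")
path = bm25_arg.substitute(settings)

print(path)  # ../data/mdd_kb/mdd-token-all.csv
```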

Input files were generated by the preceding scripts with the same settings. Data was generated by run_data_preprocessing.sh, index files by run_kb_index.sh, and checkpoints by run_finetune_rag.sh, with DPR checkpoints produced by running run_converter.sh on the fine-tuned DPR checkpoints. (If I am not mistaken, the RAG checkpoints are required by the code but do not affect the results of run_eval_rag_re.sh when --bm25 is given.)

Is there any mistake in my usage or understanding?

By the way, I am a bit confused about the grounding span generation task (Table 4) in the paper. Does it correspond to the output of run_eval_rag_re.sh? That script doesn't report F1, EM, or BL. Also, does D^token-rr-cls-ft mean joint training of the DPR question encoder and the RAG generator, while D^token-ft uses the fine-tuned DPR directly? I would appreciate it if you could clarify.

songfeng commented 2 years ago

Thank you for the questions!

sutakori commented 2 years ago

Thanks for your reply! So if I am not mistaken, Table 5 comes from run_eval_rag_re.sh, and Tables 4 and 6 come from run_eval_rag_e2e.sh with task set to grounding and generation respectively, is that right? I mistakenly thought D^token-ft was DPR and D^token-rr-cls-ft was RAG, so they are all RAG? I am still confused about the difference between D^token-ft and the *-rr-* variants.

songfeng commented 2 years ago

sutakori commented 2 years ago

Ok, I've got it, thank you for your prompt reply!