caskcsg / ir

ConTextual Mask Auto-Encoder for Dense Passage Retrieval

Reproducing CoT-MAE results on NQ #4

Open zhengmq2010 opened 1 year ago

zhengmq2010 commented 1 year ago

Hi, I want to reproduce the results on NQ, but I don't see the hyperparameter settings in the paper or the repo. Do I just need to follow the script msmarco/eval_msmarco.sh and modify the arguments to reproduce them? Could you provide more instructions? It would also be great if you could provide the model weights of the retrievers for both fine-tuning stages on NQ. Thanks in advance!

ma787639046 commented 1 year ago

Sorry for the late reply. Issues raised in our organization repo do not send me notifications.

Following Condenser, we use DPR to train and test on NQ.

A two-stage pipeline is used:

Stage 1: Train with BM25 negatives.

Stage 2: Train with BM25 negatives + hard negatives mined by the CoT-MAE stage-1 retriever.

You can also refer to the coCondenser-nq README (https://github.com/texttron/tevatron/blob/main/examples/coCondenser-nq/README.md) for pipeline instructions; the difference is that we use the hard negatives mined by the CoT-MAE stage-1 retriever rather than the negatives provided by coCondenser. A rough sketch of the pipeline is below.
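For reference, here is a rough sketch of how the two stages could be run with tevatron, in the style of the linked coCondenser-nq README. The model path, output directories, and hyperparameter values are placeholders and not necessarily the exact settings we used, so please treat this as an outline rather than an official script:

```bash
# Stage 1: fine-tune the CoT-MAE pre-trained encoder on NQ with BM25 negatives.
# (paths and hyperparameter values are illustrative placeholders)
python -m tevatron.driver.train \
  --output_dir retriever_nq_stage1 \
  --model_name_or_path /path/to/cotmae_pretrained_encoder \
  --dataset_name Tevatron/wikipedia-nq \
  --fp16 \
  --per_device_train_batch_size 32 \
  --train_n_passages 2 \
  --learning_rate 1e-5 \
  --q_max_len 32 \
  --p_max_len 156 \
  --num_train_epochs 40

# Between stages: encode the corpus and training queries with the stage-1
# retriever, retrieve top passages for each training query, and build a new
# training set whose hard negatives come from these stage-1 results
# (this replaces the coCondenser-provided hard negatives in the linked README).

# Stage 2: fine-tune again from the pre-trained encoder, this time on
# BM25 negatives + the stage-1 mined hard negatives (e.g. by pointing the
# trainer at the locally built training files).
python -m tevatron.driver.train \
  --output_dir retriever_nq_stage2 \
  --model_name_or_path /path/to/cotmae_pretrained_encoder \
  --train_dir /path/to/nq_train_with_stage1_hard_negatives \
  --fp16 \
  --per_device_train_batch_size 32 \
  --train_n_passages 2 \
  --learning_rate 1e-5 \
  --q_max_len 32 \
  --p_max_len 156 \
  --num_train_epochs 40
```

The second stage is deliberately the same recipe as the first; only the source of the hard negatives changes.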