facebookresearch / contriever

Contriever: Unsupervised Dense Information Retrieval with Contrastive Learning

To reproduce baseline scores #7

Closed · memray closed this 2 years ago

memray commented 2 years ago

Hello @gizacard @GitHub30,

I wonder if you could share some details about how to reproduce the unsupervised baseline scores, such as those in Table 9. Do you take existing checkpoints and evaluate them on BEIR, or do you pretrain them yourselves (using the same data/settings as for training Contriever)? I found that I cannot reproduce the reported SimCSE scores with the original released checkpoint (https://huggingface.co/princeton-nlp/unsup-simcse-roberta-large).
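For context, this is roughly how I'm evaluating the checkpoint with the beir library (a minimal sketch; the dataset choice and pooling behavior are my assumptions, not anything from your paper):

```python
# Minimal BEIR evaluation sketch for a Hugging Face checkpoint.
# Note: wrapping a plain HF model with sentence-transformers defaults to
# mean pooling, while SimCSE uses CLS pooling; that mismatch alone could
# explain part of a score gap.
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

# Download and load one BEIR dataset (SciFact, a small one, as an example).
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_path).load(split="test")

# Encode and retrieve with exact (brute-force) dense search.
model = DRES(models.SentenceBERT("princeton-nlp/unsup-simcse-roberta-large"), batch_size=64)
retriever = EvaluateRetrieval(model, score_function="cos_sim")
results = retriever.retrieve(corpus, queries)

ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
print(ndcg)
```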

Also, for fine-tuning on MS MARCO, is the procedure similar to supervised SimCSE training?
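My mental model is that both rely on the same in-batch-negative contrastive (InfoNCE) objective; here is a toy sketch of that loss as I understand it (my assumption about the common ground, not your code; the temperature value is a placeholder):

```python
import torch
import torch.nn.functional as F

def info_nce(q, p, temperature=0.05):
    """InfoNCE with in-batch negatives; q and p are (batch, dim) embeddings."""
    scores = q @ p.t() / temperature                    # (batch, batch) similarities
    labels = torch.arange(q.size(0), device=q.device)   # positives on the diagonal
    return F.cross_entropy(scores, labels)

# Toy usage: random, L2-normalized embeddings standing in for encoder outputs.
q = F.normalize(torch.randn(8, 768), dim=-1)
p = F.normalize(torch.randn(8, 768), dim=-1)
print(info_nce(q, p))
```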

Thanks again for sharing the resources! Rui

memray commented 2 years ago

@gizacard Also, may I ask what learning rate schedule you used in pretraining? Was any warmup applied? Thanks!
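For concreteness, this is the kind of schedule I have in mind (a hypothetical sketch; the warmup and total step counts are placeholders, not values from the paper):

```python
import torch

def linear_warmup_linear_decay(optimizer, warmup_steps, total_steps):
    # Multiplier ramps 0 -> 1 over warmup_steps, then decays linearly to 0.
    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Placeholder encoder and step counts, purely for illustration.
model = torch.nn.Linear(768, 768)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = linear_warmup_linear_decay(optimizer, warmup_steps=10_000, total_steps=500_000)
```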

gizacard commented 2 years ago

Hi,

I hope this helps, Gautier

memray commented 2 years ago

Hi @gizacard ,

I appreciate your help. I'm trying to reproduce the unsupervised results. May I ask a few questions about the experimental setup?

  1. Do you clip the gradient norm during training?
  2. Given documents of 256 tokens and span sizes sampled between 5% and 50% of the document length, does that mean the min/max span length for queries/documents is 12/128 tokens? (See the sketch after this list.)
  3. Do all runs in Sec. 6 (ablation studies) follow the same setting? What queue size is used in the "Training data" subsection?
  4. What causes the score gap between the two unsupervised 50/50% runs, Table 8 (avg. 34.7) vs. Table 11 (avg. 36.0)? Is there any difference besides the number of training steps? Does longer training help much?
  5. Do you observe much performance variance across unsupervised runs? I found that changing the random seed in the data pipeline can significantly affect the final scores.
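To spell out the arithmetic behind question 2, here is the span sampling I am assuming (independent random cropping; my reading of the setup, not code from the repo):

```python
import random

# Sample a span whose length is between 5% and 50% of a 256-token document.
doc_len = 256
min_len = int(0.05 * doc_len)   # floor(12.8) = 12 tokens
max_len = int(0.50 * doc_len)   # 128 tokens

tokens = list(range(doc_len))   # stand-in for a tokenized document
span_len = random.randint(min_len, max_len)
start = random.randint(0, doc_len - span_len)
span = tokens[start:start + span_len]
print(min_len, max_len, len(span))
```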

Thank you!!! Rui