memray closed this issue 2 years ago
@gizacard Also, may I know what learning rate schedule you used in pre-training? Was there any warmup applied? Thanks!
Hi,
We use the model included with the SimCSE release. The differences likely originate from a discrepancy between the two code snippets presented in the SimCSE repo: one uses HuggingFace model loading, while the other is based on their own code. This leads to different default truncation lengths and different extracted representations between the two snippets. We used the HuggingFace one, where the default text truncation length is 512 and the representation used is the one after the "pooler", while the SimCSE implementation uses 128 as the default truncation length and extracts the representation before the pooler for unsupervised models. The latest results reported in our paper on arXiv use 512, the maximum number of tokens allowed by BERT, as the truncation length, and the representation before the pooler.
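To make the difference concrete, here is a minimal sketch of the two extraction points when loading a SimCSE checkpoint via HuggingFace. The model name (the unsup-simcse-roberta-large checkpoint mentioned later in this thread), the example text, and the truncation setting are illustrative assumptions, not the exact evaluation script.

```python
# Hedged sketch: the "after pooler" vs. "before pooler" embeddings discussed above.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "princeton-nlp/unsup-simcse-roberta-large"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

text = "A query about dense retrieval."

# HuggingFace-style loading: truncate at 512 tokens (BERT's maximum).
inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Representation *after* the pooler (tanh-projected [CLS]), as used in the
# HuggingFace-based snippet.
after_pooler = outputs.pooler_output             # shape: (1, hidden_size)

# Representation *before* the pooler, i.e. the raw [CLS] hidden state, which
# SimCSE's own code extracts for unsupervised models (their code also
# defaults to a 128-token truncation rather than 512).
before_pooler = outputs.last_hidden_state[:, 0]  # shape: (1, hidden_size)
```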
I don't know what fine-tuning recipe SimCSE uses on MS MARCO. We use in-batch negative examples, similarly to SimCSE, which is the standard way to train a bi-encoder on supervised data.
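For clarity, here is a minimal sketch of a contrastive loss with in-batch negatives for a bi-encoder; the temperature, normalization, and shapes are illustrative assumptions, not the released training code.

```python
# Hedged sketch of an in-batch-negatives (InfoNCE-style) loss for a bi-encoder.
import torch
import torch.nn.functional as F

def in_batch_negative_loss(q_emb, d_emb, temperature=0.05):
    """q_emb, d_emb: (batch, dim) embeddings of queries and their positive passages.
    Every other passage in the batch acts as a negative for a given query."""
    q_emb = F.normalize(q_emb, dim=-1)
    d_emb = F.normalize(d_emb, dim=-1)
    scores = q_emb @ d_emb.t() / temperature                   # (batch, batch) similarities
    labels = torch.arange(q_emb.size(0), device=q_emb.device)  # positives on the diagonal
    return F.cross_entropy(scores, labels)

# Example with random embeddings:
loss = in_batch_negative_loss(torch.randn(8, 768), torch.randn(8, 768))
```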
We use a learning rate of 5e-5 for pre-training, with warmup for 20k gradient steps. We have just released the training code with hyper-parameters.
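For reference, a minimal sketch of a warmup schedule matching those numbers; the placeholder model and the constant learning rate after warmup are assumptions, and the released training code is the authoritative reference.

```python
# Hedged sketch: linear warmup to a peak lr of 5e-5 over 20k gradient steps.
import torch

model = torch.nn.Linear(768, 768)  # placeholder for the actual encoder
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

def warmup(step, warmup_steps=20_000):
    # Ramp the lr linearly from 0 to the peak value, then hold it constant
    # (the post-warmup decay choice is left to the actual recipe).
    return min(1.0, step / warmup_steps)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup)

# Typical usage inside the training loop:
# loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```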
I hope this helps, Gautier
Hi @gizacard ,
I appreciate your help. I'm trying to reproduce the unsupervised results. May I ask some questions about the experimental setup?
Regarding "documents of 256 tokens and span sizes sampled between 5% and 50% of the document length": does this mean the min/max length of a query/document span is 12/128 tokens? Thank you! Rui
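For concreteness, a minimal sketch of that arithmetic under the stated setting (256-token documents, span sizes between 5% and 50%), which does give roughly 12 to 128 tokens; this is illustrative only, not the actual data pipeline.

```python
# Hedged sketch of the span-sampling arithmetic discussed above.
import random

def sample_span(tokens, min_ratio=0.05, max_ratio=0.5):
    # For 256 tokens: int(0.05 * 256) = 12, int(0.5 * 256) = 128.
    span_len = random.randint(int(min_ratio * len(tokens)),
                              int(max_ratio * len(tokens)))
    start = random.randint(0, len(tokens) - span_len)
    return tokens[start:start + span_len]

doc = list(range(256))            # a document of 256 token ids
query_crop = sample_span(doc)     # one crop used as the "query"
key_crop = sample_span(doc)       # an independently sampled crop used as the "key"
```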
Hello @gizacard @GitHub30 ,
I wonder if you can share some details about how to reproduce the unsupervised baseline scores, such as the scores in Table 9. Do you just take existing checkpoints and evaluate them on BEIR, or do you pretrain them on your own (using the same data/settings as for training Contriever)? I found that I cannot reproduce the same SimCSE scores with the originally released checkpoint (https://huggingface.co/princeton-nlp/unsup-simcse-roberta-large).
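For context, here is a minimal sketch of evaluating a HuggingFace checkpoint on one BEIR dataset, following the BEIR library's quickstart; the dataset choice (scifact) and batch size are assumptions, and note that BEIR's SentenceBERT wrapper falls back to mean pooling and its own truncation defaults for plain HuggingFace models, which may itself explain part of the score gap given the pooling discussion above.

```python
# Hedged sketch based on the BEIR library's standard example.
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

dataset = "scifact"  # illustrative choice of BEIR dataset
url = f"https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{dataset}.zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")

# Wrap the checkpoint as a dense retriever and run exact nearest-neighbor search.
model = DRES(models.SentenceBERT("princeton-nlp/unsup-simcse-roberta-large"), batch_size=64)
retriever = EvaluateRetrieval(model, score_function="cos_sim")
results = retriever.retrieve(corpus, queries)
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
print(ndcg)
```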
Also, for fine-tuning on MS MARCO, is it similar to the supervised SimCSE training?
Thanks again for sharing the resources! Rui