Open basilevancooten opened 2 years ago
You can find a clean and nice version of the training here: https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/ms_marco/train_bi-encoder_margin-mse.py
It will produce a model with similar performance.
Otherwise, for that specific model, training was done in two iterations:
1) Start with the distilbert-base-uncased model and train with MarginMSE + MultipleNegativesRankingLoss.
2) Use the model from 1) to mine hard negatives, and score them all with a cross-encoder.
3) Continue training that model with MarginMSE loss and those specific hard negatives.
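For intuition, here is a minimal, illustrative sketch of the Margin-MSE objective used in steps 1) and 3) (this is plain Python, not the sentence-transformers `MarginMSELoss` implementation): the bi-encoder is trained so that its score margin between the positive and the negative passage matches the cross-encoder's margin.

```python
def margin_mse(bi_pos, bi_neg, ce_pos, ce_neg):
    """MSE between the bi-encoder margin sim(q, pos) - sim(q, neg)
    and the cross-encoder (teacher) margin ce(q, pos) - ce(q, neg)."""
    margins = [(bp - bn) - (cp - cn)
               for bp, bn, cp, cn in zip(bi_pos, bi_neg, ce_pos, ce_neg)]
    return sum(m * m for m in margins) / len(margins)

# Toy batch of two queries: bi-encoder dot products vs. cross-encoder scores.
loss = margin_mse(bi_pos=[8.0, 7.5], bi_neg=[5.0, 6.0],
                  ce_pos=[9.0, 8.0], ce_neg=[4.0, 7.0])
print(loss)  # → 2.125
```

The real training script optimizes this with gradients flowing through the bi-encoder scores, while the cross-encoder scores are fixed teacher targets.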
But the script linked above will produce a model that is on par.
thank you :) I'm going to try that out and I'll let you know asap
One last question, just so I'm sure I understood correctly: launching the script with the following arguments:

```
--model_name distilbert-base-uncased --lr=1e-5 --warmup_steps=10000 --negs_to_use=distilbert-margin_mse-sym_mnrl-mean-v1 --num_negs_per_system=10 --epochs=30 --name=cnt_with_mined_negs_mean --use_pre_trained_model --train_batch_size 64
```

should produce a model on par?
No, you can just launch it with the default parameters and the distilbert-base-uncased model.
Hey there, just to let you know: I relaunched the script with the default parameters and distilbert-base-uncased as the starting model, and obtained a model that reached an MRR of 0.356, which is good enough for now. I'll let you know if I manage to reach 0.37.
Cheers :)
Same here. I followed the directions and used the default parameters with the distilbert-base-uncased model. However, I can only get 0.354. Any suggestions would be helpful.
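For reference, the MRR@10 figures quoted in this thread can be computed as below; this is a generic sketch of the metric (not the repo's evaluation code), where each query contributes the reciprocal of the rank of its first relevant passage within the top 10, or 0 if none is found there.

```python
def mrr_at_10(first_relevant_ranks):
    """first_relevant_ranks: for each query, the 1-based rank of the first
    relevant passage, or None if it was not retrieved at all."""
    total = sum(1.0 / r for r in first_relevant_ranks
                if r is not None and r <= 10)
    return total / len(first_relevant_ranks)

# Four queries: ranks 1, 2, not found, and 5.
score = mrr_at_10([1, 2, None, 5])
print(score)  # → (1 + 0.5 + 0 + 0.2) / 4 = 0.425
```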
Hey there,
My team and I have been really amazed at the latest results displayed by the
msmarco-distilbert-dot-v5
model (HF card available here) on the MS MARCO passage dev set; it's quite astonishing! I've been able to use the model for inference and obtained an MRR@10 similar to yours 😄, and the next step for me is to reproduce the training of that model so I can replicate it with a different training set.
Following the training script given on the HF model card here, I've stumbled upon two issues:
1. msmarco-hard-negatives-v6.jsonl.gz seems to have been a local file that I can't find in the HF datasets. The closest I found was msmarco-hard-negatives, which comprises a score file (cross-encoder scores from a MiniLM-L6-v2-based model for a set of (qid, pid) pairs, as a Dict[int, Dict[int, float]]) and a mined-negatives file (as a Dict[int, Dict[str, List[int]]]). From what I've figured out, the training script takes an intermediary mined-negatives file that carries the cross-encoder scores in the same structure, which would be a combination of both aforementioned files, so I manually combined the two in order to run the script (you can also tell me if I was misguided at this step, but it seems OK to me, and this point is basically resolved). => ✔️
2. The script is passed --model final-models/distilbert-margin_mse-sym_mnrl-mean-v1, and if I understand correctly, this means it starts from a pre-trained DistilBERT model that I can't seem to find on the model Hub. Is there any way for you to tell me how to get it? => ❌ 😢

Thank you in advance for your precious help,
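The merge of the two files described in point 1 could be sketched as follows; the variable names, the toy ids, and the exact layout of the merged record are my own assumptions based on the two dictionary shapes above, not the dataset's actual schema.

```python
# qid -> pid -> cross-encoder score (shape of the score file).
ce_scores = {100: {10: 9.1, 11: 2.3, 12: 1.7}}

# qid -> mining system -> list of negative pids (shape of the mined-negatives file).
mined_negs = {100: {"bm25": [11, 12]}}

# Illustrative qid -> relevant pids mapping (assumed, from the qrels).
positives = {100: [10]}

# One merged record per query, with the cross-encoder score attached to
# every positive and every mined negative.
merged = []
for qid, systems in mined_negs.items():
    merged.append({
        "qid": qid,
        "pos": [(pid, ce_scores[qid][pid]) for pid in positives[qid]],
        "neg": {system: [(pid, ce_scores[qid][pid]) for pid in pids]
                for system, pids in systems.items()},
    })

print(merged[0]["neg"]["bm25"])  # → [(11, 2.3), (12, 1.7)]
```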
Peace ☮️ 🤙
PS: also thank you for the amazing work you and your team have done on this library and congratulations on all the research results you've obtained so far.