McGill-NLP / llm2vec

Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
https://mcgill-nlp.github.io/llm2vec/
MIT License

MNTP learning rate #76

Closed: spookyQubit closed this 4 months ago

spookyQubit commented 4 months ago

Great work. I really appreciate that the code is public (and that the paper is so clearly written).

I had a question about the MNTP config. Can you please confirm what learning rate was used when running run_mntp.py for llama2/llama3? Looking at the configs under train_configs, it seems that the default learning rate from TrainingArguments (5e-5) was used?

Sorry if this is already in the paper and I missed it (I read Appendix D.1.1, but I am not sure where to find the details, e.g., whether the same training parameters as the RoBERTa MNTP training were used).

-- Thanks.

vaibhavad commented 4 months ago

Hi @spookyQubit, thanks for your interest in our work.

We modified the Hugging Face MLM training script. Like that script, we used the default learning rate in TrainingArguments, which is 5e-5.
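
If you want the value to be explicit rather than implicit, here is a minimal sketch (assuming the MNTP config is parsed into TrainingArguments, as in the Hugging Face example scripts; the output path below is hypothetical, not one of the shipped train_configs):

```python
# Minimal sketch: run_mntp.py follows the Hugging Face run_mlm.py pattern,
# where hyperparameters are parsed into transformers.TrainingArguments.
# Any key left out of the JSON config (e.g. learning_rate) falls back to
# the TrainingArguments default.
from transformers import TrainingArguments

args = TrainingArguments(output_dir="output/mntp/llama")  # hypothetical path

# learning_rate is not overridden in the config, so the default applies:
print(args.learning_rate)  # 5e-5
```

Equivalently, adding "learning_rate": 5e-5 to the JSON config pins the value explicitly without changing the training behavior.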

spookyQubit commented 4 months ago

Thanks @vaibhavad .