McGill-NLP / llm2vec

Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
https://mcgill-nlp.github.io/llm2vec/
MIT License

"The MNTP LoRA weights are merged into the base model, and the trainable LoRA weights are initialized with SimCSE weights." #137

Open cultivater opened 1 month ago

cultivater commented 1 month ago

"The MNTP LoRA weights are merged into the base model, and the trainable LoRA weights are initialized with SimCSE weights."


Hi, I saw this in your article, but I couldn't find the corresponding configuration in your code. Your supervised contrastive learning config (`train_configs/supervised/MetaLlama3.json`) only contains `"model_name_or_path": "meta-llama/Meta-Llama-3-8B-Instruct"` and `"peft_model_name_or_path": "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp"`.

May I know where the SimCSE weights checkpoint is loaded?

vaibhavad commented 2 weeks ago

Hi @cultivater,

Thanks for your interest in our work. For simplicity, the released train configs correspond to the best-performing models, which do not include SimCSE initialization for supervised contrastive learning.

To get "MNTP+SimCSE" as an initialization point, you will need to merge MNTP weights into the base model separately and provide that model checkpoint address in "model_name_or_path". The SimCSE weights will then be specified in "peft_model_name_or_path".

Hope this clarifies your issue. Let me know if you have any further questions.