Open cultivater opened 1 month ago
Hi @cultivater,
Thanks for your interest in our work. For simplicity, the released train configs correspond to the best-performing models, which do not include SimCSE for supervised contrastive learning.
To use "MNTP+SimCSE" as the initialization point, you will need to merge the MNTP LoRA weights into the base model separately and pass that merged checkpoint's path as "model_name_or_path". The SimCSE weights are then specified via "peft_model_name_or_path".
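To make the "merge MNTP weights into the base model" step concrete: merging a LoRA adapter folds each low-rank update back into the frozen base weight, W ← W + (α/r)·B·A, which is what PEFT's `merge_and_unload()` does layer by layer. A minimal numpy sketch of that fold, with toy dimensions (not the real model's):

```python
import numpy as np

def merge_lora(W, lora_A, lora_B, alpha, r):
    """Fold a LoRA update into the frozen base weight: W <- W + (alpha/r) * B @ A."""
    return W + (alpha / r) * (lora_B @ lora_A)

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 6, 4, 2, 16      # toy sizes, not Llama-3-8B's
W = rng.standard_normal((d_out, d_in))   # frozen base weight
A = rng.standard_normal((r, d_in))       # LoRA down-projection
B = rng.standard_normal((d_out, r))      # LoRA up-projection

W_merged = merge_lora(W, A, B, alpha, r)
assert W_merged.shape == W.shape
assert np.allclose(W_merged - W, (alpha / r) * (B @ A))
```

In practice you would not do this by hand: load the base model with transformers, wrap it with `peft.PeftModel.from_pretrained(base, <mntp_adapter_path>)`, call `merge_and_unload()`, and `save_pretrained()` the result to the directory you then pass as "model_name_or_path". The SimCSE adapter path goes in "peft_model_name_or_path".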
Hope this clarifies your issue. Let me know if you have any further questions.
"The MNTP LoRA weights are merged into the base model, and the trainable LoRA weights are initialized with SimCSE weights."
Hi, I saw this in your article, but I couldn't find any corresponding configuration in your code. In the supervised contrastive learning config (train_configs/supervised/MetaLlama3.json), there are only: "model_name_or_path": "meta-llama/Meta-Llama-3-8B-Instruct" and "peft_model_name_or_path": "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp".
May I know where the SimCSE weights checkpoint is loaded?