SeanLee97 / AnglE

Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard
https://arxiv.org/abs/2309.12871
MIT License

What values are you using for w1, w2 and w3 when defining the loss #24

Closed · mengyao00 closed this issue 9 months ago

mengyao00 commented 9 months ago

Hello, I am wondering what constant values you used for fine-tuning. The loss is L = w1 * L_cos + w2 * L_ibn + w3 * L_angle, but I did not find the values of w1, w2, and w3 in your paper.
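
For reference, a minimal sketch of that weighted combination; `cos_loss`, `ibn_loss`, and `angle_loss` are hypothetical placeholders for the cosine, in-batch-negative, and angle objectives described in the paper, not the repository's actual implementation:

```python
# Minimal sketch of the weighted objective L = w1 * L_cos + w2 * L_ibn + w3 * L_angle.
# cos_loss, ibn_loss, and angle_loss are hypothetical callables standing in for
# the cosine, in-batch-negative, and angle objectives from the paper.
def combined_loss(y_true, y_pred, cos_loss, ibn_loss, angle_loss,
                  w1=1.0, w2=1.0, w3=1.0):
    return (w1 * cos_loss(y_true, y_pred)
            + w2 * ibn_loss(y_true, y_pred)
            + w3 * angle_loss(y_true, y_pred))
```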

SeanLee97 commented 9 months ago

w1 and w3 can be set to 1.0. As for w2, you can search its value over [0.5, 1.0, 35.0].
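
A hedged sketch of what that search could look like, assuming a hypothetical `train_and_evaluate` helper that fine-tunes with the given weights and returns a validation score (e.g., Spearman correlation on an STS development set):

```python
# Sketch of a small grid search over w2 with w1 = w3 = 1.0.
# train_and_evaluate is a hypothetical helper that fine-tunes with the given
# weights and returns a validation score (higher is better).
def search_w2(train_and_evaluate, candidates=(0.5, 1.0, 35.0)):
    scores = {w2: train_and_evaluate(w1=1.0, w2=w2, w3=1.0) for w2 in candidates}
    best_w2 = max(scores, key=scores.get)
    return best_w2, scores
```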

SeanLee97 commented 9 months ago

Here is our training script for the SeanLee97/angle-llama-7b-nli-v2 model. We set w2=35.0 for this model.

CUDA_VISIBLE_DEVICES=1,2,3,4 torchrun --nproc_per_node=4 --master_port=1234 train_angle.py \
--task NLI-STS --save_dir ckpts/NLI-STS-angle-llama-7b \
--model_name NousResearch/Llama-2-7b-hf \
--w2 35 --learning_rate 1e-4 --maxlen 50 \
--lora_r 32 --lora_alpha 32 --lora_dropout 0.1 \
--save_steps 500 --batch_size 120 --seed 42 --do_eval 0 \
--load_kbit 4 --gradient_accumulation_steps 4 --epochs 1
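
For downstream inference with the resulting LoRA weights, a sketch following the usage pattern in the project README; the exact prompt names and method signatures may differ across angle_emb versions, so treat this as an assumption and check the README:

```python
from angle_emb import AnglE, Prompts  # assumes the angle_emb package is installed

# Load the base Llama-2 model together with the fine-tuned LoRA weights.
angle = AnglE.from_pretrained('NousResearch/Llama-2-7b-hf',
                              pretrained_lora_path='SeanLee97/angle-llama-7b-nli-v2')
# LLaMA-based AnglE models encode text through a prompt template.
angle.set_prompt(prompt=Prompts.A)
vec = angle.encode({'text': 'hello world'}, to_numpy=True)
```
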
mengyao00 commented 9 months ago

Thank you! How about WhereIsAI/UAE-Large-V1? What do you think is the best w2 value?

SeanLee97 commented 9 months ago

> Thank you! How about WhereIsAI/UAE-Large-V1? What do you think is the best w2 value?

w2=35 is better.