mengyao00 closed this issue 9 months ago
w1 and w3 can be set to 1.0. As for w2, you can search its value over [0.5, 1.0, 35.0].
Here is our training script for the model SeanLee97/angle-llama-7b-nli-v2. We set w2=35.0 for this model.
CUDA_VISIBLE_DEVICES=1,2,3,4 torchrun --nproc_per_node=4 --master_port=1234 train_angle.py \
--task NLI-STS --save_dir ckpts/NLI-STS-angle-llama-7b \
--model_name NousResearch/Llama-2-7b-hf \
--w2 35 --learning_rate 1e-4 --maxlen 50 \
--lora_r 32 --lora_alpha 32 --lora_dropout 0.1 \
--save_steps 500 --batch_size 120 --seed 42 --do_eval 0 --load_kbit 4 --gradient_accumulation_steps 4 --epochs 1
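For reference, the weighted combination described above can be sketched in plain Python. This is a minimal illustration, not the repository's actual API: the three individual loss terms (L_cos, L_ibn, L_angle) are assumed to be computed elsewhere by the AnglE training code, and `search_w2` is a hypothetical helper showing how one might pick w2 from the suggested candidates using a dev-set metric.

```python
def combined_loss(l_cos, l_ibn, l_angle, w1=1.0, w2=35.0, w3=1.0):
    """Weighted sum L = w1 * L_cos + w2 * L_ibn + w3 * L_angle.

    w1 and w3 are fixed to 1.0; w2 is the value searched
    over the candidates [0.5, 1.0, 35.0].
    """
    return w1 * l_cos + w2 * l_ibn + w3 * l_angle


def search_w2(score_fn, candidates=(0.5, 1.0, 35.0)):
    """Hypothetical w2 search: return the candidate with the best
    dev-set score, where score_fn(w2) yields a higher-is-better metric."""
    return max(candidates, key=score_fn)
```

With the values used for angle-llama-7b-nli-v2 (w1 = w3 = 1.0, w2 = 35.0), the inter-batch-negative term dominates the total loss; a different w2 simply rescales that one component.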
Thank you! How about WhereIsAI/UAE-Large-V1? what do you think is the best w2 value?
35 is better
Hello, I am wondering what constant values you used for fine-tuning. The loss is L = w1 * Lcos + w2 * Libn + w3 * Langle, but I did not find the values of w1, w2, and w3 in your paper.