Open sevenandseven opened 5 months ago
I think this loss curve is normal. You need to smooth the curve further to observe its trend. Besides, you can set --report_to tensorboard to save the loss to TensorBoard, and then use the TensorBoard tool to view the loss curve.
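In case it helps, a minimal sketch of what that could look like (the logging directory below is a placeholder; --report_to, --logging_dir, and --logging_steps are standard Hugging Face TrainingArguments, and the TensorBoard CLI is invoked as usual):

```bash
# Sketch: add these flags to the existing torchrun fine-tuning command
# (paths are placeholders; adjust to your own output layout):
#   --report_to tensorboard \
#   --logging_dir ./results/v3.0/tensorboard_logs \
#   --logging_steps 10

# Then point TensorBoard at that directory; its scalar view has a smoothing
# slider that makes the trend of a noisy loss curve much easier to read:
tensorboard --logdir ./results/v3.0/tensorboard_logs --port 6006
```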
OK, thanks for your reply. I would like to ask: during the fine-tuning process, besides adjusting hyperparameters, are there any other methods to improve the fine-tuning results?
The most important thing is the quality of the data. First, you need to ensure that the positive samples are highly relevant. Besides, you can use this script to mine hard negatives: https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune#hard-negatives, and change the argument range_for_sampling to adjust the hardness of the negatives.
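For reference, a hedged sketch of that mining step, based on the hn_mine script described in the linked README (the model and file paths below are placeholders, and the exact module path and arguments should be checked against the current README):

```bash
# Mine hard negatives for the training data. range_for_sampling controls the
# rank range of retrieved passages that negatives are sampled from: a range
# closer to the top (e.g. 2-100) yields harder negatives than e.g. 10-300.
python -m FlagEmbedding.baai_general_embedding.finetune.hn_mine \
  --model_name_or_path BAAI/bge-small-zh-v1.5 \
  --input_file ./fine_data/query_answer.jsonl \
  --output_file ./fine_data/query_answer_minedHN.jsonl \
  --range_for_sampling 2-200 \
  --negative_number 15 \
  --use_gpu_for_searching
```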
OK, thanks for your reply.
① Overall, my loss seems to be trending in the right direction, but sometimes it suddenly increases sharply. What could be the reasons for these sudden, severe spikes?
② When I change the command to the following, training no longer starts. What could cause this, and how can I fix it? This is the command:

```bash
CUDA_VISIBLE_DEVICES=0,5 torchrun --standalone --nnodes=1 --nproc_per_node 2 -m FlagEmbedding.baai_general_embedding.finetune.run \
  --output_dir ./results/v3.0/bge_small_zhv15_1epoch_noise5 \
  --model_name_or_path /media/ai/HDD/Teamwork/LLM_Embedding_model/Embedding/Embedding/bge-small-zh-v1.5 \
  --train_data /media/ai/HDD/Teamwork/wangenzhi/FlagEmbedding-master/official/FlagEmbedding/fine_data/datav1/query_answer_v23-minedHN-new-0407-30neg.jsonl \
  --learning_rate 1e-5 \
  --fp16 \
  --num_train_epochs 2 \
  --per_device_train_batch_size 64 \
  --gradient_accumulation_steps 256 \
  --dataloader_drop_last True \
  --normlized True \
  --temperature 0.02 \
  --query_max_len 64 \
  --passage_max_len 256 \
  --train_group_size 1 \
  --logging_steps 1 \
  --logging_strategy steps \
  --save_steps 100 \
  --save_strategy steps \
  --save_total_limit 10 \
  --overwrite_output_dir true \
  --report_to wandb
```
This is the loss plot: [loss curve screenshot]
Hello, while fine-tuning the embedding and reranker models, I found that the loss oscillates up and down for both. After training for 1 epoch, the loss does not converge. I used 9,100 training samples. What is the cause of this, and how should I fix it?