FlagOpen / FlagEmbedding

Retrieval and Retrieval-augmented LLMs
MIT License

bge-m3 fine-tuning loss stays at 0 #823

Open sevenandseven opened 3 months ago

sevenandseven commented 3 months ago

Hello, when I fine-tune bge-m3 with the following arguments:

--model_name_or_path /media/ai/HDD/Teamwork/LLM_Embedding_model/Embedding/Embedding/bge-m3 \
--train_data /media/ai/HDD/Teamwork/wangenzhi/FlagEmbedding-master/official/FlagEmbedding/fine_data/m3_data \
--learning_rate 1e-5 \
--fp16 \
--num_train_epochs 3 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 2 \
--dataloader_drop_last True \
--normlized True \
--temperature 0.02 \
--query_max_len 64 \
--passage_max_len 256 \
--train_group_size 1 \
--negatives_cross_device \
--logging_steps 1 \
--logging_strategy steps \
--save_strategy epoch \
--save_total_limit 100 \
--overwrite_output_dir True \
--same_task_within_batch True \
--unified_finetuning True

the loss stays at 0 and never changes. How should I handle this?

{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.999267721148214e-06, 'epoch': 0.0}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.998535442296427e-06, 'epoch': 0.0}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.99780316344464e-06, 'epoch': 0.0}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.997070884592854e-06, 'epoch': 0.0}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.996338605741067e-06, 'epoch': 0.0}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.99560632688928e-06, 'epoch': 0.0}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.994874048037494e-06, 'epoch': 0.0}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.994141769185707e-06, 'epoch': 0.0}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.99340949033392e-06, 'epoch': 0.0}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.992677211482133e-06, 'epoch': 0.0}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.991944932630347e-06, 'epoch': 0.0}

staoxiao commented 3 months ago

--train_group_size 1 means that you don't use any mined negatives (num_neg = train_group_size - 1) and rely only on in-batch negatives. However, --per_device_train_batch_size 1 means there is only one training pair per batch, so there are no in-batch negatives available either. Therefore, the loss is 0.

You should increase both train_group_size and per_device_train_batch_size.
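
For intuition, here is a minimal sketch (hypothetical code, not the repository's actual trainer) of an in-batch-negative contrastive loss. With one query-passage pair per batch and no mined negatives, the softmax runs over a single candidate, so the cross-entropy and its gradient are exactly 0:

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(q, p, temperature=0.02):
    # q: (batch, dim) query embeddings; p: (batch, dim) passage embeddings (L2-normalized)
    scores = q @ p.T / temperature           # (batch, batch) similarity matrix
    targets = torch.arange(q.size(0))        # passage i is the positive for query i
    return F.cross_entropy(scores, targets)  # softmax over each row's candidates

dim = 8
# batch size 1, train_group_size 1: the positive is the only candidate -> loss is exactly 0
q1 = F.normalize(torch.randn(1, dim), dim=-1)
p1 = F.normalize(torch.randn(1, dim), dim=-1)
print(in_batch_contrastive_loss(q1, p1))     # tensor(0.)

# batch size 4: each query sees 3 in-batch negatives -> a non-zero loss to learn from
q4 = F.normalize(torch.randn(4, dim), dim=-1)
p4 = F.normalize(torch.randn(4, dim), dim=-1)
print(in_batch_contrastive_loss(q4, p4))
```

Increasing per_device_train_batch_size (and/or train_group_size, if the training data contains mined negatives) gives each query real negatives to contrast against, so the loss becomes non-zero.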

sevenandseven commented 3 months ago


OK, understood, thank you very much.