THUDM / CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

Finetuning learning rate is very small #94

Open orrzohar opened 2 weeks ago

orrzohar commented 2 weeks ago

System Info

Hi

I noticed that the learning rate used in the LoRA fine-tuning demo (1e-7) is 100 times smaller than the instruction-tuning learning rate reported in the CogVLM technical report (1e-5).

https://github.com/THUDM/CogVLM2/blob/57e5a80e996a0e36d9302e9efa3f63cfc29d3368/finetune_demo/peft_lora.py#L185
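For concreteness, here is a minimal sketch (not the repository's actual training code) of where a learning rate like this enters a PEFT LoRA setup. The tiny model and LoRA hyperparameters below are placeholders for illustration only; the two learning-rate values are the ones from the linked demo script and the CogVLM technical report.

```python
import torch
from torch import nn
from peft import LoraConfig, get_peft_model

# Placeholder module standing in for the CogVLM2 backbone, purely for illustration.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(32, 32)

    def forward(self, x):
        return self.proj(x)

# Illustrative LoRA hyperparameters (not taken from the repo).
lora_cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["proj"], lora_dropout=0.05)
model = get_peft_model(TinyModel(), lora_cfg)

demo_lr = 1e-7    # value set at finetune_demo/peft_lora.py#L185
report_lr = 1e-5  # instruction-tuning LR cited in the CogVLM technical report

# With the demo's value, the optimizer steps 100x more slowly than the reported setting.
optimizer = torch.optim.AdamW(model.parameters(), lr=demo_lr)
```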

What is the reason for this? Is it due to the Llama3 base model? Why would CogVLM2 be so much less stable during fine-tuning?

Also: when you fine-tuned CogVLM2 yourselves, did you use only LoRA?

Best, Orr

Who can help?

No response

Information

Reproduction

-

Expected behavior

-