OpenBMB / MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Question about full-parameter finetuning #282

Open dydxdt opened 2 weeks ago

dydxdt commented 2 weeks ago

Thanks for your great work! I have a question about the training arguments: is max_steps=10000 appropriate for full-parameter finetuning?

I use my own training dataset for full-parameter finetuning; it contains around 240,000 samples across three different tasks (caption, OCR, ...). After training with the default settings, the training log shows "epoch: 0.32", which means only about a third of the training data was used. I then trained with num_train_epochs=5 (the same as Qwen) instead of max_steps, but the 5-epoch model performs worse than the 10,000-step model on my caption test set. The loss looks normal. Can you give some advice for this situation? Thanks!

10,000 steps (corresponding to the red line; ignore the blue line): [training loss curve screenshot]

~5 epochs: [training loss curve screenshot]
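
For context, how max_steps translates into epochs depends on the effective batch size, which is not stated in the thread. A minimal sketch that back-solves it from the logged epoch fraction (the batch size here is an inference for illustration, not a value from the finetuning script):

```python
# Rough sketch: relating max_steps to epochs for the numbers in this thread.
# The effective batch size is NOT given in the thread; it is inferred from the
# logged "epoch: 0.32" and is only an estimate.

dataset_size = 240_000        # samples, as stated by the original poster
max_steps = 10_000            # default setting mentioned in the question
logged_epoch_fraction = 0.32  # from the training log

# epochs = steps * effective_batch / dataset_size
# => effective_batch = epochs * dataset_size / steps
effective_batch = logged_epoch_fraction * dataset_size / max_steps
print(f"implied effective batch size ≈ {effective_batch:.1f}")  # ≈ 7.7, i.e. ~8

# With num_train_epochs=5 and the same effective batch size, the run takes
# far more optimizer steps than the 10,000-step default:
steps_for_5_epochs = 5 * dataset_size / effective_batch
print(f"steps for 5 epochs ≈ {steps_for_5_epochs:,.0f}")         # ≈ 156,000
```

Under that assumption, 5 epochs corresponds to roughly 15x more optimizer updates than the 10,000-step run, so the two settings are quite far apart and not directly comparable.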

1SingleFeng commented 2 weeks ago

How many GPUs did you use for full-parameter finetuning? I tried with 2 V100s and with 4 V100s, and neither worked.

todaydeath commented 5 days ago

During full-parameter finetuning, the console does not print any loss information. Has anyone run into this?

LDLINGLINGLING commented 16 hours ago

> How many GPUs did you use for full-parameter finetuning? I tried with 2 V100s and with 4 V100s, and neither worked.

Full-parameter finetuning probably needs 8 V100s.
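
As a back-of-the-envelope check (assuming roughly 8.5B parameters for MiniCPM-Llama3-V 2.5, mixed-precision AdamW, and no optimizer-state sharding; the actual footprint depends on the training configuration):

```python
# Rough memory estimate for full-parameter finetuning of an ~8B-parameter model
# with mixed-precision AdamW and no optimizer-state sharding.
# These are assumptions for illustration, not measurements of the actual script.

params = 8.5e9  # approximate parameter count of MiniCPM-Llama3-V 2.5

bytes_per_param = (
    2 +  # fp16/bf16 weights
    2 +  # fp16/bf16 gradients
    4 +  # fp32 master weights
    4 +  # Adam first moment (fp32)
    4    # Adam second moment (fp32)
)

total_gb = params * bytes_per_param / 1024**3
print(f"model + optimizer states ≈ {total_gb:.0f} GB")  # ≈ 127 GB, before activations

# 2 x V100 (32 GB) ->  64 GB total: not enough even without activations
# 4 x V100 (32 GB) -> 128 GB total: borderline, no room for activations
# 8 x V100 (32 GB) -> 256 GB total: feasible
```

If the finetuning script shards optimizer states across GPUs (for example with DeepSpeed ZeRO), that total is divided across the cards, which is consistent with 8 V100s working where 2 or 4 do not.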

LDLINGLINGLING commented 16 hours ago

> (quoting dydxdt's original question and loss screenshots above)

The screenshots you posted only show the training loss, which does not reflect the actual downstream performance, so comparing the two runs by training loss alone is not meaningful.
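
A minimal sketch of what that could look like with a HuggingFace Trainer-style setup: evaluate on a held-out split during training instead of comparing runs by training loss. The model, datasets, and metric function below are placeholders, not names from the MiniCPM-V finetuning script.

```python
# Minimal sketch: evaluate on a held-out split during training instead of
# comparing runs by training loss alone. Assumes a standard HuggingFace
# Trainer setup; the caller supplies the model, datasets, and metric function.
from transformers import Trainer, TrainingArguments


def train_with_eval(model, train_dataset, eval_dataset, compute_metrics):
    args = TrainingArguments(
        output_dir="output",
        per_device_train_batch_size=1,
        num_train_epochs=5,
        evaluation_strategy="steps",  # run evaluation on eval_dataset periodically
        eval_steps=1000,              # every 1,000 optimizer steps
        logging_steps=10,             # also makes the training loss show up in the log
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,      # e.g. the 240k-sample training set
        eval_dataset=eval_dataset,        # held-out caption/OCR samples
        compute_metrics=compute_metrics,  # e.g. a per-task caption metric
    )
    trainer.train()
    return trainer.evaluate()
```

Tracking a held-out metric per task makes it possible to see whether the longer 5-epoch run is overfitting one task while the training loss still looks normal.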