OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable, open-source multimodal chat model approaching GPT-4V performance
https://internvl.github.io/
MIT License

I'm fine-tuning Mini-InternVL-Chat-2B-V1-5, but with the stage-3 config. What is the difference from the stage-1 and stage-2 configs? #254

Open mi4da opened 4 weeks ago

mi4da commented 4 weeks ago

Also, fine-tuning it gives a loss of 0.0???

mi4da commented 4 weeks ago

{'loss': 0.0, 'learning_rate': 3.3112582781456954e-07, 'epoch': 0.01} 0%| | 3/30200 [01:39<256:54:21, 30.63s/it]warning: The size of tensor a (15968) must match the size of tensor b (52736) at non-singleton dimension 0, input_embeds[selected].shape=torch.Size([15968, 2048]), vit_embeds.shape=torch.Size([52736, 2048])
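For context: the warning says there are 15968 image-placeholder slots in the text embeddings but 52736 ViT patch embeddings. A minimal sketch of the kind of injection step where this mismatch surfaces (the function and `img_context_id` are illustrative names, not InternVL's exact code):

```python
import torch

# Sketch: replace <IMG_CONTEXT> placeholder embeddings with ViT patch
# embeddings before running the language model.
def inject_vit_embeds(input_embeds: torch.Tensor,
                      input_ids: torch.Tensor,
                      vit_embeds: torch.Tensor,
                      img_context_id: int) -> torch.Tensor:
    b, n, c = input_embeds.shape
    flat = input_embeds.reshape(-1, c)
    selected = input_ids.reshape(-1) == img_context_id  # placeholder positions
    # This assignment requires selected.sum() == vit_embeds.shape[0];
    # otherwise PyTorch raises exactly the size-mismatch error in the log
    # above (15968 placeholder slots vs. 52736 ViT embeddings), and a
    # training loop that catches the error and continues never actually
    # learns from the image tokens.
    flat[selected] = vit_embeds.reshape(-1, c)
    return flat.reshape(b, n, c)
```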

mi4da commented 4 weeks ago

Is it because the learning rate is too small?

mi4da commented 4 weeks ago

Feels like vanishing gradients to me.

caidou05 commented 3 weeks ago

I'm using the 4B model and also get loss 0. Have you solved this?

Felix0805 commented 1 week ago

I'm using the 2B model and also get loss = 0.

Mohamed-Dhouib commented 1 week ago

Got the same with the 4B model. Does anyone have an idea how to resolve this?

itay1542 commented 1 week ago

I also had this problem and fixed it by setting the correct --conv_style parameter in the training script. In my case I had to change --conv_style to "internlm2-chat".
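A plausible mechanism for why a wrong --conv_style can yield loss = 0.0 (a minimal sketch; `build_labels` and `assistant_spans` are illustrative names, not the repo's preprocessing code): the conversation template's role markers are used to locate the assistant replies and unmask only those tokens, so if the markers never match the tokenized text, every label stays at the ignore index and no token contributes to the loss.

```python
IGNORE_INDEX = -100  # tokens with this label are excluded from the loss

# Sketch: labels start fully masked and are unmasked only inside
# assistant turns found via the conv template's role markers.
def build_labels(token_ids, assistant_spans):
    labels = [IGNORE_INDEX] * len(token_ids)
    for start, end in assistant_spans:
        labels[start:end] = token_ids[start:end]
    return labels

# With the wrong --conv_style the role markers never match, so
# assistant_spans is empty, all labels stay at IGNORE_INDEX, and the
# reported training loss is 0.0.
```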

mi4da commented 5 days ago

My solution was to make sure the images don't have an extreme aspect ratio; if they do, zero-pad them to a square, and the loss is no longer 0.
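A minimal sketch of that zero-padding workaround (assuming an RGB PIL image; `pad_to_square` is an illustrative helper, not part of the repo). The idea is that an extreme aspect ratio can change how the dynamic-resolution preprocessing tiles the image, which is consistent with the placeholder/ViT-embedding count mismatch in the warning above:

```python
from PIL import Image

def pad_to_square(img: Image.Image, fill=(0, 0, 0)) -> Image.Image:
    """Zero-pad an RGB image to a square canvas, keeping it centered."""
    w, h = img.size
    side = max(w, h)
    canvas = Image.new(img.mode, (side, side), fill)
    canvas.paste(img, ((side - w) // 2, (side - h) // 2))
    return canvas
```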


wintercat1994 commented 1 day ago

Is this a LoRA fine-tune?

mi4da commented 33 minutes ago

No.
