LLaVA-VL / LLaVA-NeXT


'loss': 0.0 and 'grad_norm' stays the same at every step during task LoRA finetuning #281

Open Jeremy88888 opened 1 month ago

Jeremy88888 commented 1 month ago

I am training a task LoRA on "liuhaotian/llava-v1.5-13b" using the same script as in the LLaVA repo: https://github.com/haotian-liu/LLaVA/blob/main/scripts/v1_5/finetune_task_lora.sh

The above runs fine in the LLaVA repo (https://github.com/haotian-liu/LLaVA/tree/main). When I run it in this LLaVA-NeXT repo (with a few lines of train.py slightly modified to include the llava model), training runs, but it keeps showing 'loss': 0.0 and the same 'grad_norm' at every step. Any idea?

[Screenshot of the training log: every step reports 'loss': 0.0 and an unchanged 'grad_norm']
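
One way to narrow this down (a sketch, not something from the thread; `model` and `train_dataloader` stand in for whatever objects train.py builds) is to run a single batch through the model outside the Trainer and check whether the loss is already 0.0 at the data level, or whether it only collapses under the training setup (DeepSpeed / precision):

```python
# Sanity-check sketch; assumes `model` and `train_dataloader` from train.py are in scope.
import torch

IGNORE_INDEX = -100  # LLaVA masks non-target tokens with this value

model.eval()
batch = next(iter(train_dataloader))
batch = {k: (v.cuda() if torch.is_tensor(v) else v) for k, v in batch.items()}

with torch.no_grad():
    out = model(**batch)
print("raw loss outside the Trainer:", out.loss.item())

# If the raw loss is already 0.0, inspect the labels: a batch whose labels are
# (almost) entirely IGNORE_INDEX has nothing to supervise, which is a common
# symptom of a conversation-template / tokenization mismatch.
labels = batch["labels"]
print("supervised tokens:", (labels != IGNORE_INDEX).sum().item(), "of", labels.numel())
```

If the raw loss looks normal here, the precision / DeepSpeed configuration becomes the more likely suspect.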

refine-counting commented 4 weeks ago

Hi, I got the same issue. Did you manage to resolve this?

paperman-anne commented 4 weeks ago

Hi, I got the same issue. What kind of GPU are you using?

refine-counting commented 4 weeks ago

For me it was on a V100, @paperman-anne, running DPO training. How about you?
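
One thing worth ruling out on a V100 (a guess on my part, not confirmed in this thread): the stock finetune scripts pass `--bf16 True`, but the V100 is compute capability 7.0 and has no native bfloat16 support, which can produce degenerate loss / grad_norm values. A quick check:

```python
# Check whether the GPU actually supports bf16 before training with --bf16 True.
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"{torch.cuda.get_device_name(0)}: compute capability {major}.{minor}")
print("native bf16 support:", torch.cuda.is_bf16_supported())

# On a V100 this prints 7.0 / False; in that case try --bf16 False --fp16 True
# (or an Ampere or newer GPU) and see whether the loss becomes non-zero.
```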