2U1 / Llama3.2-Vision-Finetune

An open-source implementation for fine-tuning the Llama3.2-Vision series by Meta.
Apache License 2.0

RuntimeError: The expanded size of the tensor (448) must match the existing size (446) at non-singleton dimension 2. Target sizes: [1, 32, 448, 38424]. Tensor sizes: [1, 1, 446, 38424] #6

Open · chuan573906361 opened this issue 5 days ago

chuan573906361 commented 5 days ago

After installation, running bash finetune_lora.sh fails with this error:

File "/home/jinchuan/anaconda3/envs/llama3v/lib/python3.10/site-packages/transformers/models/mllama/modeling_mllama.py", line 650, in forward
    attn_output = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: The expanded size of the tensor (448) must match the existing size (446) at non-singleton dimension 2. Target sizes: [1, 32, 448, 38424]. Tensor sizes: [1, 1, 446, 38424]

Is there any solution?
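The error is a shape check, not a problem with the weights: a tensor of shape [1, 1, 446, 38424] (presumably the cross-attention mask) is being expanded to the attention shape [1, 32, 448, 38424], which we assume is (batch, heads, query_len, key_len). Expansion only works when each dimension is either 1 or already equal to the target. The pure-Python sketch below mimics that rule with the shapes from the traceback; `expand_check` is a hypothetical helper for illustration, not a PyTorch API.

```python
def expand_check(existing, target):
    """Mimic the rule behind Tensor.expand: align shapes from the right;
    each existing dim must equal the target dim or be 1 (singleton)."""
    if len(existing) > len(target):
        return "cannot expand: more dims than target"
    # Left-pad the existing shape with 1s so both shapes have equal rank.
    padded = [1] * (len(target) - len(existing)) + list(existing)
    for dim, (have, want) in enumerate(zip(padded, target)):
        if have != want and have != 1:
            return (f"The expanded size of the tensor ({want}) must match the "
                    f"existing size ({have}) at non-singleton dimension {dim}.")
    return "ok"


# Shapes copied from the traceback.
target = [1, 32, 448, 38424]
mask = [1, 1, 446, 38424]

# Dim 1 (heads) is 1, so it broadcasts; dim 2 is 446 vs 448, so it fails.
print(expand_check(mask, target))

# A mask built for the same 448 text positions would expand cleanly.
print(expand_check([1, 1, 448, 38424], target))
```

In other words, the mask seems to have been built for 446 text positions while the model sees 448, so the fix is likely in how the inputs are prepared rather than in the attention code itself.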

2U1 commented 5 days ago

I'm not sure, but either of these could be the cause:

  1. Does your dataset have mixed modalities?
  2. Did you follow the dataset format?
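One way to find which samples trigger the mismatch before launching training is a pre-flight length check. The sketch below is hypothetical and not part of this repository. It assumes each collated sample is a dict holding `input_ids` and a `cross_attention_mask` with one row per text token. The Hugging Face Mllama processor uses these two names, but the list-of-dicts layout here is an assumption.

```python
def find_length_mismatches(batch):
    """Return (index, text_len, mask_len) for every sample whose
    cross-attention mask does not cover each text token exactly once.

    Hypothetical layout: each sample is a dict with 'input_ids' (a list of
    token ids) and 'cross_attention_mask' (a list with one row per token).
    """
    bad = []
    for i, sample in enumerate(batch):
        text_len = len(sample["input_ids"])
        mask_len = len(sample["cross_attention_mask"])
        if text_len != mask_len:
            bad.append((i, text_len, mask_len))
    return bad


# Toy batch: sample 0 reproduces the 448 vs 446 mismatch, sample 1 is fine.
samples = [
    {"input_ids": [0] * 448, "cross_attention_mask": [[1]] * 446},
    {"input_ids": [0] * 10, "cross_attention_mask": [[1]] * 10},
]
print(find_length_mismatches(samples))  # [(0, 448, 446)]
```

If the flagged samples turn out to be the multi-image or mixed-modality ones, that would point at the two causes listed above.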
chuan573906361 commented 5 days ago

Thanks for the response. I used a multi-image dataset, and I think that is the main cause of the error above. I fine-tuned on the same dataset successfully with the Phi-3.5 vision project. Thanks again for your excellent work.

2U1 commented 4 days ago

@chuan573906361 Thanks for the issue. I'll try to fix it asap.