WisconsinAIVision / ViP-LLaVA

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
https://vip-llava.github.io/
Apache License 2.0

[Issue] Mismatch in tensor dimensions between 'input_ids' and 'output_ids' at the specified dimension. #18

Open tarekdawey opened 2 weeks ago

tarekdawey commented 2 weeks ago

Describe the issue

Hi, thanks for your great work. I am trying to run the quick start as specified in the repository, but I am facing this issue:

Traceback (most recent call last):
  File "/media/local/tdawey/ViP-LLaVA/quick_start.py", line 17, in <module>
    eval_model(args)
  File "/media/local/tdawey/ViP-LLaVA/llava/eval/run_llava.py", line 128, in eval_model
    n_diff_input_output = (input_ids != output_ids[:, :input_token_len]).sum().item()
RuntimeError: The size of tensor a (53) must match the size of tensor b (16) at non-singleton dimension 1

Could you please look into this and explain why this is happening and help me resolve it?

mu-cai commented 2 weeks ago

Hi tarekdawey, this is due to a transformers version mismatch.

I have updated run_llava.py, can you pull and test again?
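
For anyone who hits the same error on an older checkout, here is a rough sketch of the kind of guard that avoids it. It is not the actual change made to run_llava.py. It rests on an assumption: that the installed transformers version returns only the newly generated tokens from generation, while the old check expected the prompt to be echoed at the start of the output. That would be consistent with the sizes in the traceback (53 prompt tokens against 16 output tokens). The helper name split_generated_tokens is hypothetical, and plain Python lists stand in for the tensors so the snippet runs without torch.

```python
from typing import List, Sequence


def split_generated_tokens(input_ids: Sequence[int],
                           output_ids: Sequence[int]) -> List[int]:
    """Return only the newly generated token ids.

    Hypothetical helper, not part of ViP-LLaVA or transformers.
    Handles both behaviours assumed above:
      1. output_ids starts with a copy of the prompt (older behaviour);
      2. output_ids holds only new tokens (assumed newer behaviour).
    """
    prompt_len = len(input_ids)

    # Case 1: the prompt is echoed, so drop the echoed prefix.
    echoes_prompt = (
        len(output_ids) >= prompt_len
        and list(output_ids[:prompt_len]) == list(input_ids)
    )
    if echoes_prompt:
        return list(output_ids[prompt_len:])

    # Case 2: no echoed prompt, so every output token is new.
    return list(output_ids)


# Prompt echoed: the three prompt tokens are stripped.
echoed = split_generated_tokens([1, 2, 3], [1, 2, 3, 7, 8])

# Sizes from the traceback: 53 prompt tokens, 16 output tokens.
prompt = list(range(53))
generated = list(range(100, 116))
new_only = split_generated_tokens(prompt, generated)
```

The point of the sketch is that comparing input_ids with a slice of output_ids is only safe when the output is known to be at least as long as the prompt. Pulling the updated run_llava.py, as suggested above, remains the simpler route.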

tarekdawey commented 2 weeks ago

Thanks a lot for your prompt response! It works now. I really appreciate your help.