OpenBMB / MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Apache License 2.0

RuntimeError: shape mismatch: value tensor of shape [1037] cannot be broadcast to indexing result of shape [1036] #116

Closed: hujunchao closed this issue 3 months ago

hujunchao commented 3 months ago

File "lib/python3.9/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 190, in forward position_ids[batch_idx][p_attn_mask.view(-1).cpu()] = pos_ids RuntimeError: shape mismatch: value tensor of shape [1037] cannot be broadcast to indexing result of shape [1036]

iceflame89 commented 3 months ago

Please provide the code you are running and the image so we can reproduce the issue.

Cuiunbo commented 3 months ago

This issue does not provide enough context to reproduce, and we need more information to help resolve it. If you still need assistance, please share your environment details and the code you are running so we can reproduce the issue.
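
For reference, a minimal reproduction script following the usage shown in the MiniCPM-V 2.6 model card would look roughly like this (a sketch; the image path and question are placeholders for whatever input triggers the error, and dtype/device should be adjusted to your setup):

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Sketch based on the MiniCPM-V 2.6 model card usage; 'test.jpg' and the
# question are placeholders.
model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True,
                                  attn_implementation='sdpa', torch_dtype=torch.bfloat16)
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)

image = Image.open('test.jpg').convert('RGB')
msgs = [{'role': 'user', 'content': [image, 'What is in the image?']}]

# The reported error is raised inside this call, in the vision embedding path.
print(model.chat(image=None, msgs=msgs, tokenizer=tokenizer))
```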

Pancat007 commented 2 weeks ago

Same issue here.

Error occurred when executing MiniCPM_VQA:

shape mismatch: value tensor of shape [1104] cannot be broadcast to indexing result of shape [1035]

File "D:\StableDiffusion\ComfyUI-aki-v1.3\execution.py", line 317, in execute output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb) File "D:\StableDiffusion\ComfyUI-aki-v1.3\execution.py", line 192, in get_output_data return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb) File "D:\StableDiffusion\ComfyUI-aki-v1.3\execution.py", line 169, in _map_node_over_list process_inputs(input_dict, i) File "D:\StableDiffusion\ComfyUI-aki-v1.3\execution.py", line 158, in process_inputs results.append(getattr(obj, func)(inputs)) File "D:\StableDiffusion\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_MiniCPM-V-2_6-int4\nodes_legacy.py", line 253, in inference result = self.model.chat( File "D:\StableDiffusion\ComfyUI-aki-v1.3.cache\huggingface\modules\transformers_modules\MiniCPM-V-2_6-int4\modeling_minicpmv.py", line 380, in chat res = self.generate( File "D:\StableDiffusion\ComfyUI-aki-v1.3.cache\huggingface\modules\transformers_modules\MiniCPM-V-2_6-int4\modeling_minicpmv.py", line 256, in generate ) = self.get_vllm_embedding(model_inputs) File "D:\StableDiffusion\ComfyUI-aki-v1.3.cache\huggingface\modules\transformers_modules\MiniCPM-V-2_6-int4\modeling_minicpmv.py", line 117, in get_vllm_embedding vision_embedding = self.vpm(all_pixel_values, patch_attention_mask=patch_attn_mask, tgt_sizes=tgt_sizes).last_hidden_state File "D:\StableDiffusion\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl return self._call_impl(*args, *kwargs) File "D:\StableDiffusion\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl return forward_call(args, kwargs) File "D:\StableDiffusion\ComfyUI-aki-v1.3\python\lib\site-packages\accelerate\hooks.py", line 166, in new_forward output = module._old_forward(*args, kwargs) File "D:\StableDiffusion\ComfyUI-aki-v1.3.cache\huggingface\modules\transformers_modules\MiniCPM-V-2_6-int4\modeling_navit_siglip.py", line 903, in forward hidden_states = self.embeddings(pixel_values=pixel_values, patch_attention_mask=patch_attention_mask, tgt_sizes=tgt_sizes) File "D:\StableDiffusion\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl return self._call_impl(*args, *kwargs) File "D:\StableDiffusion\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl return forward_call(args, kwargs) File "D:\StableDiffusion\ComfyUI-aki-v1.3\python\lib\site-packages\accelerate\hooks.py", line 166, in new_forward output = module._old_forward(*args, **kwargs) File "D:\StableDiffusion\ComfyUI-aki-v1.3.cache\huggingface\modules\transformers_modules\MiniCPM-V-2_6-int4\modeling_navit_siglip.py", line 349, in forward position_ids[batch_idx][p_attn_mask.view(-1).cpu()] = pos_ids