Is there an existing issue / discussion for this?
[X] I have searched the existing issues / discussions
Is there an existing answer for this in FAQ?
[X] I have searched FAQ
Current Behavior
Passing the image tag `<image>` in the prompt breaks inference.
I have finetuned the model using the `<image>` tag. Now I cannot use it for inference.
I tried the original model as well to see whether `<image>` works there; it gives the same error.
Usage: prompt = `"Image: <image> ..rest of the prompt"`
Error:
```
Cell In[2], line 59
     56 msgs = [{'role': 'user', 'content': prompt}]
     58 if is_image_passed and image is not None:
---> 59     res = model_req['model'].chat(
     60         image=image,
     61         msgs=msgs,
     62         tokenizer=model_req['tokenizer'],
     63         sampling=True,
     64         temperature=0.1,
     65         stream=False
     66     )
     67 else:
     68     res = model_req['model'].chat(
     69         msgs=msgs,
     70         tokenizer=model_req['tokenizer'],
    (...)
     73         stream=False
     74     )

File /mnt/LLaMA/HF_models/modules/transformers_modules/openbmb/MiniCPM-Llama3-V-2_5/8e3d16252bb1d1030a2caada62ac04ec834c15f1/modeling_minicpmv.py:454, in MiniCPMV.chat(self, image, msgs, tokenizer, vision_hidden_states, max_new_tokens, sampling, max_inp_length, system_prompt, stream, **kwargs)
    449 generation_config.update(
    450     (k, kwargs[k]) for k in generation_config.keys() & kwargs.keys()
    451 )
    453 with torch.inference_mode():
--> 454     res, vision_hidden_states = self.generate(
    455         input_id_list=[input_ids],
    456         max_inp_length=max_inp_length,
    457         img_list=[images],
    458         tgt_sizes=[tgt_sizes],
    459         tokenizer=tokenizer,
    460         max_new_tokens=max_new_tokens,
    461         vision_hidden_states=vision_hidden_states,
    462         return_vision_hidden_states=True,
    463         stream=stream,
    464         **generation_config
    465     )
    467 if stream:
    468     def stream_gen():

File /mnt/LLaMA/HF_models/modules/transformers_modules/openbmb/MiniCPM-Llama3-V-2_5/8e3d16252bb1d1030a2caada62ac04ec834c15f1/modeling_minicpmv.py:333, in MiniCPMV.generate(self, input_id_list, img_list, tgt_sizes, tokenizer, max_inp_length, vision_hidden_states, return_vision_hidden_states, stream, **kwargs)
    330 img_list = [[] for i in range(bs)]
    331 assert bs == len(img_list)
--> 333 model_inputs = self._process_list(tokenizer, input_id_list, max_inp_length)
    335 if vision_hidden_states is None:
    336     pixel_values = []

File /mnt/LLaMA/HF_models/modules/transformers_modules/openbmb/MiniCPM-Llama3-V-2_5/8e3d16252bb1d1030a2caada62ac04ec834c15f1/modeling_minicpmv.py:208, in MiniCPMV._process_list(self, tokenizer, input_id_list, max_inp_length)
    205 input_tensors = []
    206 for input_ids in input_id_list:
    207     input_tensors.append(
--> 208         self._convert_to_tensors(tokenizer, input_ids, max_inp_length)
    209     )
    210 padded = {}
    211 for key in pad_keys:

File /mnt/LLaMA/HF_models/modules/transformers_modules/openbmb/MiniCPM-Llama3-V-2_5/8e3d16252bb1d1030a2caada62ac04ec834c15f1/modeling_minicpmv.py:188, in MiniCPMV._convert_to_tensors(self, tokenizer, input_ids, max_inp_length)
    186 image_end_tokens = torch.where(input_ids == tokenizer.im_end_id)[0]
    187 valid_image_nums = max(len(image_start_tokens), len(image_end_tokens))
--> 188 image_bound = torch.hstack(
    189     [
    190         image_start_tokens[:valid_image_nums].unsqueeze(-1),
    191         image_end_tokens[:valid_image_nums].unsqueeze(-1),
    192     ]
    193 )
    195 model_input = {}
    196 model_input["input_ids"] = input_ids.unsqueeze(0).to(self.device)

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 8 but got size 7 for tensor number 1 in the list.
```
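For reference, the failure is the `torch.hstack` in `_convert_to_tensors` pairing image-start and image-end token positions of unequal length (8 starts vs. 7 ends). A hypothetical diagnostic sketch to check the counts directly; `tokenizer.im_start_id` is assumed by analogy with the `im_end_id` seen in the traceback, and the prompt string is a placeholder:

```python
# Hypothetical diagnostic sketch: count the image start/end marker tokens in a
# tokenized prompt, mirroring the check in modeling_minicpmv._convert_to_tensors.
# Assumptions: tokenizer.im_start_id exists alongside the im_end_id seen in the
# traceback; the prompt string is a placeholder.
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True
)

prompt = "Image: <image> ..rest of the prompt"
input_ids = torch.tensor(tokenizer.encode(prompt))

image_start_tokens = torch.where(input_ids == tokenizer.im_start_id)[0]
image_end_tokens = torch.where(input_ids == tokenizer.im_end_id)[0]

# If these counts differ, the torch.hstack in _convert_to_tensors fails with
# exactly the size-mismatch error above.
print(len(image_start_tokens), len(image_end_tokens))
```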
Expected Behavior
The image should be placed where the `<image>` tag is, and inference should work.
Steps To Reproduce
Add the `<image>` tag to the prompt and run inference, as in the sketch below.
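A minimal repro sketch, following the standard usage from the MiniCPM-Llama3-V-2_5 model card; the image path and question wording are placeholders:

```python
# Minimal repro sketch, following the standard usage from the model card.
# Placeholders/assumptions: the image path and the question wording.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5",
    trust_remote_code=True,
    torch_dtype=torch.float16,
).to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True
)

image = Image.open("example.jpg").convert("RGB")

# Including the raw <image> tag in the text triggers the RuntimeError shown
# above; with the tag removed, the same call runs normally.
msgs = [{"role": "user", "content": "Image: <image> Describe this image."}]

res = model.chat(
    image=image,
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.1,
    stream=False,
)
print(res)
```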
Environment
Anything else?
No response