Closed luguoyixiazi closed 1 month ago
Is there an existing issue / discussion for this?
- [x] I have searched the existing issues / discussions
Is there an existing answer for this in FAQ?
- [x] I have searched FAQ
Current Behavior
Inference works fine with the unquantized model. With the int4 model, Python crashes silently: no Python error is raised, only a Segmentation fault.
Steps To Reproduce
```python
from transformers import AutoModel, AutoTokenizer
from PIL import Image

model = AutoModel.from_pretrained('/path/MiniCPM-Llama3-V-2_5-int4', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('/path/MiniCPM-Llama3-V-2_5-int4', trust_remote_code=True)
model.eval()

image = Image.open('img.jpg').convert('RGB')
# Prompt (in Chinese): "Extract the identity information from the image and return it as JSON"
question = '从图片中提取出有关的身份信息,以json的格式返回'
msgs = [{'role': 'user', 'content': question}]
default_params = {"num_beams": 3, "repetition_penalty": 1.2, "max_new_tokens": 1024, 'temperature': 0.1}

try:
    res = model.chat(
        image=image,
        msgs=msgs,
        tokenizer=tokenizer,
        **default_params
    )
    print(res)
except Exception as e:
    print(e)
```
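As an aside (a general debugging tip, not from the original report): a segmentation fault kills the interpreter outright, so the `except Exception` clause above can never fire. Enabling the standard-library `faulthandler` module before inference is one way to at least get a Python-level traceback when the crash occurs:

```python
import faulthandler

# Dump the Python traceback to stderr if the process receives SIGSEGV
# (or SIGFPE/SIGABRT/SIGBUS); call this before model.chat(...) so the
# crash site inside the model code is visible.
faulthandler.enable()
```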
Environment
- OS: WSL2
- Python: 3.10
- Transformers: 4.40.0
- PyTorch: 2.3.0+cu118
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): cu118
Anything else?
Could it be caused by this transformers warning? `Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.`
`low_cpu_mem_usage` was None, now set to True since model is quantized.

Try adding `low_cpu_mem_usage=True` to the `AutoModel.from_pretrained` call and see if that helps.
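A minimal sketch of the suggested change (the path is taken from the original report; this assumes `transformers` is installed and otherwise keeps the original loading arguments):

```python
from transformers import AutoModel

def load_int4_model(path='/path/MiniCPM-Llama3-V-2_5-int4'):
    # low_cpu_mem_usage=True is the suggested addition; the other
    # arguments match the original reproduction script.
    return AutoModel.from_pretrained(
        path,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
    )
```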
Tracked it down: the crash happens in `get_vllm_embedding`, at `vision_embedding = self.vpm(all_pixel_values.type(dtype), patch_attention_mask=patch_attn_mask).last_hidden_state`. Hmm.
Mostly resolved: after downgrading the NVIDIA driver to 537.58 and PyTorch to 2.1.2 (together with its matching companion packages), the segfault went away. From what I have read, this error points to problems in both the NVIDIA driver and torch...
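For reference, pinning PyTorch to 2.1.2 built against CUDA 11.8 can be done along these lines (the `torchvision`/`torchaudio` pins are my assumption of the matching versions for torch 2.1.2, not stated in the original report):

```shell
# Downgrade to the torch 2.1.2 wheels built for CUDA 11.8
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 \
    --index-url https://download.pytorch.org/whl/cu118
```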