FoundationVision / Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
https://groma-mllm.github.io/
Apache License 2.0

Inference error with the 8-bit and 4-bit quantized versions #15

Closed. zhangron013 closed this issue 1 month ago.

zhangron013 commented 1 month ago

When I run inference with the 8-bit/4-bit quantized version, I get the error: "Calling cuda() is not supported for 4-bit or 8-bit quantized models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct dtype." From what I found online, this may be caused by the version of the transformers or deepspeed library. Is that the cause? If so, could you share the versions of these two libraries from an environment where inference works?

Test environment:
transformers 4.32.0
deepspeed 0.9.2

Full traceback:

Traceback (most recent call last):
  File "/opt/conda/envs/groma/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/envs/groma/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/mnt/zhang/project/Groma-main/groma/eval/run_groma.py", line 138, in <module>
    eval_model(model_name, args.quant_type, args.image_file, args.query)
  File "/mnt/zhang/project/Groma-main/groma/eval/run_groma.py", line 58, in eval_model
    model = GromaModel.from_pretrained(model_name, **kwargs).cuda()
  File "/opt/conda/envs/groma/lib/python3.9/site-packages/transformers/modeling_utils.py", line 1998, in cuda
    raise ValueError(
ValueError: Calling cuda() is not supported for 4-bit or 8-bit quantized models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct dtype.

machuofan commented 1 month ago

Thanks for your feedback. How about deleting .cuda() in line 58?
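[Editor's note] A minimal sketch of what the fix could look like around `groma/eval/run_groma.py` line 58. This is not the repository's actual code: the `load_groma` helper, the `quant_type` values, and the `GromaModel` import path are assumptions for illustration. The key point is that when `load_in_8bit`/`load_in_4bit` is passed, transformers (with bitsandbytes) already places the weights on the correct devices via `device_map`, so the trailing `.cuda()` must be dropped for the quantized paths.

```python
# Hypothetical helper illustrating the suggested fix; names and import path are
# assumptions, not taken from the Groma repository.
from groma.model import GromaModel  # import path assumed


def load_groma(model_name, quant_type=None):
    kwargs = {}
    if quant_type == '8bit':
        # bitsandbytes 8-bit loading; weights are placed on GPU by device_map.
        kwargs = dict(load_in_8bit=True, device_map='auto')
    elif quant_type == '4bit':
        # bitsandbytes 4-bit loading; likewise no manual .cuda() afterwards.
        kwargs = dict(load_in_4bit=True, device_map='auto')

    model = GromaModel.from_pretrained(model_name, **kwargs)
    if not kwargs:
        # Only the full-precision path needs to be moved to GPU explicitly.
        model = model.cuda()
    return model
```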

zhangron013 commented 1 month ago

> Thanks for your feedback. How about deleting .cuda() in line 58?

Thanks, that solved it~