After installing bitsandbytes with `pip install bitsandbytes`, I ran:

```
python /services/srv/MiniCPM-V/web_demo_2.5.py --device cuda
```
```
The load_in_4bit and load_in_8bit arguments are deprecated and will be removed in the future versions. Please, pass a BitsAndBytesConfig object in quantization_config argument instead.
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
```
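(For reference: the deprecation warning refers to the old `load_in_4bit=True` flag. A minimal sketch of the replacement API, assuming the non-quantized checkpoint `openbmb/MiniCPM-Llama3-V-2_5`; for the pre-quantized int4 checkpoint this is not needed, since its own `quantization_config` takes precedence, as the next warning notes.)

```python
import torch
from transformers import AutoModel, BitsAndBytesConfig

# Instead of the deprecated load_in_4bit=True flag, pass an explicit
# BitsAndBytesConfig via quantization_config.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5",  # non-quantized checkpoint (assumption)
    quantization_config=bnb_config,
    trust_remote_code=True,
)
```

In this particular run the load then continued, with transformers noting that the checkpoint's own config wins: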
```
/services/srv/MiniCPM-V/venv/lib/python3.10/site-packages/transformers/quantizers/auto.py:159: UserWarning: You passed quantization_config or equivalent parameters to from_pretrained but the model you're loading already has a quantization_config attribute. The quantization_config from the model will be used.
  warnings.warn(warning_msg)
low_cpu_mem_usage was None, now set to True since model is quantized.
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.39it/s]
```
```
Traceback (most recent call last):
  File "/services/srv/MiniCPM-V/web_demo_2.5.py", line 28, in <module>
    ...
ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
```
Thanks for the feedback. web_demo_2.5.py has added int4 compatibility here; please try again.
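For anyone hitting the same error: the key point is to stop calling `.to()` on the pre-quantized checkpoint. A minimal loading sketch (the model id is from this thread; the surrounding code is an assumption, not the demo's exact source):

```python
from transformers import AutoModel, AutoTokenizer

model_path = "openbmb/MiniCPM-Llama3-V-2_5-int4"

# The int4 checkpoint ships its own quantization_config, and bitsandbytes
# places the weights on the GPU during loading -- do NOT call .to(device)
# or .to(dtype=...) on the returned model afterwards.
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.eval()
```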
```
Traceback (most recent call last):
  File "/services/srv/MiniCPM-V/web_demo_2.5-int4.py", line 143, in chat
    answer = model.chat(
  File "/root/.cache/huggingface/modules/transformers_modules/MiniCPM-Llama3-V-2_5-int4/modeling_minicpmv.py", line 416, in chat
    res, vision_hidden_states = self.generate(
  File "/root/.cache/huggingface/modules/transformers_modules/MiniCPM-Llama3-V-2_5-int4/modeling_minicpmv.py", line 328, in generate
    result = self._decode(model_inputs["inputs_embeds"], tokenizer, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/MiniCPM-Llama3-V-2_5-int4/modeling_minicpmv.py", line 213, in _decode
    output = self.llm.generate(
  File "/services/srv/MiniCPM-V/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/services/srv/MiniCPM-V/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1611, in generate
    logits_warper = self._get_logits_warper(generation_config)
  File "/services/srv/MiniCPM-V/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 736, in _get_logits_warper
    warpers.append(TemperatureLogitsWarper(generation_config.temperature))
  File "/services/srv/MiniCPM-V/venv/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 284, in __init__
    raise ValueError(except_msg)
ValueError: `temperature` (=0) has to be a strictly positive float, otherwise your next token scores will be invalid.
```
As the error message says, temperature cannot be 0; it needs to be set to a positive value.
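A sketch of both workarounds, using the `model.chat` keyword names from the MiniCPM-V README (`sampling`, `temperature`); `image`, `msgs`, and `tokenizer` are assumed to come from the surrounding demo code:

```python
# Option 1: keep sampling, but with a strictly positive temperature.
answer = model.chat(
    image=image,
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.7,  # any value > 0; 0 raises the ValueError above
)

# Option 2: greedy decoding, so no temperature warper is built at all.
answer = model.chat(
    image=image,
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=False,
)
```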
Is it possible to run the int4 model on CPU? I have a 3060 with 6 GB of GPU memory and can't run even the int4 version.
For CPU inference you can use llama.cpp; we will release support soon, stay tuned.
Great, looking forward to it.
MiniCPM-Llama3-V 2.5 can run with llama.cpp now! See our fork of llama.cpp for more details, and here is our model in GGUF format: https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf @lonlonago
@Cuiunbo thanks very much.
I ran `python /services/srv/MiniCPM-V/web_demo_2.5.py --device cuda` (the model in web_demo_2.5.py has already been changed to MiniCPM-Llama3-V-2_5-int4) and got:
```
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Traceback (most recent call last):
  File "/services/srv/MiniCPM-V/web_demo_2.5-int4.py", line 28, in <module>
    model = AutoModel.from_pretrained(model_path, trust_remote_code=True).to(dtype=torch.float16)
  File "/services/srv/MiniCPM-V/venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
    return model_class.from_pretrained(
  File "/services/srv/MiniCPM-V/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3155, in from_pretrained
    config.quantization_config = AutoHfQuantizer.merge_quantization_configs(
  File "/services/srv/MiniCPM-V/venv/lib/python3.10/site-packages/transformers/quantizers/auto.py", line 149, in merge_quantization_configs
    quantization_config = AutoQuantizationConfig.from_dict(quantization_config)
  File "/services/srv/MiniCPM-V/venv/lib/python3.10/site-packages/transformers/quantizers/auto.py", line 79, in from_dict
    return target_cls.from_dict(quantization_config_dict)
  File "/services/srv/MiniCPM-V/venv/lib/python3.10/site-packages/transformers/utils/quantization_config.py", line 94, in from_dict
    config = cls(**config_dict)
  File "/services/srv/MiniCPM-V/venv/lib/python3.10/site-packages/transformers/utils/quantization_config.py", line 284, in __init__
    self.post_init()
  File "/services/srv/MiniCPM-V/venv/lib/python3.10/site-packages/transformers/utils/quantization_config.py", line 342, in post_init
    if self.load_in_4bit and not version.parse(importlib.metadata.version("bitsandbytes")) >= version.parse(
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 996, in version
    return distribution(distribution_name).version
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 969, in distribution
    return Distribution.from_name(distribution_name)
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 548, in from_name
    raise PackageNotFoundError(name)
importlib.metadata.PackageNotFoundError: No package metadata was found for bitsandbytes
```
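This last error means the demo's venv cannot see a bitsandbytes installation (it was likely installed with a different interpreter). A quick standard-library check, with the venv path taken from the log above:

```python
import importlib.metadata
import sys

# Which interpreter is actually running the demo?
print(sys.executable)

try:
    print("bitsandbytes", importlib.metadata.version("bitsandbytes"))
except importlib.metadata.PackageNotFoundError:
    # Install into this exact environment, e.g.:
    #   /services/srv/MiniCPM-V/venv/bin/pip install bitsandbytes
    print("bitsandbytes is not visible to this interpreter")
```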