Command used:

```
python demos/cli_demo.py -c /home/bill/workspace/CodeShell-7B-Chat-int4 --device cpu
```
Error message:

```
Traceback (most recent call last):
  File "/home/bill/workspace/codeshell/demos/cli_demo.py", line 225, in <module>
    main()
  File "/home/bill/workspace/codeshell/demos/cli_demo.py", line 130, in main
    model, tokenizer, config = _load_model_tokenizer(args)
  File "/home/bill/workspace/codeshell/demos/cli_demo.py", line 67, in _load_model_tokenizer
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/bill/micromamba/envs/codeshell/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
    return model_class.from_pretrained(
  File "/home/bill/.cache/huggingface/modules/transformers_modules/CodeShell-7B-Chat-int4/modeling_codeshell.py", line 1070, in from_pretrained
    model = model.to(torch.device(device_map))
  File "/home/bill/micromamba/envs/codeshell/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2271, in to
    return super().to(*args, **kwargs)
  File "/home/bill/micromamba/envs/codeshell/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
  File "/home/bill/micromamba/envs/codeshell/lib/python3.11/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/bill/micromamba/envs/codeshell/lib/python3.11/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/bill/micromamba/envs/codeshell/lib/python3.11/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "/home/bill/micromamba/envs/codeshell/lib/python3.11/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/home/bill/micromamba/envs/codeshell/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/home/bill/.cache/huggingface/modules/transformers_modules/CodeShell-7B-Chat-int4/quantizer.py", line 40, in Params4bitTo
    new_param = Params4bit(self.to(device=device, dtype=dtype, non_blocking=non_blocking),
  File "/home/bill/.cache/huggingface/modules/transformers_modules/CodeShell-7B-Chat-int4/quantizer.py", line 40, in Params4bitTo
    new_param = Params4bit(self.to(device=device, dtype=dtype, non_blocking=non_blocking),
  File "/home/bill/.cache/huggingface/modules/transformers_modules/CodeShell-7B-Chat-int4/quantizer.py", line 40, in Params4bitTo
    new_param = Params4bit(self.to(device=device, dtype=dtype, non_blocking=non_blocking),
  [Previous line repeated 982 more times]
  File "/home/bill/.cache/huggingface/modules/transformers_modules/CodeShell-7B-Chat-int4/quantizer.py", line 31, in Params4bitTo
    device, dtype, non_blocking, convert_to_format = torch._C._nn._parse_to(*args, **kwargs)
RecursionError: maximum recursion depth exceeded while calling a Python object
```
BTW: the non-quantized model runs fine with:

```
python demos/cli_demo.py -c /home/bill/workspace/CodeShell-7B-Chat --device cpu
```
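For context, the bottom frames suggest what is looping: `Params4bitTo` in the model's `quantizer.py` (line 40) calls `self.to(...)` from inside the `to` override itself, so the call dispatches back into the same override until the recursion limit is hit. Below is a minimal sketch of that pattern, not the actual CodeShell quantizer code; `BuggyParam` and `FixedParam` are hypothetical names, and the "fix" shown (converting `self.data` instead of `self`) is just one common way to break this kind of loop.

```python
import torch

# Hypothetical reproduction of the pattern in the traceback: a Parameter
# subclass whose `to` override calls `self.to(...)`, re-entering itself.
class BuggyParam(torch.nn.Parameter):
    def to(self, *args, **kwargs):
        device, dtype, non_blocking, _ = torch._C._nn._parse_to(*args, **kwargs)
        # BUG: `self.to(...)` dispatches right back to this same override.
        return BuggyParam(self.to(device=device, dtype=dtype,
                                  non_blocking=non_blocking))

# One common fix: convert the underlying tensor via `self.data`, which
# goes through the plain torch.Tensor.to and does not re-enter the override.
class FixedParam(torch.nn.Parameter):
    def to(self, *args, **kwargs):
        device, dtype, non_blocking, _ = torch._C._nn._parse_to(*args, **kwargs)
        new_data = self.data.to(device=device, dtype=dtype,
                                non_blocking=non_blocking)
        return FixedParam(new_data, requires_grad=self.requires_grad)

try:
    BuggyParam(torch.zeros(2)).to("cpu")
except RecursionError:
    print("RecursionError reproduced")  # same failure mode as the report

converted = FixedParam(torch.zeros(2)).to("cpu")
print(type(converted).__name__, converted.device)
```

This would also explain why the non-quantized checkpoint works: only the int4 model ships the custom `quantizer.py` with the overridden `to`.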