Traceback (most recent call last):
  File "//quant/quant.py", line 27, in <module>
    model.quantize(examples)
  File "/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/lib/python3.10/site-packages/auto_gptq/modeling/_base.py", line 392, in quantize
    self.model(example)
  File "/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "****/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/szaudit/.cache/huggingface/modules/transformers_modules/TeleChat-52B/modeling_telechat.py", line 1021, in forward
    transformer_outputs = self.transformer(
  File "**/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "**/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "//.cache/huggingface/modules/transformers_modules/TeleChat-52B/modeling_telechat.py", line 806, in forward
    inputs_embeds = self.wte(input_ids)
  File "/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "****/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 163, in forward
    return F.embedding(
  File "/lib/python3.10/site-packages/torch/nn/functional.py", line 2264, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
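When will a quantized version of the 52B model be available? Quantizing it with the official script at https://github.com/Tele-AI/Telechat/tree/master/quant fails with the error shown in this post. I am using an A10 GPU.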