Closed · wciq1208 closed this issue 4 months ago
I ran into the same problem: if all prompts have the same length, inference works fine, but when the lengths differ, inference on the padded batch fails with: RuntimeError: The expanded size of the tensor (1626) must match the existing size (27) at non-singleton dimension 3. Target sizes: [2, 32, 1626, 1626]. Tensor sizes: [2, 1, 27, 27]
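For what it's worth, the arithmetic in both error messages points at the same cause: the sequence the model actually attends over includes the expanded image tokens (1626 = 27 + 1599 here, and 1613 = 14 + 1599 in the trace below), while the attention mask still covers only the padded text tokens. A self-contained sketch, with the shapes copied from this error message (head count and head dim are arbitrary assumptions), fails the same way:

```python
import torch
import torch.nn.functional as F

# Query/key/value over the full 1626-token sequence (text plus expanded
# image tokens), but a mask built from only the 27 padded text tokens.
q = torch.randn(2, 32, 1626, 128)
k = torch.randn(2, 32, 1626, 128)
v = torch.randn(2, 32, 1626, 128)
attn_mask = torch.ones(2, 1, 27, 27, dtype=torch.bool)

# The mask cannot broadcast to [2, 32, 1626, 1626], so this raises:
# RuntimeError: The expanded size of the tensor (1626) must match the
# existing size (27) at non-singleton dimension 3. ...
F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
```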
Fixed.
System Info / 系統信息
Container: pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime · GPU: a single RTX 3090 · Python: 3.10 · transformers: 4.41.2
Who can help? / 谁可以帮助到您?
No response
Information / 问题信息
Reproduction / 复现过程
Code:
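A minimal reconstruction of the failing script is sketched below. Only the model (glm-4v-9b), the `test()` and `generate_kwargs` names, the 4-bit loading (implied by the deprecation warning in the log), and the two batched image prompts are taken from the report; every other name, path, and argument is an assumption, not the author's actual code.

```python
# Hypothetical repro sketch, NOT the author's /hestia/src/adapter/chatglm.py.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "THUDM/glm-4v-9b"  # assumption: the report loads a local copy

def test():
    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        trust_remote_code=True,
        torch_dtype=torch.float16,
        load_in_4bit=True,  # triggers the deprecation warning seen in the log
        device_map="auto",
    )
    image = Image.open("screenshot.jpg").convert("RGB")  # 2543x1308 in the report
    # Two prompts of different token lengths in one batch -> padding -> crash.
    conversations = [
        [{"role": "user", "content": "图片里有什么文字信息吗", "image": image}],
        [{"role": "user", "content": "给图片做个总结", "image": image}],
    ]
    print(conversations)
    inputs = tokenizer.apply_chat_template(
        conversations,
        add_generation_prompt=True,
        tokenize=True,
        padding=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)
    generate_kwargs = {**inputs, "max_new_tokens": 512}
    generated_ids = model.generate(**generate_kwargs)
    print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))

if __name__ == "__main__":
    test()
```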
Output and error:
```
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
Loading checkpoint shards: 100%|██████████████████████████████| 15/15 [00:17<00:00, 1.19s/it]
[[{'role': 'user', 'content': '图片里有什么文字信息吗', 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=2543x1308 at 0x7FBF1F948CA0>}], [{'role': 'user', 'content': '给图片做个总结', 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=2543x1308 at 0x7FBF1698EA10>}]]
Traceback (most recent call last):
  File "/hestia/src/adapter/chatglm.py", line 233, in <module>
    test()
  File "/hestia/src/adapter/chatglm.py", line 224, in test
    generated_ids = model.generate(**generate_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 1758, in generate
    result = self._sample(
  File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 2397, in _sample
    outputs = self(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/glm-4v-9b/modeling_chatglm.py", line 1017, in forward
    transformer_outputs = self.transformer(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/glm-4v-9b/modeling_chatglm.py", line 906, in forward
    hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/glm-4v-9b/modeling_chatglm.py", line 664, in forward
    layer_ret = layer(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/glm-4v-9b/modeling_chatglm.py", line 567, in forward
    attention_output, kv_cache = self.self_attention(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/glm-4v-9b/modeling_chatglm.py", line 464, in forward
    context_layer = self.core_attention(query_layer, key_layer, value_layer, attention_mask)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/glm-4v-9b/modeling_chatglm.py", line 250, in forward
    context_layer = torch.nn.functional.scaled_dot_product_attention(query_layer, key_layer, value_layer,
RuntimeError: The expanded size of the tensor (1613) must match the existing size (14) at non-singleton dimension 3. Target sizes: [2, 32, 1613, 1613]. Tensor sizes: [2, 1, 14, 14]
```

Expected behavior / 期待表现
Batched inference should work correctly even when the prompts have different lengths.
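Until the fix lands, one possible workaround (a sketch reusing the hypothetical `tokenizer`, `model`, and `conversations` objects from the reproduction sketch above) is to avoid cross-prompt padding entirely and generate one conversation per call:

```python
# Workaround sketch: batch size 1 per call, so no padding is ever needed.
outputs = []
for conversation in conversations:
    inputs = tokenizer.apply_chat_template(
        [conversation],              # batch of one, nothing to pad against
        add_generation_prompt=True,
        tokenize=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)
    generated_ids = model.generate(**inputs, max_new_tokens=512)
    outputs.append(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```

This trades throughput for correctness; grouping prompts of identical token length into the same batch should also sidestep the crash, per the comment at the top of the thread.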