rfvscj opened this issue 1 month ago
Regarding this specific error: I found it is a transformers version problem. 4.42.x raises the error, while 4.39.x works fine.
These shapes in the debug output are what point to the version issue: torch.Size([1, 1, 592, 128]) vs torch.Size([1, 48, 592, 128])
```
Traceback (most recent call last):
  File "/root/xtuner/xtuner/tools/train.py", line 360, in <module>
    main()
  File "/root/xtuner/xtuner/tools/train.py", line 356, in main
    runner.train()
  File "/usr/local/lib/python3.10/site-packages/mmengine/runner/runner.py", line 1777, in train
    model = self.train_loop.run()  # type: ignore
  File "/usr/local/lib/python3.10/site-packages/mmengine/runner/loops.py", line 271, in run
    self.runner.call_hook('before_train')
  File "/usr/local/lib/python3.10/site-packages/mmengine/runner/runner.py", line 1839, in call_hook
    getattr(hook, fn_name)(self, **kwargs)
  File "/root/xtuner/xtuner/engine/hooks/evaluate_chat_hook.py", line 234, in before_train
    self._generate_samples(runner, max_new_tokens=50)
  File "/root/xtuner/xtuner/engine/hooks/evaluate_chat_hook.py", line 220, in _generate_samples
    self._eval_images(runner, model, device, max_new_tokens,
  File "/root/xtuner/xtuner/engine/hooks/evaluate_chat_hook.py", line 152, in _eval_images
    generation_output = model.generate(
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/generation/utils.py", line 1575, in generate
    result = self._sample(
  File "/usr/local/lib/python3.10/site-packages/transformers/generation/utils.py", line 2697, in _sample
    outputs = self(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/internlm2-chat-20b/modeling_internlm2.py", line 1226, in forward
    logits = self.output(hidden_states)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16
```
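As a stopgap, a minimal sketch of a runtime guard against the affected versions. The 4.42.x-bad / 4.39.x-good boundary is only what I observed above, not an official compatibility range:

```python
import transformers
from packaging import version

# Sketch: bail out early on a transformers release observed to trigger
# the float vs bfloat16 mismatch (4.42.x failed for me, 4.39.x worked).
v = version.parse(transformers.__version__)
if v >= version.parse("4.42.0"):
    raise RuntimeError(
        f"transformers {transformers.__version__} hit the dtype mismatch in my runs; "
        "consider downgrading, e.g. pip install 'transformers==4.39.3'"
    )
```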
Is your internlm2 version the latest? There is a known incompatibility at the moment, because internlm2 was updated once before. I suggest pulling the latest LLM weights and also switching to the latest xtuner; that should resolve it.
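If the checkpoint came from the Hugging Face Hub, re-fetching it might look like the sketch below (the repo id is an assumption; adjust to wherever your weights actually came from):

```python
# Sketch, assuming the checkpoint was pulled from the Hugging Face Hub.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="internlm/internlm2-chat-20b",  # assumed repo id; use your actual source
    force_download=True,  # re-download so the updated modeling_internlm2.py is picked up
)
```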
Is there any fix for transformers >= 4.42.4?
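For anyone debugging this on newer transformers, here is a standalone sketch of the failure mode in the last traceback frame (float32 hidden states hitting a bf16 lm head) and the cast that avoids it. This is an illustration of the error class, not a verified fix for the 4.42.x code path:

```python
import torch

# The last frame is F.linear with a float32 input and a bfloat16 weight.
lin = torch.nn.Linear(128, 4, dtype=torch.bfloat16)
x = torch.randn(1, 128)  # float32, like the hidden_states reaching self.output

try:
    lin(x)
except RuntimeError as e:
    print(e)  # expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16

# Casting the activation to the layer's dtype sidesteps the mismatch:
out = lin(x.to(lin.weight.dtype))
print(out.dtype)  # torch.bfloat16
```

In modeling_internlm2.py the equivalent would be something like `logits = self.output(hidden_states.to(self.output.weight.dtype))`, though I have not verified that against transformers 4.42.x.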
Steps to reproduce
```
xtuner train llava_internlm2_chat_20b_clip_vit_large_p14_336_e1_gpu8_pretrain.py
```
Config file
Only the dataset and model paths were changed from the stock config; a sketch of the edits follows.
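A sketch of the edits, using the variable names from the stock xtuner LLaVA pretrain config; all paths below are placeholders, not my actual locations:

```python
# Only these paths were changed relative to the stock config (placeholders).
llm_name_or_path = '/path/to/internlm2-chat-20b'
visual_encoder_name_or_path = '/path/to/clip-vit-large-patch14-336'

data_root = '/path/to/llava_pretrain/'
data_path = data_root + 'blip_laion_cc_sbu_558k.json'
image_folder = data_root + 'images'
```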
Run log