[Bug]: transformers[4.46.2] error in multi-gpu when training.

Model Series

Qwen2.5

What are the models used?

Qwen2.5-7B

What is the scenario where the problem happened?

transformers

Is this a known issue?

[X] I have followed the GitHub README.
[X] I have checked the Qwen documentation and cannot find an answer there.
[X] I have checked the documentation of the related framework and cannot find useful information.
[X] I have searched the issues and there is not a similar one.

Information about environment

OS: Ubuntu Python: Python3.10 GPUs: 2x NV 4090

Log output

File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/accelerate/utils/operations.py", line 820, in forward
    return model_forward(*args, **kwargs)
  File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/accelerate/utils/operations.py", line 808, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/peft/peft_model.py", line 1644, in forward
    return self.base_model(
  File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 197, in forward
    return self.model.forward(*args, **kwargs)
  File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 1183, in forward
    loss = self.loss_function(logits, labels, self.vocab_size, **loss_kwargs)
  File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/transformers/loss/loss_utils.py", line 46, in ForCausalLMLoss
    loss = fixed_cross_entropy(shift_logits, shift_labels, num_items_in_batch, ignore_index, **kwargs)
  File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/transformers/loss/loss_utils.py", line 28, in fixed_cross_entropy
    loss = loss / num_items_in_batch
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!

Description

Steps to reproduce

This happens to Qwen2.5-7B-Instruct The problem can be reproduced with the following steps:

just peft training

Expected results

The results are expected to be training

Attempts to fix

Anything else helpful for investigation

downgrade transformers to 4.45.0 will work.

QwenLM / Qwen2.5

[Bug]: transformers[4.46.2] error in multi-gpu when training. #1070