Open tanypullin opened 1 week ago
Downgrading transformers to 4.45.0 works.
This looks like an issue in transformers
introduced when the loss functions were reworked in 4.46.
As a hot fix, could you try editing this line:
File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/transformers/loss/loss_utils.py", line 28, in fixed_cross_entropy
loss = loss / num_items_in_batch
to
loss = loss / torch.tensor(num_items_in_batch, device=loss.device)
or stay at transformers<4.46.0
until a proper fix is released.
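The one-line change above can be tried outside the library before touching site-packages. The following is only a sketch: `normalize_loss` is a hypothetical helper, not part of transformers, and it assumes the failure comes from `num_items_in_batch` not living on the same device as `loss`, which this thread does not confirm. It also moves an existing tensor with `.to()` rather than wrapping it in `torch.tensor(...)`, a small variation on the suggested line.

```python
import torch
import torch.nn.functional as F


def normalize_loss(loss, num_items_in_batch):
    """Hypothetical helper: divide a summed loss by the item count,
    making sure the divisor sits on the same device as the loss."""
    if torch.is_tensor(num_items_in_batch):
        # Already a tensor: move it to the loss device (no-op if it matches).
        denom = num_items_in_batch.to(loss.device)
    else:
        # Plain Python number: build the tensor directly on the loss device.
        denom = torch.tensor(num_items_in_batch, device=loss.device)
    return loss / denom


# Small CPU check with made-up shapes: summed loss / count equals the mean loss.
logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
summed = F.cross_entropy(logits, labels, reduction="sum")
mean = F.cross_entropy(logits, labels, reduction="mean")
normalized = normalize_loss(summed, 4)
```

On a multi-GPU machine the same helper could be called with `loss` on one device and the count on another; that case is not exercised here.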
Model Series
Qwen2.5
What are the models used?
Qwen2.5-7B
What is the scenario where the problem happened?
transformers
Is this a known issue?
Information about environment
OS: Ubuntu
Python: Python3.10
GPUs: 2x NV 4090
Log output
Description
Steps to reproduce
This happens with Qwen2.5-7B-Instruct. The problem can be reproduced with the following steps:
Expected results
Training is expected to run normally.
Attempts to fix
Anything else helpful for investigation
Downgrading transformers to 4.45.0 works.
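To check whether an environment sits before the 4.46 loss rework, a small standard-library sketch like the one below may help. The helper names are hypothetical, the version parsing is deliberately simple (pre-releases such as `4.46.0.dev0` are treated as 4.46.0), and it only reports which side of 4.46.0 the installed version falls on, not whether a later release contains a fix.

```python
import re
from importlib import metadata

# First release with the reworked loss functions, per this thread.
LOSS_REWORK = (4, 46, 0)


def release_tuple(version):
    """Turn a version string like '4.45.2' into (4, 45, 2)."""
    nums = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # leading digits only, e.g. '0rc1' -> 0
        nums.append(int(match.group()) if match else 0)
    while len(nums) < 3:  # pad short versions such as '4.45'
        nums.append(0)
    return tuple(nums)


def predates_loss_rework(version):
    """True if the given version is older than 4.46.0."""
    return release_tuple(version) < LOSS_REWORK


def installed_transformers_predates_rework():
    """Check the installed transformers; None if it is not installed."""
    try:
        return predates_loss_rework(metadata.version("transformers"))
    except metadata.PackageNotFoundError:
        return None
```

If `installed_transformers_predates_rework()` returns `False`, the environment is on 4.46.0 or later and either the downgrade or the hot fix described in this issue would apply.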