[Badcase]: On the same data, the fine-tuning loss on the Qwen2.5 72B pretrained model is 3 times that on Qwen2 72B. Apart from more training data, what else is different in 2.5?

QwenLM / Qwen2.5

Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.


[Badcase]: On the same data, the fine-tuning loss on the Qwen2.5 72B pretrained model is 3 times that on Qwen2 72B. Apart from more training data, what else is different in 2.5? #935

Status: Open. boundles opened this issue 4 hours ago

boundles commented 4 hours ago

Model Series

Qwen2.5

What are the models used?

Qwen2.5-72B pretrained (base) model

What is the scenario where the problem happened?

Text-to-SQL

Is this badcase known and can it be solved using available techniques?

[X] I have followed the GitHub README.
[X] I have checked the Qwen documentation and cannot find a solution there.
[X] I have checked the documentation of the related framework and cannot find useful information.
[X] I have searched the issues and there is not a similar one.

Information about environment

OS: Ubuntu 22.04
Python: Python 3.11
GPUs: 8 x NVIDIA A100

Description

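As described in the title: with the same fine-tuning data, the loss on the Qwen2.5 72B base model is 3 times that on the Qwen2 72B base model.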

cipolee commented 3 hours ago

Marking this. When fine-tuning the 7B Instruct model with LoRA, I also see a somewhat higher loss than with Qwen2.

airstillblue commented 2 hours ago

I'm running into a similar problem: same environment, but the loss is much higher.

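Songjw133 commented 1 hour ago

Could it be an eos_token issue? Check which end-of-sequence token is being appended to the end of your input sequences. I fine-tune the Qwen2.5 base model with my own fine-tuning code. When I first tried it yesterday, the loss was also very high. On inspection, the tokenizer's eos_token was <|im_end|>, whereas it should normally be <|endoftext|> (this was fixed today). After switching the eos_token to <|endoftext|>, the loss went back to normal.

boundles commented 50 minutes ago

So you mean the end-of-sequence token should be <|endoftext|>, not <|im_end|>?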
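A minimal sketch of the fix Songjw133 describes, assuming your own training code already turns each example into a list of token ids. The ids 151643 (<|endoftext|>) and 151645 (<|im_end|>) are the ones quoted later in this thread; it is worth confirming them with `tokenizer.convert_tokens_to_ids(...)` for your tokenizer version. The helper name `append_base_eos` is hypothetical, not part of any Qwen or Hugging Face API.

```python
# Token ids as quoted in this thread (assumption: verify against your tokenizer).
ENDOFTEXT_ID = 151643  # <|endoftext|>, end-of-sequence token for the base model
IM_END_ID = 151645     # <|im_end|>, end-of-turn token used by the chat format


def append_base_eos(token_ids: list[int]) -> list[int]:
    """Return a copy of token_ids that ends with <|endoftext|>.

    Hypothetical helper for base-model fine-tuning: a trailing <|im_end|>
    (e.g. from a misconfigured tokenizer.eos_token) is replaced, and
    <|endoftext|> is appended if no end token is present.
    """
    ids = list(token_ids)
    if ids and ids[-1] == IM_END_ID:
        # Swap the chat end-of-turn token for the base-model EOS token.
        ids[-1] = ENDOFTEXT_ID
    elif not ids or ids[-1] != ENDOFTEXT_ID:
        # No end token yet: append the base-model EOS token.
        ids.append(ENDOFTEXT_ID)
    return ids
```

Replacing rather than stacking the tokens keeps each example ending in exactly one end marker, which matches the "switch the eos_token" fix described above.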

Songjw133 commented 31 minutes ago

Yes. The end-of-sequence token for the base model is <|endoftext|>, with token id 151643. Check the end token in your input data: if it is <|im_end|> (id 151645), that is probably the problem. Also check whether you applied a chat template; fine-tuning a base model should not use one. The chat template automatically appends <|im_end|>, which also makes the loss very high. I'm not sure why this happens: Qwen2 didn't seem to care about any of this, but with Qwen2.5, changing the eos_token makes a big difference in loss.

boundles commented 17 minutes ago

OK, thanks a lot. I'll give it a try.
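To check an existing dataset for both problems raised above (examples ending in <|im_end|> instead of <|endoftext|>, and a chat template applied to base-model data), a rough audit over tokenized examples might look like this. It assumes each example is a list of token ids and treats any <|im_end|> inside an example as a sign that a chat template was used; `audit_base_sft_data` is a hypothetical helper, and the ids are the ones quoted in this thread.

```python
from collections import Counter

# Token ids as quoted in this thread (assumption: verify against your tokenizer).
ENDOFTEXT_ID = 151643  # <|endoftext|>
IM_END_ID = 151645     # <|im_end|>


def audit_base_sft_data(batch: list[list[int]]) -> Counter:
    """Count common end-token problems in examples meant for base-model fine-tuning."""
    counts: Counter = Counter()
    for ids in batch:
        if not ids:
            counts["empty"] += 1
            continue
        # Classify how each example ends.
        if ids[-1] == ENDOFTEXT_ID:
            counts["ends_with_endoftext"] += 1
        elif ids[-1] == IM_END_ID:
            counts["ends_with_im_end"] += 1
        else:
            counts["missing_eos"] += 1
        # <|im_end|> anywhere in the example suggests a chat template was applied.
        if IM_END_ID in ids:
            counts["contains_im_end"] += 1
    return counts


if __name__ == "__main__":
    sample = [[5, 6, ENDOFTEXT_ID], [5, IM_END_ID], [7, IM_END_ID, 8, ENDOFTEXT_ID], [9]]
    print(audit_base_sft_data(sample))
```

For a clean base-model dataset you would expect every example in `ends_with_endoftext` and a zero count for `contains_im_end`.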