QwenLM / Qwen2

Qwen2 is the large language model series developed by the Qwen team at Alibaba Cloud.

Repetition after 100k-context training based on Qwen1.5 14B #707

Open 520jefferson opened 2 months ago

520jefferson commented 2 months ago

As the title says: I continued pretraining Qwen1.5 14B on 95B tokens and then ran SFT. Beyond 32k the model starts to repeat itself, with decoding degenerating into repeated strings. Concretely, for the 100k training run I kept the RoPE base unchanged at 1M and changed max_position_embeddings and the training sequence length to 100k.

Is there anything wrong with this setup? At first glance it looks like the positional encoding was not learned well. Or is it a configuration problem?
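For reference, here is a minimal sketch of the configuration changes described above, assuming the Hugging Face transformers config interface and the public Qwen/Qwen1.5-14B checkpoint (the exact "100k" value is my reading of the issue, not a confirmed number):

```python
# Minimal sketch of the setup described above, assuming the Hugging Face
# transformers API and the public Qwen/Qwen1.5-14B checkpoint.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen1.5-14B")
config.rope_theta = 1_000_000.0           # RoPE base left unchanged at 1M
config.max_position_embeddings = 100_000  # "100k" per the issue; exact value is an assumption

# The training sequence length ("seq len") is raised to 100k in the training
# framework's data pipeline; it is not a field of config.json.
```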

jklj077 commented 2 months ago

Extending the supported sequence length of a pretrained model can be a challenging task. There is not much we can share, but a handful of papers have proposed different ways to achieve this, and I would recommend taking a look at them first.

FYI: in Qwen2, we have changed (again) the way longer sequences are supported (from 32K to 128K).
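One family of approaches in that literature adjusts the RoPE base frequency when extending the context window. Below is a minimal sketch of NTK-aware base scaling, one published recipe; it is not a description of how Qwen2 itself extends from 32K to 128K, and the numbers are purely illustrative:

```python
# NTK-aware RoPE base scaling: enlarge the rotary base so that low-frequency
# dimensions are effectively interpolated while high-frequency ones stay
# roughly intact. One published recipe, not Qwen2's internal method.

def ntk_scaled_base(base: float, scale: float, head_dim: int) -> float:
    """Return a new RoPE base for a context window extended by `scale`x."""
    return base * scale ** (head_dim / (head_dim - 2))

# Example: extending a 32k model to 128k (scale = 4) with 128-dim heads.
new_base = ntk_scaled_base(base=1_000_000.0, scale=4.0, head_dim=128)
print(f"{new_base:.3e}")  # ~4.089e6
```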

520jefferson commented 2 months ago

@jklj077 Thanks for the comment. I will read the papers first. Another question: is there an obvious difference between Qwen1.5 and Qwen2? The config.json of Qwen1.5 shows `"architectures": ["Qwen2ForCausalLM"]`.
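For what it's worth, this can be checked directly; a minimal sketch, assuming the transformers library and the public Hugging Face checkpoint name:

```python
# Qwen1.5 already ships with the Qwen2 architecture class, which is why its
# config.json lists "Qwen2ForCausalLM".
from transformers import AutoConfig

print(AutoConfig.from_pretrained("Qwen/Qwen1.5-14B").architectures)
# ['Qwen2ForCausalLM']
```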

github-actions[bot] commented 1 month ago

This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.