QwenLM / Qwen2

Qwen2 is the large language model series developed by the Qwen team at Alibaba Cloud.

Repetition after 100k-context training based on Qwen1.5 14B #707

Open 520jefferson opened 2 months ago

520jefferson commented 2 months ago

As the title says: I continued pretraining Qwen1.5 14B on 95B tokens and then ran SFT. Beyond 32k the model starts to repeat itself, with decoding degenerating into repeated strings. Concretely, for the 100k training run I kept the RoPE base unchanged at 1M and changed max_position_embeddings and the training sequence length to 100k.

Is there anything wrong with this setup? At first glance it looks like the positional encoding was not learned well. Or is it a configuration problem?
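For reference, here is a minimal sketch of the configuration changes described above, assuming the Hugging Face transformers config interface and the public Qwen/Qwen1.5-14B checkpoint (the exact "100k" value is my reading of the issue, not a confirmed number):

```python
# Minimal sketch of the setup described above, assuming the Hugging Face
# transformers API and the public Qwen/Qwen1.5-14B checkpoint.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen1.5-14B")
config.rope_theta = 1_000_000.0           # RoPE base left unchanged at 1M
config.max_position_embeddings = 100_000  # "100k" per the issue; exact value is an assumption

# The training sequence length ("seq len") is raised to 100k in the training
# framework's data pipeline; it is not a field of config.json.
```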

jklj077 commented 2 months ago

Extending the supported sequence length of a pretrained model can be a challenging task. There is not much we can share, but a handful of papers have proposed different ways to achieve this, and I would recommend taking a look at them first.

FYI: in Qwen2, we have changed (again) the way longer sequences are supported (from 32K to 128K).
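One family of approaches in that literature adjusts the RoPE base frequency when extending the context window. Below is a minimal sketch of NTK-aware base scaling, one published recipe; it is not a description of how Qwen2 itself extends from 32K to 128K, and the numbers are purely illustrative:

```python
# NTK-aware RoPE base scaling: enlarge the rotary base so that low-frequency
# dimensions are effectively interpolated while high-frequency ones stay
# roughly intact. One published recipe, not Qwen2's internal method.

def ntk_scaled_base(base: float, scale: float, head_dim: int) -> float:
    """Return a new RoPE base for a context window extended by `scale`x."""
    return base * scale ** (head_dim / (head_dim - 2))

# Example: extending a 32k model to 128k (scale = 4) with 128-dim heads.
new_base = ntk_scaled_base(base=1_000_000.0, scale=4.0, head_dim=128)
print(f"{new_base:.3e}")  # ~4.089e6
```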

520jefferson commented 2 months ago

@jklj077 Thanks for the comment. I will read the papers first. Another question: is there an obvious difference between Qwen1.5 and Qwen2? The config.json of Qwen1.5 shows `"architectures": ["Qwen2ForCausalLM"]`.
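For what it's worth, this can be checked directly; a minimal sketch, assuming the transformers library and the public Hugging Face checkpoint name:

```python
# Qwen1.5 already ships with the Qwen2 architecture class, which is why its
# config.json lists "Qwen2ForCausalLM".
from transformers import AutoConfig

print(AutoConfig.from_pretrained("Qwen/Qwen1.5-14B").architectures)
# ['Qwen2ForCausalLM']
```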

github-actions[bot] commented 1 month ago

This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.