Open 520jefferson opened 2 months ago
Extending the supported sequence length of a pretrained model can be a challenging task. There is not much we can share, but a handful of papers have proposed different ways to achieve this, and I would recommend taking a look at them first.
FYI: in Qwen2, we have again changed the way we support longer sequences (from 32K to 128K).
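Most of the context-extension papers referred to above either interpolate positions or raise the RoPE base. As a rough illustration only (not Qwen's actual implementation; the helper names and the `head_dim=128` value are assumptions), the NTK-aware variant scales the rotary base so that the lowest frequency stretches by the extension factor while the high frequencies stay close to the pretrained ones:

```python
def ntk_scaled_base(base: float, scale: float, head_dim: int) -> float:
    """NTK-aware RoPE base scaling: raise the rotary base so the slowest
    frequency stretches by `scale` (hypothetical helper for illustration)."""
    return base * scale ** (head_dim / (head_dim - 2))

def rope_frequencies(base: float, head_dim: int) -> list[float]:
    """Standard RoPE inverse frequencies, one per even/odd dimension pair."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

# Extending 32K -> 128K is a 4x scale; with head_dim=128 the base grows
# slightly faster than 4x (the exponent 128/126 is a bit above 1).
new_base = ntk_scaled_base(1_000_000.0, 4.0, 128)
print(f"{new_base:,.0f}")
```

This is only a sketch of one family of methods; which scheme a given Qwen release actually uses is described in its own documentation and config.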
@jklj077 Thanks for the comment. I will read the papers first. Another question: is there an obvious difference between Qwen1.5 and Qwen2? The config.json in Qwen1.5 shows `"architectures": ["Qwen2ForCausalLM"]`.
This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.
As the title says: I did continued pretraining of Qwen1.5-14B on 95B tokens and then SFT, and found that beyond 32K the model starts repeating, decoding the same strings over and over. Specifics: for the 100K training run, the RoPE base was left unchanged at 1M, while max position embedding and seq len were changed to 100K.
Is there anything wrong with this setup? My initial guess is that the positional encodings were not learned well? Or is it a configuration issue?
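For what it's worth, one way to see why keeping the base at 1M while extending from 32K to 100K can hurt: the low-frequency rotary dimensions receive angles at the new positions that lie outside the range seen during the original pretraining, so those positions are out of distribution unless the base is adjusted or enough long data is trained on. A minimal sketch (the helper is hypothetical, and `head_dim=128` is an assumption for the 14B model):

```python
def rotary_angle(pos: int, dim_pair: int, head_dim: int = 128,
                 base: float = 1_000_000.0) -> float:
    """Rotary angle (radians) applied at token position `pos` for the
    `dim_pair`-th frequency pair of one RoPE head (illustrative only)."""
    return pos * base ** (-2 * dim_pair / head_dim)

# The slowest-rotating pair (index head_dim // 2 - 1) encodes long-range
# order; its angle grows linearly with position, so positions past the
# original 32K window produce angles the model never saw in pretraining.
seen_max = rotary_angle(32_768, dim_pair=63)
new_angle = rotary_angle(100_000, dim_pair=63)
print(new_angle > seen_max)  # → True
```

Whether the fix is raising the RoPE base before the long-context stage, a scaling scheme, or simply more long-sequence data is exactly what the papers mentioned earlier in the thread discuss.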