QwenLM / Qwen2

Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.

SFT of 7B model with model_max_length=90000 on 24 A100 GPUs: OOM #794

Closed: gk-cv closed this issue 1 day ago

gk-cv commented 1 month ago

I followed the script here: https://qwen.readthedocs.io/zh-cn/latest/training/SFT/example.html

Running SFT on the 7B model with 24 A100 GPUs, I get OOM once model_max_length exceeds 20,000.

gk-cv commented 1 month ago

Is this a configuration problem?

jklj077 commented 1 month ago

I don't think it's a configuration issue per se. It's just that the script is not expected to be used for finetuning at that scale. You'd be better off using one of the dedicated finetuning frameworks mentioned in the README. You may also want to try a pretraining framework like mcore for that kind of finetuning.
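
As a rough illustration of why this is a matter of scale rather than configuration, the back-of-the-envelope sketch below compares memory that shrinks as GPUs are added with memory that does not. It is not taken from the example script, and every number is an assumption: fully sharded (ZeRO-3-style) model states at about 16 bytes per parameter for mixed-precision Adam, a nominal 7e9 parameters, a vocabulary of 152,064 tokens, logits held in fp32, and one sequence per GPU. The helper names `model_state_gib` and `logits_gib` are hypothetical.

```python
GIB = 2**30


def model_state_gib(num_params, num_gpus, bytes_per_param=16):
    """Per-GPU memory for weights, gradients, and optimizer states.

    Assumes the states are sharded evenly across all GPUs and cost
    about 16 bytes per parameter (mixed-precision Adam).
    """
    return num_params * bytes_per_param / num_gpus / GIB


def logits_gib(seq_len, vocab_size, bytes_per_elem=4, batch_size=1):
    """Memory for one logits tensor of shape (batch, seq_len, vocab).

    This lives on every GPU and is not reduced by data parallelism.
    """
    return batch_size * seq_len * vocab_size * bytes_per_elem / GIB


NUM_PARAMS = 7e9        # nominal size of a "7B" model (assumption)
VOCAB_SIZE = 152_064    # assumed vocabulary size
NUM_GPUS = 24

print(f"sharded model states: {model_state_gib(NUM_PARAMS, NUM_GPUS):.1f} GiB per GPU")
for seq_len in (20_000, 90_000):
    print(f"logits at {seq_len} tokens: {logits_gib(seq_len, VOCAB_SIZE):.1f} GiB per GPU")
```

Under these assumptions the sharded model states are small compared with the per-sequence tensors, which grow linearly with `model_max_length` and are replicated on every GPU no matter how many are added. The estimate ignores per-layer activations and the gradient of the logits, so real usage would be higher. Frameworks that split a single sequence across devices are aimed at this regime.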

github-actions[bot] commented 1 week ago

This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.