OOM when fine-tuning GLM

HarderThenHarder / transformers_tasks

⭐️ NLP Algorithms with transformers lib. Supporting Text-Classification, Text-Generation, Information-Extraction, Text-Matching, RLHF, SFT etc.

https://www.zhihu.com/column/c_1451236880973426688

2.17k stars 381 forks

OOM when fine-tuning GLM #54

Open nuoma opened 1 year ago

nuoma commented 1 year ago

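I'm using train_multi_gpu on two 3090s and running into OOM. At first it ran out of memory while loading the model. After removing FP16 from the command line, training started, but it hit OOM shortly afterwards, with GPU memory usage almost maxed out at 23.4G/24G. I then removed .half() from the model-loading code and added load_in_8bit=True, which raised: ValueError: You can't train a model that has been loaded in 8-bit precision on multiple devices. From what I can tell, this is something accelerator doesn't support.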
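
For a rough sense of why a 24G card fills up so quickly, here is a back-of-the-envelope estimate. It is only a sketch under stated assumptions: the post does not give the model size, so the 6-billion-parameter figure below is hypothetical, and the byte counts assume full fine-tuning with an Adam-style optimizer (two extra fp32 values per parameter). Activations and framework overhead are ignored, so real usage would be higher. The function name is made up for illustration.

```python
GIB = 1024 ** 3


def estimate_training_memory_gib(num_params, weight_bytes, grad_bytes, optimizer_bytes):
    """Estimate memory for weights + gradients + optimizer state, in GiB.

    Activations, CUDA context and fragmentation are NOT included,
    so this is a lower bound rather than a prediction.
    """
    per_param = weight_bytes + grad_bytes + optimizer_bytes
    return num_params * per_param / GIB


# ASSUMPTION: a 6e9-parameter model; the original post does not state the size.
NUM_PARAMS = 6_000_000_000

# Weights only, at different precisions (no gradients, no optimizer state).
weights_fp32 = estimate_training_memory_gib(NUM_PARAMS, 4, 0, 0)
weights_fp16 = estimate_training_memory_gib(NUM_PARAMS, 2, 0, 0)
weights_int8 = estimate_training_memory_gib(NUM_PARAMS, 1, 0, 0)

# Full fine-tuning in fp32 with an Adam-style optimizer:
# 4 bytes weights + 4 bytes gradients + 8 bytes optimizer state per parameter.
full_fp32 = estimate_training_memory_gib(NUM_PARAMS, 4, 4, 8)

print(f"fp32 weights only:      {weights_fp32:.1f} GiB")
print(f"fp16 weights only:      {weights_fp16:.1f} GiB")
print(f"int8 weights only:      {weights_int8:.1f} GiB")
print(f"fp32 full fine-tuning:  {full_fp32:.1f} GiB")
```

Under these assumptions, fp32 weights alone would already come close to a 24G card, and plain data-parallel training keeps a full copy of the model on each GPU, so adding a second 3090 would not lower the per-card footprint. Whether that matches this particular setup depends on the actual model and training script.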