FederatedAI / FATE-LLM

Federated Learning for LLMs.
Apache License 2.0

ChatGLM-6B model training issue #77

Closed: zapjone closed this issue 2 months ago

zapjone commented 3 months ago

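Hi everyone, I have a question. When I use the LLM module in FATE to train a GPT model, I get the error shown below. Following the related DeepSpeed issues and their fixes, I disabled fp16, but in FATE the same error still appears after setting fp16:{enable:False}. Has anyone else run into this? Environment: two machines with 3090 GPUs, one GPU per machine. deepspeed==1.13.1

[image: screenshot of the error]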

mgqa34 commented 3 months ago

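This error is raised by DeepSpeed. It means training has already reached its lowest point and the weights can no longer be updated. Try adjusting the learning rate and related settings.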
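
As a rough illustration of the two settings discussed in this thread, here is a minimal sketch of a DeepSpeed-style config with fp16 turned off and an explicit learning rate. Two things are assumptions: the helper `build_ds_config` is hypothetical, and the values (batch size, `1e-5`) are placeholders rather than recommendations. How FATE-LLM forwards this config to DeepSpeed is not shown in the thread, so that part is left out. One thing worth checking: DeepSpeed's documented key for the fp16 switch is `enabled`, not `enable`, so a config written as `fp16:{enable:False}` may not take effect.

```python
import json


def build_ds_config(learning_rate: float = 1e-5, use_fp16: bool = False) -> dict:
    """Build a DeepSpeed-style config dict.

    Hypothetical helper for illustration; all values are placeholders.
    """
    return {
        # Per-GPU micro batch size (placeholder value).
        "train_micro_batch_size_per_gpu": 1,
        "optimizer": {
            "type": "AdamW",
            # Lowering the learning rate is the adjustment suggested in the reply.
            "params": {"lr": learning_rate},
        },
        # DeepSpeed reads the key "enabled" (not "enable") in the fp16 section.
        "fp16": {"enabled": use_fp16},
    }


ds_config = build_ds_config(learning_rate=1e-5, use_fp16=False)
print(json.dumps(ds_config, indent=2))
```

If the error persists with fp16 disabled this way, trying a smaller `learning_rate` value in the same config is the next step the reply points to.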