PaddlePaddle / Paddle

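PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)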
http://www.paddlepaddle.org/
Apache License 2.0

Error when training ernie-health on multiple GPUs #55071

Closed: autumnCui closed this issue 4 months ago

autumnCui commented 1 year ago

Please ask your question

Training works on a single GPU, but multi-GPU training fails with this error:

RuntimeError: (PreconditionNotMet) A serious error has occurred here. Please set find_unused_parameters=True to traverse backward graph in each step to prepare reduce in advance. If you have set, There may be several reasons for this error: 1) Please note that all forward outputs derived from the module parameters must participate in the calculation of losses and subsequent gradient calculations. If not, the wrapper will hang, waiting for autograd to generate gradients for these parameters. you can use detach or stop_gradient to make the unused parameters detached from the autograd graph. 2) Used multiple forwards and one backward. You may be able to wrap multiple forwards in a model. [Hint: Expected groups_needfinalize == false, but received groups_needfinalize:1 != false:0.] (at /paddle/paddle/fluid/distributed/collective/reducer.cc:609)

How can I fix this?
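The first cause listed in the error can be made concrete with a toy backward-graph walk. The cause is a parameter whose outputs never reach the loss, so the data-parallel reducer keeps waiting for its gradient. The sketch below is a framework-free illustration under simplifying assumptions, not Paddle's reducer code. `Node`, `reachable_params` and `unused_params` are hypothetical names, and parameter names such as `aux_head.weight` are invented for the example rather than taken from ernie-health. The walk starts at the loss, follows each node's inputs, skips nodes marked `stop_gradient` (mimicking `detach`), and reports any parameter it never reaches. In Paddle itself, the switches the error message points to are the `find_unused_parameters=True` argument of `paddle.DataParallel` and setting `stop_gradient = True` on parameters that are genuinely unused.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy autograd node: a parameter (leaf) or the output of an operation.

    Hypothetical illustration only; not a Paddle class.
    """
    name: str
    inputs: list = field(default_factory=list)  # nodes this value was computed from
    is_param: bool = False
    stop_gradient: bool = False  # mimics detach(): gradients do not flow past this node


def reachable_params(loss):
    """Walk the backward graph from the loss and collect every parameter it reaches."""
    seen, found, stack = set(), set(), [loss]
    while stack:
        node = stack.pop()
        if id(node) in seen or node.stop_gradient:
            continue  # already visited, or the graph is cut here
        seen.add(id(node))
        if node.is_param:
            found.add(node.name)
        stack.extend(node.inputs)
    return found


def unused_params(params, loss):
    """Names of parameters that would never receive a gradient from this loss."""
    used = reachable_params(loss)
    return sorted(p.name for p in params if p.name not in used)


# Hypothetical model: an auxiliary head is computed in forward but left out of the loss.
w_enc = Node("encoder.weight", is_param=True)
w_head = Node("mlm_head.weight", is_param=True)
w_aux = Node("aux_head.weight", is_param=True)
params = [w_enc, w_head, w_aux]

hidden = Node("hidden", inputs=[w_enc])
logits = Node("logits", inputs=[hidden, w_head])
aux_out = Node("aux_out", inputs=[hidden, w_aux])  # computed but never used in the loss

loss = Node("loss", inputs=[logits])
print(unused_params(params, loss))  # ['aux_head.weight']

# Detaching the auxiliary branch still leaves its parameter without a gradient.
aux_detached = Node("aux_detached", inputs=[aux_out], stop_gradient=True)
loss_detached = Node("loss_detached", inputs=[logits, aux_detached])
print(unused_params(params, loss_detached))  # ['aux_head.weight']

# If every forward output feeds the loss, no parameter is left waiting.
loss_all = Node("loss_all", inputs=[logits, aux_out])
print(unused_params(params, loss_all))  # []
```

As the error message says, `find_unused_parameters=True` makes the framework traverse the backward graph on every step, so it presumably adds some per-step cost; marking parameters that are never used with `stop_gradient = True` before wrapping the model avoids depending on that traversal.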

gongel commented 1 year ago

Is this the code you are using? https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/ernie-health

paddle-bot[bot] commented 4 months ago

Since you haven't replied for more than a year, we have closed this issue/pr. If the problem is not solved or there is a follow-up one, please reopen it at any time and we will continue to follow up.