Multi-GPU training of ernie-health fails with an error

autumnCui commented 1 year ago

Please ask your question

Single-GPU training works, but multi-GPU training fails with this error:

RuntimeError: (PreconditionNotMet) A serious error has occurred here. Please set find_unused_parameters=True to traverse backward graph in each step to prepare reduce in advance. If you have set, There may be several reasons for this error: 1) Please note that all forward outputs derived from the module parameters must participate in the calculation of losses and subsequent gradient calculations. If not, the wrapper will hang, waiting for autograd to generate gradients for these parameters. you can use detach or stop_gradient to make the unused parameters detached from the autograd graph. 2) Used multiple forwards and one backward. You may be able to wrap multiple forwards in a model. [Hint: Expected groups_needfinalize == false, but received groups_needfinalize:1 != false:0.] (at /paddle/paddle/fluid/distributed/collective/reducer.cc:609)

How can I fix this?
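For readers who hit the same message: it names two remedies, passing find_unused_parameters=True to the data-parallel wrapper, or detaching parameters that never contribute to the loss. The sketch below is one possible way to try both, not the ernie-health script's actual code. The helper names `unused_parameter_names` and `wrap_for_multi_gpu` are hypothetical, and it assumes the training script wraps the model with `paddle.DataParallel` itself; if the script uses a higher-level trainer, the flag would have to be set wherever that trainer builds the wrapper.

```python
def unused_parameter_names(model):
    """Return names of trainable parameters that received no gradient.

    Intended for a single-card debugging run: call it right after the first
    loss.backward(), before gradients are cleared. `model` is assumed to be a
    paddle.nn.Layer, but anything exposing named_parameters() works.
    """
    return [
        name
        for name, param in model.named_parameters()
        if not param.stop_gradient and param.grad is None
    ]


def wrap_for_multi_gpu(model):
    """Wrap a model for data-parallel training, tolerating unused parameters."""
    # Imported lazily so the diagnostic helper above can run without Paddle.
    import paddle
    import paddle.distributed as dist

    dist.init_parallel_env()
    # find_unused_parameters=True makes the reducer traverse the backward
    # graph each step instead of waiting for gradients that never arrive.
    return paddle.DataParallel(model, find_unused_parameters=True)
```

If `unused_parameter_names` reports only a few parameters on a single card, setting `stop_gradient = True` on them before wrapping is the alternative the error message describes, and it avoids the per-step graph traversal that find_unused_parameters=True adds.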

gongel commented 1 year ago

Are you using this code? https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/ernie-health

paddle-bot[bot] commented 4 months ago

Since you haven't replied for more than a year, we have closed this issue/pr. If the problem is not solved or there is a follow-up one, please reopen it at any time and we will continue to follow up.

PaddlePaddle / Paddle

Multi-GPU training of ernie-health fails with an error #55071
