MooreThreads / Moore-AnimateAnyone

Character Animation (AnimateAnyone, Face Reenactment)
Apache License 2.0

When gradient_accumulation_steps is set to greater than 1, a RuntimeError occurs: "Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed)." #122

Open bichunyang419 opened 6 months ago

bichunyang419 commented 6 months ago

```
Traceback (most recent call last):
  File "train_stage_1.py", line 730, in <module>
    main(config)
  File "train_stage_1.py", line 601, in main
    accelerator.backward(loss)
  File "/home/bichunyang3/venvs/Moore/lib/python3.8/site-packages/accelerate/accelerator.py", line 1851, in backward
    self.scaler.scale(loss).backward(**kwargs)
  File "/home/bichunyang3/venvs/Moore/lib/python3.8/site-packages/torch/_tensor.py", line 522, in backward
    torch.autograd.backward(
  File "/home/bichunyang3/venvs/Moore/lib/python3.8/site-packages/torch/autograd/__init__.py", line 266, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
```
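For context, this error is raised whenever a backward pass re-enters a graph whose saved intermediates were already freed by an earlier `.backward()`. A plausible reading of this issue (an assumption, not confirmed by the traceback alone) is that the reference-control banks hold tensors from the previous accumulation step, so the next step's loss still depends on the old, freed graph. A minimal standalone repro of the same RuntimeError, unrelated to AnimateAnyone itself, where `saved` stands in for an un-cleared feature bank:

```python
import torch

w = torch.ones(1, requires_grad=True)
saved = []  # stands in for the un-cleared reference-control banks

err = None
for step in range(2):
    feat = torch.exp(w)          # exp saves its output for backward
    saved.append(feat)           # leaks iteration 0's graph into iteration 1
    loss = torch.stack(saved).sum()
    try:
        loss.backward()          # frees the graph behind every feat so far
    except RuntimeError as e:
        err = e                  # second iteration re-enters the freed graph
        break
```

Clearing `saved` at the top of each iteration (the analogue of the fix suggested below) makes the loop run without error.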

asFeng commented 3 months ago

Move `reference_control_reader.clear()` and `reference_control_writer.clear()` to before the net forward pass.

WJohnnyW commented 3 months ago

> Move `reference_control_reader.clear()` and `reference_control_writer.clear()` to before the net forward pass.

Can you show the code in detail? I have tried it, but it failed.

jixinya commented 3 weeks ago

I think it means moving `reference_control_reader.clear()` and `reference_control_writer.clear()` from lines 611-612 to somewhere before `model_pred = net(xxx)` at line 562, which works for me.
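The reordering might look roughly like the sketch below. The names follow the thread, but the `Bank` class, the toy `net`, and the loop body are hypothetical stand-ins; the real `train_stage_1.py` and its `ReferenceAttentionControl` objects are more involved.

```python
import torch
import torch.nn.functional as F

class Bank:
    """Toy stand-in for a reference-control reader/writer feature bank."""
    def __init__(self):
        self.feats = []
    def clear(self):
        self.feats = []

net = torch.nn.Linear(4, 4)
opt = torch.optim.SGD(net.parameters(), lr=0.1)
reference_control_reader, reference_control_writer = Bank(), Bank()

grad_accum = 2
for step in range(4):
    # The fix: clear the banks BEFORE the forward pass, so no tensor
    # from the previous step's graph survives into this step's loss.
    reference_control_reader.clear()
    reference_control_writer.clear()

    x = torch.randn(2, 4)
    pred = net(x)                           # stands in for model_pred = net(xxx)
    reference_control_writer.feats.append(pred)  # bank repopulated this step
    loss = F.mse_loss(pred, torch.zeros_like(pred)) / grad_accum
    loss.backward()                         # gradients accumulate across steps

    if (step + 1) % grad_accum == 0:
        opt.step()
        opt.zero_grad()
```

With `clear()` after `backward()` instead (the original placement), the banks would still hold last step's graph when gradient accumulation runs a second forward/backward, which matches the RuntimeError reported above.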