When I train the model with python main.py, it works fine. But when I try to refine the model with python main.py --refine --lr 1e-5 --reload --previous_dir, it reports this error:
Traceback (most recent call last):
  File "main.py", line 225, in <module>
    loss = train(opt, actions, train_dataloader, model, optimizer_all, epoch)
  File "main.py", line 23, in train
    return step('train', opt, actions, train_loader, model, optimizer, epoch)
  File "main.py", line 94, in step
    loss.backward()
  File "D:\Anaconda\envs\pose\lib\site-packages\torch\_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "D:\Anaconda\envs\pose\lib\site-packages\torch\autograd\__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64, 1024]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
I can't find the problem. I have already tried running with torch.autograd.set_detect_anomaly(True):, but I still can't find the culprit.
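For context, this error is usually triggered when a tensor that autograd saved for the backward pass (here, the output of a ReLU, shape [64, 1024]) is later modified in place, e.g. by +=, a method ending in an underscore like relu_() or clamp_(), or nn.ReLU(inplace=True). A minimal sketch reproducing the pattern and the out-of-place fix (the tensors here are made up for illustration, not taken from main.py):

```python
import torch
import torch.nn as nn

relu = nn.ReLU()

# Broken: the ReLU output is saved for backward, then mutated in place.
x = torch.randn(64, 1024, requires_grad=True)
y = relu(x)
y += 1.0  # in-place add bumps y's version counter from 0 to 1
try:
    y.sum().backward()
except RuntimeError as e:
    # "... output 0 of ReluBackward0, is at version 1; expected version 0 ..."
    print("backward failed:", e)

# Fixed: an out-of-place add allocates a new tensor, leaving the saved one intact.
x2 = torch.randn(64, 1024, requires_grad=True)
y2 = relu(x2)
y2 = y2 + 1.0  # new tensor; the saved ReLU output keeps version 0
y2.sum().backward()
print("backward succeeded, grad shape:", x2.grad.shape)
```

One thing to check with set_detect_anomaly(True): anomaly mode records metadata during the forward pass, so it must be enabled before the forward computation runs, not only around loss.backward(). Since the error only appears with --refine, the in-place operation is probably in a code path that is active only in refine mode (for example, a refinement head or a post-processing step applied to an intermediate activation).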