Issue type

SpikingJelly version
latest

Description
I used SpikingJelly to build an SNN version of UNet; my complete code is here. spiking_unet.py is the model that triggers the problem, test.py is an ANN UNet with the same structure as spiking_unet, and main.py contains the training pipeline.

First, when I ran spiking_unet, I got the following error telling me that backpropagation failed:

Traceback (most recent call last):
File "/root/DRIVE/main.py", line 147, in <module>
loss.backward()
File "/root/miniconda3/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/function.py", line 253, in apply
return user_fn(self, *args)
File "/root/miniconda3/lib/python3.8/site-packages/spikingjelly/activation_based/surrogate.py", line 1639, in backward
return leaky_k_relu_backward(grad_output, ctx.saved_tensors[0], ctx.leak, ctx.k)
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
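For context: this error usually means the backward pass of one iteration is reaching back into the graph built by a previous iteration. With stateful spiking neurons that can happen when membrane potentials are carried across batches instead of being reset. A quick check for that (a sketch under that assumption; s_model is the model from main.py):

import torch
from spikingjelly.activation_based import neuron

# After loss.backward() and optimizer.step(): a membrane potential that still
# carries a grad_fn will chain the next forward pass onto the old, freed graph.
for name, m in s_model.named_modules():
    if isinstance(m, neuron.BaseNode) and torch.is_tensor(m.v):
        print(name, type(m).__name__, m.v.grad_fn is not None)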
Following the hint, I changed the call to loss.backward(retain_graph=True), and then got a new error:
Traceback (most recent call last):
File "/root/DRIVE/main.py", line 147, in <module>
loss.backward(retain_graph=True)
File "/root/miniconda3/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
I then followed that hint as well and enabled torch.autograd.set_detect_anomaly(True), which produced yet another error:
/root/miniconda3/lib/python3.8/site-packages/torch/autograd/__init__.py:173: UserWarning: Error detected in CudnnBatchNormBackward0. Traceback of forward call that caused the error:
File "/root/DRIVE/main.py", line 143, in <module>
outputs = s_model(inputs)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/root/DRIVE/spiking_unet.py", line 115, in forward
x = self.up4(x, x1)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/root/DRIVE/spiking_unet.py", line 63, in forward
x = self.conv(x)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/root/DRIVE/spiking_unet.py", line 23, in forward
t = self.c(x)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/spikingjelly/activation_based/layer.py", line 465, in forward
return functional.seq_to_ann_forward(x, super().forward)
File "/root/miniconda3/lib/python3.8/site-packages/spikingjelly/activation_based/functional.py", line 686, in seq_to_ann_forward
y = stateless_module(y)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 168, in forward
return F.batch_norm(
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 2421, in batch_norm
return torch.batch_norm(
(Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:104.)
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
File "/root/DRIVE/main.py", line 147, in <module>
loss.backward(retain_graph=True)
File "/root/miniconda3/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
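The anomaly trace blames CudnnBatchNormBackward0, and the offending [32] tensor matches the parameters and buffers of a 32-channel BatchNorm layer. One plausible mechanism (my assumption, sketched in plain PyTorch rather than with this model) is that retain_graph=True keeps the old graph alive while the next forward pass updates the BatchNorm running statistics in place, bumping their version counter:

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(32)                  # 32 channels, like the [32] tensor above
x = torch.randn(8, 32, requires_grad=True)

loss = bn(x).sum()
loss.backward(retain_graph=True)         # first backward succeeds

_ = bn(torch.randn(8, 32))               # this forward updates running_mean /
                                         # running_var in place (version 1 -> 2)
loss.backward()                          # backward over the retained graph now
                                         # raises the same version error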
This is very strange, because my own code contains no in-place operations. I then swapped spiking_unet out for an ANN UNet with the same structure, and all of the problems disappeared. My guess is that some step inside SpikingJelly goes wrong and introduces an in-place operation or otherwise breaks backpropagation. I also found a similar earlier issue, #419, so I am not the only one who has run into this.
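If the state carry-over diagnosis is right, the matching workaround (a sketch, assuming the loop structure of main.py; loader, criterion and optimizer are placeholder names) is to reset the network after every iteration instead of using retain_graph=True, so that each forward pass builds a fresh graph:

from spikingjelly.activation_based import functional

for inputs, targets in loader:
    optimizer.zero_grad()
    outputs = s_model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()                      # no retain_graph needed
    optimizer.step()
    functional.reset_net(s_model)        # clear membrane potentials so the next
                                         # forward pass does not reuse this graph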