cyclomon / 3dbraingen

Official Pytorch Implementation of "Generation of 3D Brain MRI Using Auto-Encoding Generative Adversarial Network" (accepted by MICCAI 2019)
MIT License

Error when using higher version pytorch #13

Closed ys-zong closed 2 years ago

ys-zong commented 2 years ago

Hi, thanks for the nice work!

When I use a newer version of PyTorch (>1.9.0), an error occurs during training:

RuntimeError                              Traceback (most recent call last)
<ipython-input-11-893328275b2f> in <module>
     91         loss3 = -CD(z_rand).mean() - c_loss  + gradient_penalty_cd
     92 
---> 93         loss3.backward(retain_graph=True)
     94         cd_optimizer.step()
     95 

~/.conda/envs/torch11/lib/python3.8/site-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    305                 create_graph=create_graph,
    306                 inputs=inputs)
--> 307         torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    308 
    309     def register_hook(self, hook):

~/.conda/envs/torch11/lib/python3.8/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    152         retain_graph = create_graph
    153 
--> 154     Variable._execution_engine.run_backward(
    155         tensors, grad_tensors_, retain_graph, create_graph, inputs,
    156         allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1000, 512, 4, 4, 4]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

I think the problem comes from c_loss, the second term of loss3: if I comment out c_loss, the error no longer occurs. Should I modify anything about c_loss to fix this?
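For context, this failure mode can be reproduced in a standalone sketch (toy linear modules standing in for the repo's encoder and code discriminator CD, not the actual networks): a loss whose retained graph passes through parameters that an earlier optimizer.step() has already rewritten in place will fail on backward in PyTorch >= 1.5.

```python
import torch

# Toy stand-ins (hypothetical, not the repo's networks) for the encoder
# producing z_hat and the code discriminator CD.
enc = torch.nn.Linear(4, 4)
cd = torch.nn.Linear(4, 1)
opt_enc = torch.optim.SGD(enc.parameters(), lr=0.1)

x = torch.randn(2, 4, requires_grad=True)
z_hat = enc(x)                   # graph passes through enc's current weights

cd(z_hat).mean().backward(retain_graph=True)
opt_enc.step()                   # in-place update: enc.weight's version bumps

c_loss = -cd(z_hat).mean()       # reuses the retained graph through enc
try:
    c_loss.backward()
    failed = False
except RuntimeError as err:      # "modified by an inplace operation"
    failed = "inplace operation" in str(err)
print(failed)
```

Enabling `torch.autograd.set_detect_anomaly(True)` before the training loop makes autograd print the forward operation that produced the failing gradient, which helps confirm where the offending graph was built.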

cyclomon commented 2 years ago

Hi, since our source code is based on PyTorch 0.4.1, several points (e.g. the use of 'Variable') differ from newer versions (>1.0).

We are planning to revise the source code for newer versions. Sorry for the inconvenience.
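For reference, in PyTorch >= 0.4 the `Variable` wrapper is a deprecated no-op: tensors themselves carry autograd state, so `Variable(...)` calls in old code can simply be replaced by the tensor they wrap. A quick check:

```python
import torch

# Variable is deprecated since PyTorch 0.4: wrapping returns a plain Tensor
# with the same data and autograd behavior as creating the tensor directly.
t = torch.ones(2, requires_grad=True)
v = torch.autograd.Variable(torch.ones(2), requires_grad=True)
print(type(v) is torch.Tensor)  # True: no separate Variable class anymore
print(torch.equal(t, v))        # True: same values either way
```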

eriktaylor commented 2 years ago

Hello, I was able to fix this issue by changing c_loss as suggested above:

c_loss = -CD(z_hat).mean() #this failed

c_loss = -CD(z_hat.detach()).mean() #this worked fine

This detaches z_hat from the current computational graph. From the docs for detach: when we don't need a tensor to be traced for gradient computation, we detach it from the current computational graph.
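The effect of the fix can be seen in a small sketch (toy linear modules, not the repo's networks): detaching z_hat cuts the backward path into the encoder, so the code-discriminator loss produces gradients only for CD itself and no longer touches the graph whose parameters were updated earlier in the iteration.

```python
import torch

# Toy stand-ins (hypothetical) for the encoder and code discriminator CD.
enc = torch.nn.Linear(4, 4)
cd = torch.nn.Linear(4, 1)

z_hat = enc(torch.randn(2, 4))
c_loss = -cd(z_hat.detach()).mean()  # detach: stop gradients at z_hat
c_loss.backward()

print(cd.weight.grad is not None)    # True: CD receives gradients
print(enc.weight.grad is None)       # True: encoder untouched by this loss
```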

Do you know if this will cause an issue with Alpha_WGAN performance?

cyclomon commented 2 years ago

Thanks for your comment. I agree that detaching the reconstructed latent is the correct implementation. Since training a GAN on 3D brain MRI is much more sensitive than on natural images, this correction might affect performance or training stability.

Again, I apologize that I don't have time to update the source code right now. I will revise the code later.