Closed by delyan-boychev 11 months ago
Hi. Thanks for your interest in our work. As you may know, the pre-trained discriminators are meant to be used as task-specific feature extractors only. During training, gradients should be computed only for the model being optimised, so you may want to set `requires_grad` to `False` for the parameters of all the pre-trained discriminators.
Please try the following in `mdfloss.py`:

```python
for param in self.Ds.parameters():
    param.requires_grad = False
```
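As a runnable sketch of that suggestion (the `Ds` module list below is a hypothetical stand-in; the actual pre-trained discriminators in `mdfloss.py` are not reproduced here):

```python
import torch.nn as nn

# Hypothetical stand-in for the bank of 8 pre-trained discriminators;
# in mdfloss.py, self.Ds holds the real pre-trained networks.
Ds = nn.ModuleList(
    nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1),
        nn.ReLU(),
        nn.Conv2d(8, 1, 3, padding=1),
    )
    for _ in range(8)
)

# Freeze every discriminator so the optimiser never updates their weights.
for param in Ds.parameters():
    param.requires_grad = False

# All parameters are now excluded from gradient accumulation.
assert not any(p.requires_grad for p in Ds.parameters())
```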
Hope this answers your question.
Thanks again.
Best, Aamir
I have already set `requires_grad` to `False` and wrapped the target `y` in `torch.no_grad()`. But the main point is that backpropagation still has to compute gradients with respect to each layer's activations and the input via the chain rule. So for this task we run 8 backward passes at a time, one per discriminator, and the input is deep-copied 8 times. I am interested in how you measured the inference and backpropagation times and the memory overhead.
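To make the chain-rule point concrete, here is a minimal sketch (with a hypothetical stand-in discriminator, not the actual MDF networks) showing that frozen weights receive no gradient while the input still does, plus one way such timings and peak memory could be measured:

```python
import time
import torch
import torch.nn as nn

# Hypothetical stand-in for one frozen pre-trained discriminator.
disc = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))
for p in disc.parameters():
    p.requires_grad = False

# The input being optimised still requires gradients.
x = torch.randn(8, 64, requires_grad=True)

if torch.cuda.is_available():
    torch.cuda.reset_peak_memory_stats()

t0 = time.perf_counter()
loss = disc(x).mean()   # forward (inference) pass
t1 = time.perf_counter()
loss.backward()         # backward pass: gradients flow only to x
t2 = time.perf_counter()

print(f"forward  : {t1 - t0:.6f}s")
print(f"backward : {t2 - t1:.6f}s")
if torch.cuda.is_available():
    print(f"peak mem : {torch.cuda.max_memory_allocated()} bytes")

# Frozen weights accumulate no gradient...
assert all(p.grad is None for p in disc.parameters())
# ...but the chain rule still produces a gradient w.r.t. the input,
# so the activations of the whole graph must be kept in memory.
assert x.grad is not None
```

This is why memory grows even with `requires_grad=False` on the discriminators: the backward pass through each of the 8 graphs still stores intermediate activations to reach the input.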
Best regards, Delyan
While I was testing the MDF loss, I found that the loss function uses much more memory than reported in the paper, even for a single image. What might be the issue?
The output of the code above is placed here:
Torch version: 2.0.1+cu117
Python version: 3.10.6
OS: Ubuntu 22.04.2 LTS x86_64
Kernel: 5.15.0-76-generic
CPU: AMD Ryzen 3 3100 (8) @ 3.600GHz
GPU: NVIDIA GeForce GTX 1050 Ti