Closed: SophieOstmeier closed this issue 1 year ago.
Hi,
We found it improved the performance of our models - this is referenced here.
This should not stop your model parameters from updating during training - is this something you have found in your experiments?
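To illustrate, here is a minimal, self-contained sketch (the layer shapes, data, and optimizer are arbitrary choices, not taken from the repo) showing that a layer whose parameters are zeroed the same way `zero_module` does still receives non-zero gradients and updates on the first step:

```python
import torch
from torch import nn

# Zero the layer's parameters in place, outside the autograd graph,
# which is the same trick zero_module uses.
conv = nn.Conv2d(8, 4, kernel_size=3, padding=1)
for p in conv.parameters():
    p.detach().zero_()

opt = torch.optim.SGD(conv.parameters(), lr=0.1)
x = torch.randn(2, 8, 16, 16)
target = torch.randn(2, 4, 16, 16)

loss = (conv(x) - target).pow(2).mean()
loss.backward()

# The gradient is non-zero even though the weights are zero, because
# dLoss/dW depends on the input x, not only on the current value of W.
print(conv.weight.grad.abs().sum())  # > 0

opt.step()
print(conv.weight.abs().sum())  # > 0 after the first update
```

So the model output is zero only on the very first forward pass; after one optimizer step the final layer's weights are non-zero.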
Thanks for your reply and the reference! That makes sense now. It took a couple of training passes before the output was no longer all zeros, but then the weights did update to non-zero values.
May I ask why the gradient is detached in zero_module? https://github.com/Project-MONAI/GenerativeModels/blob/main/generative/networks/nets/diffusion_model_unet.py#L68 In particular, the final output layer uses zero_module: https://github.com/Project-MONAI/GenerativeModels/blob/main/generative/networks/nets/diffusion_model_unet.py#L1856
This seems odd to me, as if the whole final layer were untrainable and its parameters stayed zero forever.
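For context, the helper looks roughly like this (my paraphrase of the linked line; see the source for the exact code):

```python
from torch import nn

def zero_module(module: nn.Module) -> nn.Module:
    """Zero out the parameters of a module and return it."""
    for p in module.parameters():
        # .detach() returns a view that shares storage with p but is cut
        # off from the autograd graph, so .zero_() can mutate the weights
        # in place without autograd recording the operation. It does NOT
        # freeze p: requires_grad stays True and gradients still reach p.
        p.detach().zero_()
    return module
```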
This is more of a question than a bug report. I am using the DiffusionModel, and in the last step the h tensor is all zeros.
The out block is defined at the second link above.
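A rough paraphrase of that code (argument names from memory; see the link for the exact definition):

```python
self.out = nn.Sequential(
    nn.GroupNorm(num_groups=norm_num_groups, num_channels=num_channels[0], eps=norm_eps, affine=True),
    nn.SiLU(),
    # zero_module wraps the final convolution, so it starts with all-zero weights
    zero_module(
        Convolution(
            spatial_dims=spatial_dims,
            in_channels=num_channels[0],
            out_channels=out_channels,
            strides=1,
            kernel_size=3,
            padding=1,
            conv_only=True,
        )
    ),
)
```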
Is there any reason why zero_module is called here? It makes little sense to me, because the output of the model is then always all zeros, and I assumed the computed gradient would always be None. Help is highly appreciated, and apologies in advance if this is a basic question.