Project-MONAI / GenerativeModels

MONAI Generative Models makes it easy to train, evaluate, and deploy generative models and related applications
Apache License 2.0

Question about DiffusionModel last layer zero_module #419

Closed SophieOstmeier closed 1 year ago

SophieOstmeier commented 1 year ago

This is more of a question.

I am using the DiffusionModel, and in the last step the h tensor is all zeros:

[Screenshot 2023-08-24 at 12 16 17]

The out layer is defined as follows:

[Screenshot 2023-08-24 at 12 20 13]

Is there any reason why zero_module is called? It makes less sense to me, because the output of the model is then always all zeros and the computed gradients seem to always be None. Help is highly appreciated. And sorry in advance if this is a basic question.
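For reference, the zero_module helper being discussed is roughly the following (a minimal sketch based on diffusion_model_unet.py, not copied verbatim): it only zeroes a module's parameters in place at construction time.

```python
import torch.nn as nn


def zero_module(module: nn.Module) -> nn.Module:
    """Zero out the parameters of a module in place and return the module."""
    for p in module.parameters():
        p.detach().zero_()
    return module
```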

marksgraham commented 1 year ago

Hi,

We found it improved the performance of our models - this is referenced here.

This should not stop your model parameters from updating during training - is this something you have found in your experiments?
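A minimal sketch illustrating this point (plain PyTorch, not MONAI code): a layer whose parameters were zeroed in the zero_module style still receives non-zero gradients on the first backward pass, and its weights move away from zero after one optimizer step.

```python
import torch
import torch.nn as nn

# Conv layer with weights and bias zeroed in place, mimicking zero_module.
conv = nn.Conv2d(4, 4, kernel_size=3, padding=1)
for p in conv.parameters():
    p.detach().zero_()

x = torch.randn(2, 4, 8, 8)
target = torch.randn(2, 4, 8, 8)

opt = torch.optim.SGD(conv.parameters(), lr=0.1)
out = conv(x)                            # all zeros on the first forward pass
loss = nn.functional.mse_loss(out, target)
loss.backward()

print(conv.weight.grad.abs().max())      # non-zero: gradients still flow
opt.step()
print(conv.weight.abs().max())           # parameters are no longer zero
```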

SophieOstmeier commented 1 year ago

Thanks for your reply and the reference! That makes sense now. It took a couple of passes before the output was no longer all zeros, but then the weights updated to non-zero values.

bill-yc-chen commented 1 week ago

May I ask why we detach the gradient in zero_module? https://github.com/Project-MONAI/GenerativeModels/blob/main/generative/networks/nets/diffusion_model_unet.py#L68 In particular, the final output layer uses zero_module. https://github.com/Project-MONAI/GenerativeModels/blob/main/generative/networks/nets/diffusion_model_unet.py#L1856

This seems odd to me; it looks like the whole final layer is untrainable and its parameters stay at zero.
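For what it's worth, the detach() here only returns a view used for the one-time in-place zeroing, so autograd does not track that write; the parameter itself keeps requires_grad=True and remains trainable afterwards. A quick check (a sketch, not from the repo):

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 3)
layer.weight.detach().zero_()      # in-place zeroing via a detached view

print(layer.weight.requires_grad)  # True: the parameter is still trainable
print(layer.weight.sum())          # tensor(0., grad_fn=...) after zeroing
```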