Open xinyangATK opened 1 year ago
Thanks for your attention!
Do you mean the zero_module here? https://github.com/ckczzj/PDAE/blob/fbba0355634861196aed8b80b9ba4948ed210ab9/model/module/module.py#L362-L364
It is just a zero-initialization of the output conv layer. The zero-initialization makes the residual block behave like an identity function at the beginning of training, which is a commonly used trick for stable training.
Although the parameters are initialized to zero, their gradients still exist. After the first update of the network, they will almost all be non-zero. The recent ControlNet work uses a similar trick and raised similar questions.
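Here is a minimal sketch of the idea (illustrative names, not PDAE's actual module): the output conv is zero-initialized, so the residual block is exactly the identity at initialization, yet its gradients are non-zero because they depend on the conv's *input*, not its weights.

```python
import torch
import torch.nn as nn

def zero_module(module):
    # Zero-initialize all parameters so the module outputs 0 at the start.
    for p in module.parameters():
        nn.init.zeros_(p)
    return module

class ResBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.in_conv = nn.Conv2d(channels, channels, 3, padding=1)
        # Zero-initialized output conv: the residual branch outputs 0,
        # so the whole block acts as an identity function initially.
        self.out_conv = zero_module(nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        return x + self.out_conv(torch.relu(self.in_conv(x)))

block = ResBlock(4)
x = torch.randn(2, 4, 8, 8)

# At init the residual branch is zero, so block(x) == x exactly.
assert torch.equal(block(x), x)

# Gradients still flow: d(out)/d(W_out) is the conv input (relu features),
# which is non-zero, so the zero weights get a non-zero gradient.
block(x).sum().backward()
assert block.out_conv.weight.grad.abs().sum() > 0
```

After the first optimizer step, `out_conv`'s weights move away from zero and the branch starts contributing, which is exactly the "identity at init, learned later" behavior described above.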
Thank you for your patient answer!
This really solved my confusion about this module.
Thank you so much for releasing your code! I have some questions while reproducing your work. In the `forward()` function of `class ResBlockShift(TimestepZBlock)`, the `out_rest(h)` call seems to set `h` to zero, which would make `emb_z` ineffective. Is there a problem with this module?