zplizzi opened this issue 5 years ago

In the paper and code (eg here), the output of the resnet blocks is multiplied by 0.1. I'm curious about the purpose of this. Does it have to do with the absence of batch-norm?
It just reduces the learning rate for those blocks by a factor of 10 (due to the adaptive optimizer RMSProp). We haven't played around with it too much and I think it might also work fine without the 0.1.
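For context, here is a minimal sketch of what that scaling looks like in a residual block without batch norm. The module layout and names (`conv_0`, `conv_1`, `conv_s`, the leaky-ReLU nonlinearity) are assumptions modeled on this discussion, not necessarily the exact code in gan_training/models/resnet.py:

```python
import torch.nn as nn
import torch.nn.functional as F

class ResnetBlock(nn.Module):
    """Sketch of a residual block whose residual branch is scaled by 0.1.

    Per the comment above, with an adaptive optimizer such as RMSProp this
    scaling acts roughly like a 10x smaller learning rate for the branch's
    parameters, and it also shrinks the block's contribution at initialization.
    """
    def __init__(self, fin, fout):
        super().__init__()
        self.conv_0 = nn.Conv2d(fin, fout, 3, padding=1)
        self.conv_1 = nn.Conv2d(fout, fout, 3, padding=1)
        self.conv_s = nn.Conv2d(fin, fout, 1, bias=False) if fin != fout else nn.Identity()

    def forward(self, x):
        dx = self.conv_0(F.leaky_relu(x, 0.2))
        dx = self.conv_1(F.leaky_relu(dx, 0.2))
        return self.conv_s(x) + 0.1 * dx  # the factor this issue is about
```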
I removed the 0.1 factor and changed g_lr and d_lr from 1e-4 to 1e-5, but it does not converge at all. I don't know the reason.
Thanks for reporting your experimental results. What architecture + dataset did you use? I quickly tried celebA + LSUN churches at resolution 128^2, and there it appears to work fine without the 0.1 and with a learning rate of 1e-5. One possible reason it did not work for you could be that the 0.1 also changes the initialization, which can be quite important (for deep learning in general and for our method in particular, since it only has local convergence guarantees). What you can try is to add
```python
nn.init.zeros_(self.conv_1.weight)
nn.init.zeros_(self.conv_1.bias)
```

to the `__init__` function of the ResNet blocks when removing the 0.1, and set both learning rates to 1e-5.
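Putting that together, the suggested change could look roughly like this (same sketch block as above; only the zero init and the removed 0.1 come from the suggestion, the rest of the structure is assumed):

```python
import torch.nn as nn
import torch.nn.functional as F

class ResnetBlock(nn.Module):
    def __init__(self, fin, fout):
        super().__init__()
        self.conv_0 = nn.Conv2d(fin, fout, 3, padding=1)
        self.conv_1 = nn.Conv2d(fout, fout, 3, padding=1)
        self.conv_s = nn.Conv2d(fin, fout, 1, bias=False) if fin != fout else nn.Identity()
        # Zero-initialize the last conv so each block starts out as (roughly)
        # the identity map, recovering the initialization effect that the
        # 0.1 factor provided.
        nn.init.zeros_(self.conv_1.weight)
        nn.init.zeros_(self.conv_1.bias)

    def forward(self, x):
        dx = self.conv_0(F.leaky_relu(x, 0.2))
        dx = self.conv_1(F.leaky_relu(dx, 0.2))
        return self.conv_s(x) + dx  # no 0.1 scaling; pair with g_lr = d_lr = 1e-5
```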
Thanks for your reply! I used celebA-HQ with image size 1024x1024. I just changed the lr in configs/celebA-HQ and removed the 0.1 factor in gan_training/models/resnet.py. I will try the initialization change. Thanks!
The 0.1 factor made more sense to me after reading the Fixup paper: it explains why standard initialization methods are poorly suited for ResNets and can cause immediate gradient explosion. The 0.1 factor is a rough approximation of the fix they suggest, which is to down-scale the initializations in the ResNet blocks and then potentially initialize the last conv layer of each block to 0 (as @LMescheder mentions above), along with a few other changes.
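For reference, a rough sketch of what a Fixup-style initialization could look like for blocks shaped like the sketch above. The L^(-1/(2m-2)) scaling rule is from the Fixup paper; the attribute names and the two-convs-per-branch assumption are illustrative, not the repo's actual code:

```python
import torch
import torch.nn as nn

def fixup_style_init(blocks, layers_per_branch=2):
    """Apply a Fixup-style initialization to a list of residual blocks.

    Fixup scales the weights of all but the last layer inside each residual
    branch by L^(-1/(2m-2)) (L = number of blocks, m = layers per branch) and
    zero-initializes the last layer, so every block starts as the identity.
    """
    L = len(blocks)
    m = layers_per_branch
    scale = L ** (-1.0 / (2 * m - 2))
    for block in blocks:
        # Down-scale the standard (e.g. He) initialization of the first conv.
        with torch.no_grad():
            block.conv_0.weight.mul_(scale)
        # The last layer of the residual branch starts at zero.
        nn.init.zeros_(block.conv_1.weight)
        nn.init.zeros_(block.conv_1.bias)
```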