Closed — AverageName closed this issue 3 years ago
This is a common empirical practice for image generation networks. In my experience, whether or not the model has a sigmoid (or tanh) at the end has little influence on the final result. Removing the sigmoid/tanh has two possible advantages: 1) it avoids vanishing gradients, and 2) it lets you spot training problems early.
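A quick numeric illustration of the vanishing-gradient point: the sigmoid's derivative is s(1 - s), which collapses toward zero once pre-activations grow large, so a saturated final sigmoid passes almost no gradient back into the network.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = s * (1 - s); shrinks fast as |x| grows (saturation)
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))   # 0.25, the maximum possible value
print(sigmoid_grad(10.0))  # ~4.5e-5, effectively no gradient
```

This is why a generator whose pre-activation outputs drift to large magnitudes can "go crazy" once a squashing nonlinearity is bolted on: the gradient through the saturated region is nearly zero.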
With proper model initialization and optimizer settings, the model can be trained smoothly. To be specific: initialize the weights from a normal distribution with std 0.02 (like CycleGAN) and use a learning rate of lr=1e-4.
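A minimal PyTorch sketch of that setup, assuming a conv-based generator (the toy network below is illustrative, not the actual model from this repo):

```python
import torch
import torch.nn as nn

def init_weights(m):
    # DCGAN/CycleGAN-style init: conv weights drawn from N(0, 0.02)
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

# toy generator head: raw RGB output, no sigmoid/tanh at the end
net = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 3, kernel_size=3, padding=1),
)
net.apply(init_weights)

optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
```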
If the problem still exists, normalization such as batch norm or instance norm may help. Note: the practice above only applies to image generation networks.
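For reference, here is a minimal NumPy sketch of what instance normalization does (the operation behind PyTorch's nn.InstanceNorm2d), assuming NCHW layout; it keeps each image's per-channel statistics bounded, which counters the "pretty big numbers at the end" problem:

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    # x: (N, C, H, W); normalize each sample's each channel independently
    # over its spatial dimensions, so outputs have ~zero mean, unit variance
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.default_rng(0).normal(5.0, 3.0, size=(2, 3, 8, 8))
y = instance_norm(x)
print(y.mean(axis=(2, 3)))  # ~0 for every (sample, channel) pair
```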
Can you please explain why you don't use some kind of nonlinearity at the end when you want to get RGB images? I ran into the problem that, training it the way it is implemented, I sometimes get pretty big numbers at the end, which leads to artifacts; but when I try to train it with something like tanh at the end, it goes crazy and just generates a single-color picture (e.g., a fully red or white square).