Closed — AverageName closed this issue 3 years ago
This is a common empirical practice for image generation networks. In my experience, whether or not the model has a sigmoid (or tanh) at the end has little influence on the final result. Removing the sigmoid/tanh has two possible advantages: 1) it avoids vanishing gradients, and 2) it lets you spot training problems early.
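A quick numeric illustration of the vanishing-gradient point: the sigmoid's derivative is s(1 - s), which collapses toward zero once pre-activations grow large, so a saturated final sigmoid passes almost no gradient back into the network.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = s * (1 - s); shrinks fast as |x| grows (saturation)
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))   # 0.25, the maximum possible value
print(sigmoid_grad(10.0))  # ~4.5e-5, effectively no gradient
```

This is why a generator whose pre-activation outputs drift to large magnitudes can "go crazy" once a squashing nonlinearity is bolted on: the gradient through the saturated region is nearly zero.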
With proper model initialization and optimizer settings, the model can be trained smoothly. To be specific: initialize the weights from a normal distribution with std 0.02 (like CycleGAN) and use a learning rate of lr=1e-4.
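A minimal PyTorch sketch of that setup, assuming a conv-based generator (the toy network below is illustrative, not the actual model from this repo):

```python
import torch
import torch.nn as nn

def init_weights(m):
    # DCGAN/CycleGAN-style init: conv weights drawn from N(0, 0.02)
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

# toy generator head: raw RGB output, no sigmoid/tanh at the end
net = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 3, kernel_size=3, padding=1),
)
net.apply(init_weights)

optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
```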
If the problem still exists, normalization such as batch norm or instance norm may help. Note: the practice above only applies to image generation networks.
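For reference, here is a minimal NumPy sketch of what instance normalization does (the operation behind PyTorch's nn.InstanceNorm2d), assuming NCHW layout; it keeps each image's per-channel statistics bounded, which counters the "pretty big numbers at the end" problem:

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    # x: (N, C, H, W); normalize each sample's each channel independently
    # over its spatial dimensions, so outputs have ~zero mean, unit variance
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.default_rng(0).normal(5.0, 3.0, size=(2, 3, 8, 8))
y = instance_norm(x)
print(y.mean(axis=(2, 3)))  # ~0 for every (sample, channel) pair
```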
Can you please explain why you don't use some kind of nonlinearity at the end when you want to get RGB images? I ran into the problem that, training it the way it is implemented, I sometimes get pretty big numbers at the end, which leads to artifacts; but when I try to train it with something like tanh at the end, it goes crazy and just generates a single-color picture (e.g., a fully red or white square).