YuanXue1993 / SegAN

SegAN: Semantic Segmentation with Adversarial Learning
MIT License

Why do we need to clip gradient in netC? #3

Closed: John1231983 closed this issue 5 years ago

John1231983 commented 6 years ago

Thanks for sharing your code. Your code contains the following:

# clip parameters in D
for p in NetC.parameters():
    p.data.clamp_(-0.05, 0.05)

Why do we need to clip the parameters in D? How did you decide on the values -0.05 and 0.05? Thanks.

YuanXue1993 commented 6 years ago

This prevents the output of NetC from becoming arbitrarily large during its training, and it is a common trick for avoiding exploding gradients. You can find the theoretical argument for why it is necessary in the original paper: if you don't clip the parameters in NetC, convergence of the training is not guaranteed. As for the clipping range, you can certainly try other values as hyper-parameters.
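For context, here is a minimal sketch of where such a clamp typically sits in a WGAN-style critic update. The toy critic, optimizer, and random batches below are placeholders for illustration only, not the repo's actual training loop; the key point is that the clamp runs right after the optimizer step, so the critic's weights always stay inside the chosen range.

import torch
import torch.nn as nn

# Toy stand-ins for the critic and a batch of images (hypothetical).
netC = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1),
    nn.LeakyReLU(0.2),
    nn.Flatten(),
    nn.Linear(8 * 32 * 32, 1),
)
optC = torch.optim.RMSprop(netC.parameters(), lr=5e-5)

real = torch.randn(4, 1, 32, 32)  # placeholder "real" batch
fake = torch.randn(4, 1, 32, 32)  # placeholder batch from the generator/segmentor

# One critic update: push C(real) up and C(fake) down.
optC.zero_grad()
lossC = -(netC(real).mean() - netC(fake.detach()).mean())
lossC.backward()
optC.step()

# Clamp after the optimizer step so the critic's weights stay in
# [-0.05, 0.05], a crude way of keeping the critic roughly Lipschitz.
for p in netC.parameters():
    p.data.clamp_(-0.05, 0.05)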

John1231983 commented 6 years ago

Thanks so much. So we only need to clip the parameters in netC, is that right? Why not apply clipping in netS? I am using SGD for netC, and the loss becomes zero when I apply clipping.

YuanXue1993 commented 5 years ago

Clipping the parameters in C enforces the Lipschitz constraint (in a less elegant and less effective way; you can check out newer and better methods such as the gradient penalty and spectral normalization if interested). Clipping parameters in S doesn't make much sense beyond restricting the capacity of S.
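For reference, here is a minimal sketch of the two alternatives mentioned above, using a hypothetical toy critic and random data rather than anything from this repo: spectral normalization wraps each critic layer at construction time, while the WGAN-GP gradient penalty is an extra term added to the critic loss in place of weight clipping.

import torch
import torch.nn as nn

# Alternative 1: spectral normalization, applied per layer when building the critic.
critic = nn.Sequential(
    nn.utils.spectral_norm(nn.Conv2d(1, 8, 3, padding=1)),
    nn.LeakyReLU(0.2),
    nn.Flatten(),
    nn.utils.spectral_norm(nn.Linear(8 * 32 * 32, 1)),
)

# Alternative 2: WGAN-GP gradient penalty, penalizing the critic's gradient
# norm on points interpolated between real and generated samples.
def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    eps = torch.rand(real.size(0), 1, 1, 1)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(outputs=scores, inputs=interp,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True)[0]
    return lambda_gp * ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

real = torch.randn(4, 1, 32, 32)  # placeholder batches
fake = torch.randn(4, 1, 32, 32)
lossC = -(critic(real).mean() - critic(fake).mean()) + gradient_penalty(critic, real, fake)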