facebookresearch / pytorch_GAN_zoo

A mix of GAN implementations including progressive growing
BSD 3-Clause "New" or "Revised" License

Weight scaling is applied to bias as well #111

Open HarikrishnanBalagopal opened 4 years ago

HarikrishnanBalagopal commented 4 years ago

https://github.com/facebookresearch/pytorch_GAN_zoo/blob/7275ecbf53a9db7e4bc38c4c5136c10c4950724b/models/networks/custom_layers.py#L72-L74

The above implementation applies the weight scaling to the bias tensor as well. However, in the original implementation (https://github.com/tkarras/progressive_growing_of_gans/blob/master/networks.py#L53-L59) weight scaling is NOT applied to the bias tensor.

This makes sense, since He normal initialization takes into account fan-in and fan-out, which depend on the dimensionality of the weights, not the biases. https://medium.com/@prateekvishnu/xavier-and-he-normal-he-et-al-initialization-8e3d7a087528

HarikrishnanBalagopal commented 4 years ago

I think the solution is to set bias=False on these lines: https://github.com/facebookresearch/pytorch_GAN_zoo/blob/master/models/networks/custom_layers.py#L98-L100 and https://github.com/facebookresearch/pytorch_GAN_zoo/blob/master/models/networks/custom_layers.py#L120-L121

Then add the bias separately, after the x = self.module(x) and x *= self.weight lines: https://github.com/facebookresearch/pytorch_GAN_zoo/blob/master/models/networks/custom_layers.py#L72-L74
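A minimal sketch of what that fix could look like, assuming an equalized-learning-rate conv layer like the repo's; the class and attribute names here are illustrative, not the repo's actual ones:

```python
import torch
import torch.nn as nn

class EqualizedConv2d(nn.Module):
    """Sketch of the proposed fix: run a bias-free conv, apply the
    He constant to its output (which only touches the weight term),
    then add a separate, unscaled bias, as in the original ProGAN."""

    def __init__(self, in_ch, out_ch, kernel_size, padding=0):
        super().__init__()
        # bias=False so the scaling below does not touch the bias
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              padding=padding, bias=False)
        self.conv.weight.data.normal_(0, 1)
        fan_in = in_ch * kernel_size * kernel_size
        self.scale = (2.0 / fan_in) ** 0.5  # He-normal constant
        # separate bias parameter, added after the multiplication
        self.bias = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x):
        x = self.conv(x) * self.scale   # scales weights only
        return x + self.bias.view(1, -1, 1, 1)
```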

Molugan commented 4 years ago

Hello, sorry for the delay. You're right, I indeed missed that part. I'll see if I have some time to retrain the models with this modification.

altairmn commented 3 years ago

I don't think there's a bug. The objective of weight scaling is to control the gradients. In this implementation the activations are scaled instead of the weights; the effect on the weight gradients is the same, because the weights are multiplied with the activations. However, since the biases are added rather than multiplied, the activation multiplier does not affect their gradient. That is my understanding of it.

NullCodex commented 2 years ago

If you set bias=False then the module no longer contains a bias, @HarikrishnanBalagopal. @altairmn can you explain your comment? In my mind there's definitely a difference.
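One way to check whether the two formulations really behave the same is a toy gradient comparison; this is a sketch with made-up shapes, using a plain matmul instead of the repo's conv layers:

```python
import torch

# Variant A (repo's current behaviour): the scale multiplies the
# whole activation, bias included.
# Variant B (original ProGAN): the scale multiplies only the weight
# term, and the bias is added afterwards, unscaled.
scale = 0.5
x = torch.randn(4, 3)   # batch of 4
w = torch.randn(3, 2)

b_a = torch.zeros(2, requires_grad=True)
out_a = scale * (x @ w + b_a)
out_a.sum().backward()

b_b = torch.zeros(2, requires_grad=True)
out_b = scale * (x @ w) + b_b
out_b.sum().backward()

# With a sum loss, the bias gradient is per-unit batch size in
# variant B, but gets multiplied by the scale in variant A.
print(b_a.grad)  # tensor([2., 2.])
print(b_b.grad)  # tensor([4., 4.])
```

So the bias gradients do differ, by exactly the scale factor, which seems to support the point that the two formulations are not equivalent for the bias term.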