In the paper "Improved Training of Wasserstein GANs" (https://papers.nips.cc/paper/2017/file/892c3b1c6dccd52936e27cbd0ff683d6-Paper.pdf), the authors argue that the model should not use batch normalization in either the generator or the discriminator. So why does the WGAN-GP code use batch normalization? Also, BatchNormalization contains non-trainable parameters. How are those parameters updated in the code?
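To make clear what I mean by the two kinds of parameters, here is a minimal sketch (assuming a TensorFlow/Keras setup, not the authors' actual code): gamma/beta are trainable and follow the optimizer, while moving_mean/moving_variance are non-trainable and are updated by an exponential moving average during forward passes with training=True, not by gradients.

```python
# Minimal sketch (hypothetical, not the WGAN-GP authors' code): how Keras
# BatchNormalization updates its trainable vs. non-trainable parameters.
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization(momentum=0.99)
x = tf.random.normal([64, 16])          # hypothetical batch of activations

_ = bn(x, training=True)                # builds the layer, updates moving stats
print([v.name for v in bn.trainable_variables])      # gamma, beta
print([v.name for v in bn.non_trainable_variables])  # moving_mean, moving_variance

before = bn.moving_mean.numpy().copy()
with tf.GradientTape() as tape:
    y = bn(x, training=True)            # moving stats updated here as a side effect
    loss = tf.reduce_mean(tf.square(y))
# Only gamma and beta receive gradients and are updated by the optimizer:
grads = tape.gradient(loss, bn.trainable_variables)
tf.keras.optimizers.SGD(0.1).apply_gradients(zip(grads, bn.trainable_variables))
# moving_mean changed without any gradient step:
print((before != bn.moving_mean.numpy()).any())
```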