avinashsai opened this issue 4 years ago
@avinashsai Thanks for pointing this out.
The PyTorch (rmsnorm_torch) and TensorFlow (rmsnorm_tensorflow) code do NOT handle the CNN case. By default, the code is intended for RNN, feed-forward, and attention networks, and the normalization is applied over the last dimension.
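For illustration, here is a minimal NumPy sketch of RMSNorm over the last dimension, as used for RNN, feed-forward, and attention hidden states. It is not the repository's rmsnorm_torch or rmsnorm_tensorflow code: the function name `rms_norm_last_dim` is hypothetical, and adding `eps` inside the square root is an assumed convention.

```python
import numpy as np

def rms_norm_last_dim(x: np.ndarray, gain: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """RMSNorm over the last axis: y = x / RMS(x) * gain.

    RMS(x) = sqrt(mean(x**2)) over the last axis. Placing `eps` inside the
    square root is an assumption made for this sketch.
    """
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * gain

# Hidden states of shape (batch, seq_len, d_model), as in an RNN or Transformer.
rng = np.random.default_rng(0)
h = rng.normal(size=(2, 5, 8))
g = np.ones(8)  # gain, one entry per feature (initialized to 1)
y = rms_norm_last_dim(h, g)
print(y.shape)  # (2, 5, 8)
# Every feature vector now has an RMS of approximately 1.
print(np.sqrt(np.mean(y ** 2, axis=-1)))
```

Unlike LayerNorm, no mean is subtracted; each vector is only rescaled by its root mean square.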
For CNNs, I follow LayerNorm and apply the normalization over the width and height dimensions. Please refer to the CIFAR-10 Classification section in the README for more details.
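A rough NumPy sketch of normalizing over width and height for an NCHW feature map (PyTorch's default convolution layout) might look like the following. The function name `rms_norm_spatial` and the per-channel gain shape `(1, C, 1, 1)` are assumptions for illustration; the exact setup used in the CIFAR-10 experiments is described in the README.

```python
import numpy as np

def rms_norm_spatial(x: np.ndarray, gain: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """RMSNorm over the spatial axes (H, W) of an NCHW feature map.

    x:    activations of shape (N, C, H, W).
    gain: broadcastable to x; assumed per-channel, shape (1, C, 1, 1).
    The RMS is computed separately for each (sample, channel) pair.
    """
    rms = np.sqrt(np.mean(x ** 2, axis=(2, 3), keepdims=True) + eps)
    return x / rms * gain

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 16, 8, 8))  # (N, C, H, W)
gain = np.ones((1, 16, 1, 1))           # hypothetical per-channel gain
out = rms_norm_spatial(feats, gain)
print(out.shape)  # (4, 16, 8, 8)
```

For an NHWC tensor (TensorFlow's default layout), the same normalization would use `axis=(1, 2)` instead.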
Biao
Hi, congratulations on the amazing work. I have some questions about RMS normalization.
Which dimensions should be normalized in a CNN? In the PyTorch code, the default axis is -1, which is the width dimension of a PyTorch CNN feature map (N, C, H, W). In TensorFlow, however, axis -1 is the channel dimension (N, H, W, C).
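The layout difference can be checked with a small NumPy sketch. Here `rms_norm` is a hypothetical gain-free helper, not the repository's code. It shows that `axis=-1` selects width in NCHW but channels in NHWC, and that normalizing channels in NCHW requires `axis=1`.

```python
import numpy as np

def rms_norm(x: np.ndarray, axis, eps: float = 1e-8) -> np.ndarray:
    """Gain-free RMSNorm over the given axis (or tuple of axes)."""
    rms = np.sqrt(np.mean(x ** 2, axis=axis, keepdims=True) + eps)
    return x / rms

rng = np.random.default_rng(0)
x_nchw = rng.normal(size=(2, 3, 4, 5))       # PyTorch default: (N, C, H, W)
x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))  # TensorFlow default: (N, H, W, C)

# axis=-1 refers to different dimensions in the two layouts.
print(x_nchw.shape[-1])  # 5 (width W)
print(x_nhwc.shape[-1])  # 3 (channels C)

# Normalizing the channels of an NCHW tensor needs axis=1, which matches
# axis=-1 on the equivalent NHWC tensor.
a = rms_norm(x_nchw, axis=1)
b = rms_norm(x_nhwc, axis=-1)
print(np.allclose(np.transpose(a, (0, 2, 3, 1)), b))  # True
```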
Can the normalization be applied over other dimensions as well? For example, in the CIFAR-10 experiments, LayerNorm was applied over the width and height dimensions.
Thank you.