bzhangGo / rmsnorm

Root Mean Square Layer Normalization
BSD 3-Clause "New" or "Revised" License

Normalization of CNN in CIFAR-10 experiments #1

Open avinashsai opened 4 years ago

avinashsai commented 4 years ago

Hi, congratulations on the amazing work. I have some questions about RMS normalization.

  1. Which dimensions should be considered when normalizing a CNN? In the PyTorch code, the default axis is -1, which corresponds to the width dimension of a PyTorch CNN input. In TensorFlow, however, it corresponds to the channel dimension.

  2. Can the normalization be applied over other dimensions as well? For example, in the CIFAR-10 experiments, LayerNorm was applied over the width and height dimensions.

Thank you.

bzhangGo commented 4 years ago

@avinashsai Thanks for pointing this out.

  1. The PyTorch (rmsnorm_torch) and TensorFlow (rmsnorm_tensorflow) code does NOT handle the CNN case. By default, the code targets RNN, feed-forward, and attention networks, where normalization is applied over the last dimension.

  2. For CNN normalization, I follow LayerNorm and apply it over the width and height dimensions. Please refer to the CIFAR-10 Classification section in the README for more details.
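A minimal NumPy sketch of both settings discussed above (my own illustration, not the repo's code; the learnable gain from the paper is omitted for brevity): the default case normalizes over the last dimension, and the CNN case normalizes over the height and width axes.

```python
import numpy as np

def rms_norm(x, axis, eps=1e-8):
    """RMS-normalize x over the given axis/axes (illustrative sketch).

    Computes x / sqrt(mean(x**2) + eps) over `axis`; a learnable gain
    would normally be multiplied in afterwards, as in the paper.
    """
    rms = np.sqrt(np.mean(np.square(x), axis=axis, keepdims=True) + eps)
    return x / rms

# Default setting: last dimension (RNN / feed-forward / attention inputs).
h = np.random.randn(4, 10, 64)          # (batch, time, hidden)
h_norm = rms_norm(h, axis=-1)

# CNN setting: height and width, following the LayerNorm-style
# CIFAR-10 usage described in the reply above.
fmap = np.random.randn(4, 16, 32, 32)   # (batch, channels, height, width)
fmap_norm = rms_norm(fmap, axis=(2, 3))
```

After normalization, the root mean square over the chosen axes is (approximately) 1 for every remaining index, which is the defining property of RMSNorm.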

Biao