Open kings-rgb opened 2 years ago
Hi,
In our case, LN is only slightly better than BN on ImageNet-1K classification, and this was measured on an intermediate model, not the final one. We don't know why, and we are not sure how the comparison would turn out on downstream tasks. These could be interesting problems for future work.
In ResNets, BN does seem to be better according to some prior work, but as I understand it, LN there was implemented differently from ours (we normalize per spatial location, i.e. over the channel dimension only). This could also be a factor.
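To make the "per spatial location" point concrete, here is a minimal sketch in plain Python. It is only an illustration, not the code from the repository: the function names `ln_per_location` and `ln_whole_sample` are hypothetical, the learnable scale and shift are omitted, and the whole-sample variant is just one alternative convention that I am assuming for contrast.

```python
from statistics import fmean, pvariance


def ln_per_location(x, eps=1e-6):
    """LN over channels only, separately at each (h, w).

    x is a nested list indexed as x[c][h][w] (one sample, no batch dim).
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for h in range(H):
        for w in range(W):
            vals = [x[c][h][w] for c in range(C)]  # the C values at this pixel
            mu = fmean(vals)
            var = pvariance(vals, mu)              # biased (population) variance
            for c in range(C):
                out[c][h][w] = (vals[c] - mu) / (var + eps) ** 0.5
    return out


def ln_whole_sample(x, eps=1e-6):
    """Assumed alternative: one mean/variance over all of (C, H, W)."""
    vals = [v for channel in x for row in channel for v in row]
    mu = fmean(vals)
    std = (pvariance(vals, mu) + eps) ** 0.5
    return [[[(v - mu) / std for v in row] for row in channel] for channel in x]


# Toy input: C=2, H=1, W=2. The second pixel has 10x larger activations.
x = [[[1.0, 10.0]], [[3.0, 30.0]]]
per_loc = ln_per_location(x)   # each pixel is normalized on its own
whole = ln_whole_sample(x)     # the large pixel dominates the statistics
```

With the per-location variant both pixels end up with the same normalized values, because each one is rescaled by its own statistics. With the whole-sample variant the difference in magnitude between the two pixels is kept.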
In a standard ResNet, BN is better than LN; we think the reason is that the two compute their statistics over different dimensions. So why is LN better than BN in ConvNeXt?