facebookresearch / ConvNeXt

Code release for ConvNeXt model
MIT License

Why is LayerNorm placed before the conv in downsampling layers? #132

Open F-Barto opened 2 years ago

F-Barto commented 2 years ago

Thanks for your awesome work!

The stem is consistent with the Blocks, where the ordering is conv->norm, but in the downsampling layers you put LayerNorm before the convolution.

The full path is:
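As a rough sketch of that path, assuming the layer ordering in the public `convnext.py` (the strings below are descriptive labels I made up for illustration, not identifiers from the repository, and the skip connection around each block is left out):

```python
# Descriptive labels only; the ordering is assumed from the public ConvNeXt code.
STEM = ["conv4x4_stride4", "layernorm"]
# Residual branch of one block (the identity skip around it is not listed).
BLOCK_BRANCH = ["dwconv7x7", "layernorm", "linear_expand", "gelu", "linear_project"]
DOWNSAMPLE = ["layernorm", "conv2x2_stride2"]


def full_path(stage1_is_identity):
    """Ops along the main path from the input up to the start of stage 2."""
    stage1 = [] if stage1_is_identity else BLOCK_BRANCH
    return STEM + stage1 + DOWNSAMPLE


def has_back_to_back_norm(path):
    """True if two layernorm ops are directly adjacent in the path."""
    return any(a == b == "layernorm" for a, b in zip(path, path[1:]))


print(" -> ".join(full_path(stage1_is_identity=False)))
print(" -> ".join(full_path(stage1_is_identity=True)))
```

With `stage1_is_identity=True`, the stem's LayerNorm is followed immediately by the downsampling layer's LayerNorm.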

This means that if the residual blocks of stage 1 converge to identity, we have a LayerNorm feeding directly into another LayerNorm, which seems odd to me.
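To make the concern concrete, here is a small pure-Python sketch (not the repository's code) that normalizes one channel vector twice. The `layer_norm` helper, the sample values, and the affine parameters are my own assumptions; `eps=1e-6` is what I believe the repository uses. It suggests the second norm is close to a no-op only while the first norm's weight and bias are still at their initial values of 1 and 0:

```python
import math


def layer_norm(x, weight=None, bias=None, eps=1e-6):
    """Normalize a list of channel values, then apply an optional affine."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased variance
    y = [(v - mean) / math.sqrt(var + eps) for v in x]
    if weight is not None:
        y = [w * v for w, v in zip(weight, y)]
    if bias is not None:
        y = [v + b for v, b in zip(y, bias)]
    return y


def max_abs_diff(a, b):
    return max(abs(p - q) for p, q in zip(a, b))


x = [0.5, -1.2, 3.0, 0.7]  # hypothetical channel vector at one spatial position

# Case 1: first norm has no affine (equivalent to weight=1, bias=0 at init).
once = layer_norm(x)
twice = layer_norm(once)
diff_default = max_abs_diff(once, twice)

# Case 2: first norm has made-up "learned" per-channel weight and bias.
w = [2.0, 0.5, 1.0, 1.5]
b = [0.1, -0.3, 0.0, 0.2]
once_affine = layer_norm(x, w, b)
twice_affine = layer_norm(once_affine)
diff_affine = max_abs_diff(once_affine, twice_affine)

print("default affine:", diff_default)
print("learned affine:", diff_affine)
```

So the two norms only collapse into one when the first has a trivial affine; once its per-channel scale and shift are trained, the second norm changes the output.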

Could you explain this design choice?