Good point! I suppose it's a deliberate design choice, but I would really like a clarification from the authors as well.
The standard MedNeXt blocks are 3D counterparts of the 2D ConvNeXt (https://arxiv.org/pdf/2201.03545) architecture.
The first conv layer is a depthwise convolution with a larger kernel and no interaction between channels. The second layer is a non-depthwise convolution with a smaller kernel, which allows learning across channels. This design decouples the kernel size from the width of the expansion layer: one can increase the expansion ratio without also having to widen the size-k convolutions, which would be expensive when the channel count C is particularly high for that block.
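To make the layer ordering concrete, here is a minimal PyTorch sketch of a block in this style. It is not the repo's exact implementation: the layer names, the GroupNorm choice, and the 1×1×1 kernel for the expansion/compression convs are assumptions for illustration.

```python
import torch.nn as nn


class MedNeXtStyleBlock(nn.Module):
    """Minimal sketch of a ConvNeXt/MedNeXt-style 3D block (illustrative, not the repo's code)."""

    def __init__(self, channels: int, kernel_size: int = 5, expansion: int = 4):
        super().__init__()
        # 1) Depthwise conv: large kernel k, groups=channels, so no cross-channel mixing.
        self.dw_conv = nn.Conv3d(
            channels, channels, kernel_size,
            padding=kernel_size // 2, groups=channels,
        )
        self.norm = nn.GroupNorm(num_groups=channels, num_channels=channels)  # assumed norm choice
        # 2) Expansion conv: small kernel (1x1x1 here), full channel mixing,
        #    widens to expansion * channels. Kernel size and expansion ratio are independent.
        self.exp_conv = nn.Conv3d(channels, expansion * channels, kernel_size=1)
        # Activation only after the expansion conv ("fewer activations", as in ConvNeXt).
        self.act = nn.GELU()
        # 3) Compression conv back to the original channel count.
        self.comp_conv = nn.Conv3d(expansion * channels, channels, kernel_size=1)

    def forward(self, x):
        residual = x
        x = self.dw_conv(x)
        x = self.norm(x)      # note: no activation between dw_conv and exp_conv
        x = self.exp_conv(x)
        x = self.act(x)
        x = self.comp_conv(x)
        return x + residual   # residual connection
```

With this structure, increasing `expansion` only grows the cheap 1×1×1 convolutions, while the expensive size-k depthwise convolution stays at `channels` width, which is the decoupling described above.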
Thank you for the clarification. It turns out the ConvNeXt paper does have a paragraph on why it uses fewer activation and even fewer normalization layers; it seems to boil down to a (small) performance improvement.
Problem: A basic MedNeXt block has no activation between the first conv layer and the expansion conv layer. Doesn't this mean that these two stacked conv+norm layers add no expressive power compared to a single conv+norm layer?