Good point! I suppose it's a deliberate design choice, but I would really like a clarification from the authors as well.
The standard MedNeXt blocks are 3D counterparts of the 2D ConvNeXt (https://arxiv.org/pdf/2201.03545) architecture.
The first conv layer is a depthwise convolution with a larger kernel and no interaction between channels. The second layer is a non-depthwise convolution with a smaller kernel, which allows learning across channels. This design decouples the kernel size from the width of the expansion layer: one can increase the expansion ratio without also having to widen the size-k convolutions, which would be expensive when the channel count C is particularly high for that block.
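To make the layer ordering concrete, here is a minimal PyTorch sketch of a block in this style. It is not the repo's exact implementation: the layer names, the GroupNorm choice, and the 1×1×1 kernel for the expansion/compression convs are assumptions for illustration.

```python
import torch.nn as nn


class MedNeXtStyleBlock(nn.Module):
    """Minimal sketch of a ConvNeXt/MedNeXt-style 3D block (illustrative, not the repo's code)."""

    def __init__(self, channels: int, kernel_size: int = 5, expansion: int = 4):
        super().__init__()
        # 1) Depthwise conv: large kernel k, groups=channels, so no cross-channel mixing.
        self.dw_conv = nn.Conv3d(
            channels, channels, kernel_size,
            padding=kernel_size // 2, groups=channels,
        )
        self.norm = nn.GroupNorm(num_groups=channels, num_channels=channels)  # assumed norm choice
        # 2) Expansion conv: small kernel (1x1x1 here), full channel mixing,
        #    widens to expansion * channels. Kernel size and expansion ratio are independent.
        self.exp_conv = nn.Conv3d(channels, expansion * channels, kernel_size=1)
        # Activation only after the expansion conv ("fewer activations", as in ConvNeXt).
        self.act = nn.GELU()
        # 3) Compression conv back to the original channel count.
        self.comp_conv = nn.Conv3d(expansion * channels, channels, kernel_size=1)

    def forward(self, x):
        residual = x
        x = self.dw_conv(x)
        x = self.norm(x)      # note: no activation between dw_conv and exp_conv
        x = self.exp_conv(x)
        x = self.act(x)
        x = self.comp_conv(x)
        return x + residual   # residual connection
```

With this structure, increasing `expansion` only grows the cheap 1×1×1 convolutions, while the expensive size-k depthwise convolution stays at `channels` width, which is the decoupling described above.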
Thank you for the clarification. It turns out the ConvNeXt paper does have a paragraph on why it uses fewer activation and even fewer normalization layers; it seems to boil down to a (small) performance improvement.
Problem: A basic MedNeXt block has no activation between the first conv layer and the expansion conv layer. Doesn't this mean that these two stacked conv+norm layers add no expressive power compared to a single conv+norm layer?