Hi, this has to do with pooling. If the input size is not divisible by 16, the following will happen throughout the network (say the image is 124x124x124 instead of 128x128x128):
Encoder: 124x124x124 -> 61x61x61 -> 30x30x30 -> 15x15x15 -> 7x7x7
Decoder: 7x7x7 -> 14x14x14 -> 28x28x28 -> 56x56x56 -> 112x112x112
As you can see, the feature map sizes do not match due to rounding (odd feature-map sizes in the encoder). This is undesired and will cause crashes: the skip connections cannot be concatenated, the loss cannot be computed because the ground truth is still 124x124x124, etc. Thus, the input size must be divisible by 2^num_pool (num_pool is 4 here, so 16) for U-Net-like architectures. Hope this helps! Best, Fabian
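For illustration, here is a minimal sketch (plain Python, assuming each pooling simply halves a dimension with floor division and each upsampling doubles it; the exact intermediate numbers depend on the convolution/pooling settings) showing how the encoder and decoder sizes drift apart for 124 but line up for 128:

```python
def encoder_decoder_shapes(size, num_pool=4):
    """Return (encoder_sizes, decoder_sizes) for one spatial dimension."""
    enc = [size]
    for _ in range(num_pool):
        enc.append(enc[-1] // 2)   # each pooling roughly halves the size (floor)
    dec = [enc[-1]]
    for _ in range(num_pool):
        dec.append(dec[-1] * 2)    # each upsampling / transposed conv doubles it
    return enc, dec

print(encoder_decoder_shapes(124))  # ([124, 62, 31, 15, 7], [7, 14, 28, 56, 112]) -> 112 != 124
print(encoder_decoder_shapes(128))  # ([128, 64, 32, 16, 8], [8, 16, 32, 64, 128]) -> shapes match
```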
Got it. Thank you very much!
@FabianIsensee Hello, sir. I have been thinking about this question: why is the image padded to a shape whose sizes are multiples of 16 before it is passed to the prediction function? I cannot figure it out. Could you help me? Thank you!
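For anyone landing here with the same question: the padding exists precisely because of the divisibility requirement explained above. A minimal sketch (NumPy only, with a hypothetical `model` call; this is not nnU-Net's actual preprocessing code) of padding a volume to the next multiple of 16 and cropping the prediction back to the original size:

```python
import numpy as np

def pad_to_multiple(image, multiple=16):
    """Pad a 3D volume so every spatial dimension is a multiple of `multiple`.

    Returns the padded image plus the original shape so the prediction can be
    cropped back afterwards. Illustrative sketch only.
    """
    target = [int(np.ceil(s / multiple)) * multiple for s in image.shape]
    pad_width = [(0, t - s) for s, t in zip(image.shape, target)]
    return np.pad(image, pad_width, mode="constant"), image.shape

# Usage: pad, run the network, then crop the prediction back to the input size.
img = np.random.rand(124, 124, 124).astype(np.float32)
padded, orig_shape = pad_to_multiple(img)           # padded.shape == (128, 128, 128)
# prediction = model(padded)                        # hypothetical model call
# prediction = prediction[:orig_shape[0], :orig_shape[1], :orig_shape[2]]
```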