Hi, this has to do with pooling. If the input size is not divisible by 16, the following will happen throughout the network (say the image is 124x124x124 instead of 128x128x128):
Encoder: 124x124x124 -> 61x61x61 -> 30x30x30 -> 15x15x15 -> 7x7x7
Decoder: 7x7x7 -> 14x14x14 -> 28x28x28 -> 56x56x56 -> 112x112x112
As you can see, the feature map sizes do not match due to rounding (odd feature-map sizes in the encoder). This is undesired and will cause crashes: the skip connections cannot be concatenated, the loss cannot be computed because the ground truth is still 124x124x124, etc. Thus, the input size must be divisible by 2^num_pool (num_pool is 4 here, so 16) for U-Net-like architectures. Hope this helps! Best, Fabian
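For illustration, here is a minimal sketch (plain Python, assuming each pooling simply halves a dimension with floor division and each upsampling doubles it; the exact intermediate numbers depend on the convolution/pooling settings) showing how the encoder and decoder sizes drift apart for 124 but line up for 128:

```python
def encoder_decoder_shapes(size, num_pool=4):
    """Return (encoder_sizes, decoder_sizes) for one spatial dimension."""
    enc = [size]
    for _ in range(num_pool):
        enc.append(enc[-1] // 2)   # each pooling roughly halves the size (floor)
    dec = [enc[-1]]
    for _ in range(num_pool):
        dec.append(dec[-1] * 2)    # each upsampling / transposed conv doubles it
    return enc, dec

print(encoder_decoder_shapes(124))  # ([124, 62, 31, 15, 7], [7, 14, 28, 56, 112]) -> 112 != 124
print(encoder_decoder_shapes(128))  # ([128, 64, 32, 16, 8], [8, 16, 32, 64, 128]) -> shapes match
```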
Got it. Thank you very much!
@FabianIsensee Hello, sir. I have been thinking about this question: why is the image padded to a shape whose sizes are multiples of 16 before it is passed to the prediction function? I cannot figure it out. Could you help me? Thank you!
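For anyone landing here with the same question: the padding exists precisely because of the divisibility requirement explained above. A minimal sketch (NumPy only, with a hypothetical `model` call; this is not nnU-Net's actual preprocessing code) of padding a volume to the next multiple of 16 and cropping the prediction back to the original size:

```python
import numpy as np

def pad_to_multiple(image, multiple=16):
    """Pad a 3D volume so every spatial dimension is a multiple of `multiple`.

    Returns the padded image plus the original shape so the prediction can be
    cropped back afterwards. Illustrative sketch only.
    """
    target = [int(np.ceil(s / multiple)) * multiple for s in image.shape]
    pad_width = [(0, t - s) for s, t in zip(image.shape, target)]
    return np.pad(image, pad_width, mode="constant"), image.shape

# Usage: pad, run the network, then crop the prediction back to the input size.
img = np.random.rand(124, 124, 124).astype(np.float32)
padded, orig_shape = pad_to_multiple(img)           # padded.shape == (128, 128, 128)
# prediction = model(padded)                        # hypothetical model call
# prediction = prediction[:orig_shape[0], :orig_shape[1], :orig_shape[2]]
```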