IBBM / Cascaded-FCN

Source code for the MICCAI 2016 Paper "Automatic Liver and Lesion Segmentation in CT Using Cascaded Fully Convolutional Neural Networks and 3D Conditional Random Fields"

Input image sizing #20

Closed. mopladin closed this issue 6 years ago.

mopladin commented 7 years ago

Hello!

If I understand right, the input image size is increased from 388x388 to 572x572, the same dimensions as in the U-Net publication by Olaf Ronneberger, by extending all sides with mirrored copies of the image.
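
For illustration, a minimal sketch of that mirror padding as I understand it (my own toy code with a placeholder slice, not the repo's actual preprocessing):

```python
import numpy as np

# Extend a 388x388 slice by 92 pixels on each side with mirrored copies,
# giving the 572x572 U-Net input size: 388 + 2 * 92 = 572.
slice_388 = np.random.rand(388, 388).astype(np.float32)  # placeholder CT slice

pad = (572 - 388) // 2  # 92 pixels per side
# mode="reflect" mirrors without repeating the edge pixel itself;
# mode="symmetric" would repeat it. Either is a plausible reading of the paper.
slice_572 = np.pad(slice_388, pad_width=pad, mode="reflect")

assert slice_572.shape == (572, 572)
```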

But since the background of the CT scans has intensities near zero anyway, wouldn't it be possible to use zero padding instead and remain at the original size? The network would then use less memory and return a label map of the same size as the input, so I wonder whether there is a reason not to use zero padding that I have missed so far.

mohamed-ezz commented 7 years ago

Hi,

You're right, you can compensate for the lost boundary pixels using zero padding instead of mirroring. In that case the network will have to deal with the discontinuity in the input at the boundary, which may or may not work as well as mirroring. We have not tested this.
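
To make the two options concrete, here is a minimal sketch (assuming a single 3x3 filter, not the actual network): unpadded "valid" convolutions shrink the feature map by 2 pixels per 3x3 layer, which is what the 572 -> 388 mirroring compensates for, while zero-padded "same" convolutions keep the original size.

```python
import numpy as np
from scipy.signal import convolve2d

x = np.random.rand(388, 388)
k = np.random.rand(3, 3)

valid = convolve2d(x, k, mode="valid")                          # loses the border
same = convolve2d(x, k, mode="same", boundary="fill", fillvalue=0)  # zero-padded

print(valid.shape)  # (386, 386)
print(same.shape)   # (388, 388)
```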

If you test this, let us know the results.

mopladin commented 7 years ago

Thanks for your answer!

As I realized only later, zero padding in the convolution layers also effectively introduces zeros at the borders of the feature maps, so it seemed like it might cause problems throughout the network. Surprisingly, though, there seems to be no penalty in the quality of the results on the datasets I have tried so far.
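
To show what I mean by that border effect, here is a toy example (not the paper's network): convolving a constant image with an averaging filter under zero padding darkens the outermost rows and columns, because the padded zeros contribute to those outputs.

```python
import numpy as np
from scipy.signal import convolve2d

x = np.ones((6, 6))
k = np.ones((3, 3)) / 9.0  # 3x3 averaging filter

y = convolve2d(x, k, mode="same", boundary="fill", fillvalue=0)
print(y[0, 0], y[3, 3])  # ~0.444 at the corner (4 of 9 taps inside) vs 1.0 inside
```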

I tried it on two different MR datasets, each with three-fold cross-validation. On the first dataset, the padded convolutions made it possible to use an input size of 256x256 instead of 572x572; training then took only 30% of the time and used only 37% as much GPU memory. The result was, curiously, slightly better than with the enlarged input, but that might just be random variation. On the second dataset, the input size was reduced from 320x320 to 128x128, and training took about 45% of the previous time and 55% of the GPU memory. The results were slightly better or worse than before depending on the epoch, and the difference was very small.
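
For reference, a back-of-the-envelope check of those savings (my own arithmetic, not additional measurements): the raw pixel ratios are about 20% and 16%, so the measured 37% and 55% of GPU memory presumably also reflect costs that do not shrink with the input, such as the network weights.

```python
# Activation memory in a fully convolutional net scales roughly with
# the number of input pixels; compare the two size reductions above.
for small, large in [(256, 572), (128, 320)]:
    ratio = (small ** 2) / (large ** 2)
    print(f"{small}x{small} has {ratio:.0%} of the pixels of {large}x{large}")
# 256x256 has 20% of the pixels of 572x572
# 128x128 has 16% of the pixels of 320x320
```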

Maybe the feature maps are sparse enough that the zero padding does not make a big difference, or perhaps the later layers adapt to the effects of the padded convolutions. In any case it seems worthwhile for faster experiments, although in the end I might rerun the best setup on the enlarged data without padded convolutions, just to be sure.