frankkramer-lab / MIScnn

A framework for Medical Image Segmentation with Convolutional Neural Networks and Deep Learning
GNU General Public License v3.0

Question regarding resampling #49

Closed valentinogrean closed 3 years ago

valentinogrean commented 3 years ago

Hello,

Thank you for the great framework that you have shared with the community.

I have a question regarding resampling: sf_resample = Resampling((3.22, 1.62, 1.62)) seems to do wonders for both processing speed and results, while other resampling values (for example, values similar to my dataset's native spacing, or (1,1,1)) seem to fail completely and greatly slow down the preprocessing phase. I know that you briefly described it in https://github.com/frankkramer-lab/MIScnn/blob/master/examples/LCTSC_DICOMInterface.ipynb, but it is still too cryptic for me. Could you please provide some pointers or at least some documentation that I can check?

Thanks a lot.

For the record, I am using your framework with a multi-class CT dataset (with moderate results currently, but I am still working on improving them).

muellerdo commented 3 years ago

Hey @valentinogrean,

sorry for the late reply.

Thank you for the great framework that you have shared with the community.

Thank you for these kind words. Happy to help.

Could you please provide some pointers or at least some documentation that I can check?

Sure, no problem.

In order to fully understand voxel spacings, it is helpful to have a look at how e.g. CT images are created.

When a CT scanner captures an image, the result is a raw image matrix that is 'encoded' in a world/physical coordinate system (check out the fundamental concepts guide of SimpleITK). In order to capture only the relevant part of this raw image and normalize it a bit, a subpart is selected and transformed into the image coordinate system. The important part: the resolution of the image coordinate system is fixed (mostly something like 512x512x_).

Therefore, a single pixel (or voxel in 3D space) represents a compressed part of the original image. If the original image is bigger, then the compressed part is logically also bigger. -> The size of this compression is defined as the voxel spacing / pixel spacing / pixel or voxel size / space ratio.
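For illustration, here is a minimal standalone sketch (not MIScnn code; the file path is a placeholder) of how to inspect these properties with SimpleITK:

    import SimpleITK as sitk

    # Load a CT volume (placeholder path)
    image = sitk.ReadImage("patient_01.nii.gz")

    # SimpleITK reports size and spacing in (x, y, z) order
    print("Image size (voxels):", image.GetSize())      # e.g. (512, 512, 120)
    print("Voxel spacing (mm): ", image.GetSpacing())   # e.g. (0.7, 0.7, 3.0)
    print("Physical extent (mm):",
          tuple(sz * sp for sz, sp in zip(image.GetSize(), image.GetSpacing())))

The physical extent (size times spacing) is what stays constant when an image is resampled; only the trade-off between resolution and spacing changes.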

Now, the problem arises:

Due to this resizing of the original image subpart into the 512x512x_ resolution, the image can appear stretched or compressed along the Z-axis if you directly compare the resulting CT scans with each other. The reason for this is exactly that the original images have different sizes.

Neural networks have big difficulties learning patterns if the images do not share a common voxel spacing. Therefore, it is required to normalize all samples to a common voxel spacing. This can be achieved by resizing the images accordingly, computing a unique new shape for each image.

How does this work?

Let's have a look at the Python code for the MIScnn subfunction Resampling:

        # current_spacing : voxel spacing of this sample (e.g. from the image header)
        # self.new_spacing: the desired common voxel spacing

        # Calculate spacing ratio (old spacing / new spacing, per axis)
        ratio = current_spacing / np.array(self.new_spacing)

        # Calculate new shape (the last axis of img_data is the channel axis)
        new_shape = tuple(np.floor(img_data.shape[0:-1] * ratio).astype(int))

        # Resize imaging data: cubic spline interpolation for the image (order=3),
        # linear interpolation for the segmentation mask (order_seg=1)
        img_data, seg_data = augment_resize(img_data, seg_data, new_shape,
                                            order=3, order_seg=1, cval_seg=0)
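
To make this concrete, here is a small self-contained sketch of the same computation with assumed example values (only the (3.22, 1.62, 1.62) target spacing is from your question; the native spacing and shape are made up):

    import numpy as np

    current_spacing = np.array([3.0, 0.7, 0.7])   # assumed native spacing (z, y, x)
    new_spacing = np.array([3.22, 1.62, 1.62])    # target spacing from the question
    img_shape = np.array([120, 512, 512, 1])      # assumed shape, last axis = channels

    ratio = current_spacing / new_spacing
    new_shape = tuple(np.floor(img_shape[0:-1] * ratio).astype(int))
    print(new_shape)   # (111, 221, 221) -> smaller, since the new spacing is coarser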

Further readings/explanations:

Voxel Spacing

other resampling values (for example, values similar to my dataset's native spacing, or (1,1,1)) seem to fail completely and greatly slow down the preprocessing phase

We have to understand here that, depending on our desired voxel spacing, all images are resized, which leads to either an increase or a decrease in image size.

If the new voxel spacing is bigger than the current spacing -> the resampled image gets smaller (more compressed).
If the new voxel spacing is smaller than the current spacing -> the resampled image gets bigger (closer to the raw shape).

Therefore, a (1,1,1) resampling most probably leads to very big images, which increases runtime (and also the required GPU memory if a full-image analysis approach is selected).
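
Continuing the assumed example from above, resampling the same sample to (1,1,1) shows this blow-up:

    import numpy as np

    current_spacing = np.array([3.0, 0.7, 0.7])   # assumed native spacing (z, y, x)
    img_shape = np.array([120, 512, 512, 1])      # assumed shape, last axis = channels

    ratio = current_spacing / np.array([1.0, 1.0, 1.0])
    new_shape = tuple(np.floor(img_shape[0:-1] * ratio).astype(int))
    print(new_shape)   # (360, 358, 358) -> ~8.5x more voxels than (111, 221, 221)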

How do we know which voxel spacing is the best?

Via experimenting... :D

The best way to go is to compute the median image size of your dataset after resampling. Then you should aim for a patch size between 1/4 and 1/64 of this median image size.

Therefore, my personal approach is:

The perfect voxel spacing is highly dependent on your dataset and task. In my personal experience, a ratio of about 1/8 between patch shape and resampled median image shape works best.
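
As a rough sanity check (just a sketch with assumed shapes, not MIScnn API), you can compare the patch volume against the median resampled image volume:

    import numpy as np

    median_shape = np.array([111, 221, 221])   # assumed median shape after resampling
    patch_shape = np.array([48, 128, 128])     # candidate patch size

    ratio = np.prod(patch_shape) / np.prod(median_shape)
    print(f"patch/median volume ratio: {ratio:.3f}")   # ~0.145, close to 1/8 = 0.125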

How do we find out the image size after resampling with MIScnn?

Check out here: https://github.com/frankkramer-lab/covid19.MIScnn/blob/master/scripts/utils/identify_resamplingShape.py

But why do we have to deal with all this? Can't you just automatically compute the perfect ratio?

Yes and no. Given a fixed ratio like 1/8, it is possible to automatically compute a suitable voxel spacing. Is this approach always a good way to go for every dataset? No.
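
To illustrate what such an automatic computation could look like, here is a sketch under the simplifying assumption that all axes are scaled by the same factor (none of this is MIScnn API; suggest_spacing is a hypothetical helper):

    import numpy as np

    def suggest_spacing(median_spacing, median_shape, patch_shape, target_ratio=1/8):
        """Uniformly scale the median spacing so that the resampled median image
        volume is 1/target_ratio times the patch volume."""
        v_median = np.prod(median_shape)   # voxels at the current median spacing
        v_patch = np.prod(patch_shape)
        # Scaling every axis spacing by k shrinks each axis shape by k,
        # i.e. the volume by k^3 -> solve v_median / k**3 = v_patch / target_ratio
        k = (v_median * target_ratio / v_patch) ** (1 / 3)
        return tuple(np.array(median_spacing, dtype=float) * k)

    # Example: assumed median spacing/shape and a 48x128x128 patch
    print(suggest_spacing((3.0, 0.7, 0.7), (120, 512, 512), (48, 128, 128)))
    # -> roughly (5.13, 1.20, 1.20)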

But I understand that the resampling concept can be quite difficult. An automatic computation of suitable voxel spacings as some kind of preprocessing script is on my to-do list :)

Hope that I was able to help you.

Cheers, Dominik

valentinogrean commented 3 years ago

Wow! Thank you for the effort put into providing such a detailed answer. Kudos.

Some topics from my side:

  1. I really suggest you take this nice answer and make a wiki page out of it. Simply copy-pasting would do the trick, as it is already nicely structured. I am sure it will help others like me.
  2. I am starting to understand that DL involves a lot of empirical work. "Via experimenting" :). It's not my cup of tea, to be honest, but that's the way it is. I was experimenting myself with a dataset and came to the conclusion that batch size 2 with patch size 64, 144, 144 works best for me (I only have an 8GB GPU) - all other combinations that occupy the same amount of memory, like batch size 1 with 96, 176, 176 or batch size 4 with 48, 112, 112, give much worse results. All articles say that an increased batch size should yield better results, but now I believe the culprit was that the spacing was not suited for 48, 112, 112 :)
  3. As said, the empirical route is not really my way, so in the meantime I'm trying to better understand the theory. One free resource which helped me a lot was https://d2l.ai/. I put it here, maybe it will help other people. I know it's MXNet, but in the end it's not so different from TensorFlow, and the information/theory is really useful.

dianemarquette commented 3 years ago

Hi!

Thank you so much for developing the MIScnn framework. The clean code and detailed documentation are very much appreciated :).

I'm not sure if I'm posting this question in the right place, but because it's directly linked to @muellerdo's previous answer, I'll give it a go.

From what I understood, to find the best voxel spacing, I must first choose a suitable patch size.

I increase my patch size to the maximum according to the GPU VRAM (160x160x80 for U-Net standard with 16GB VRAM)

What formula / rule do you use to infer your patch size from your GPU VRAM? In my case, I have an Nvidia RTX 2080 Super (with 8GB GDDR6 as VRAM) and I want to segment the trapezium bone from hand CT scans (original data = 512 x 512 with 378 to 800 slices) using U-Net. What would you recommend?

In addition, how do you find the correct patch overlap size? Is it only through experimentation? Is there any rule of thumb I could use?

Last question: how do you choose the batch size? I saw you used a size of 2 in your examples, and I know it depends on your GPU and model size, but how can I compute it precisely for my own dataset / model?

Thank you for sharing your expertise! Cheers,

Diane