Closed valentinogrean closed 3 years ago
Hey @valentinogrean,
sorry for the late reply.
Thank you for the great framework that you have shared with the community.
Thank you for these kind words. Happy to help.
Could you please provide some pointers or at least some documentation that I can check?
Sure, no problem.
In order to fully understand voxel spacings, it is helpful to have a look at the process of how e.g. CT images are created.
When a CT scanner captures an image, the result is a raw image matrix which is 'encoded' in a world/physical coordinate system (check out the Fundamental Concepts documentation of SimpleITK). In order to capture only the relevant part of this raw image and to normalize it a bit, a subpart is selected and transformed into the image coordinate system. Important part: the resolution of the image coordinate system is fixed (mostly something like 512x512x_).
Therefore, a single pixel (or voxel in 3D space) represents a compressed part of the original image. If the original image is bigger, then the compressed part is logically also bigger. -> The size of this compression is defined as the voxel spacing / pixel spacing / pixel size / voxel size.
Due to this resizing of the original image subpart into the fixed 512x512x_ resolution, the image can end up stretched or compressed along the Z-axis if you directly compare the resulting CT scans with each other. The reason for this is exactly that the original images have different physical sizes.
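As a minimal illustration of this (all numbers are made up, not from a real scanner): the physical extent covered by a scan is simply the voxel count times the voxel spacing, so two volumes with the same 512x512x_ resolution can cover very different physical regions.

```python
import numpy as np

# Two hypothetical CT volumes with identical voxel shapes
# but different voxel spacings (z, y, x) in millimetres.
shape = np.array([90, 512, 512])           # voxels
spacing_a = np.array([5.0, 0.8, 0.8])      # mm per voxel
spacing_b = np.array([2.5, 0.8, 0.8])

# Physical extent in mm = voxel count * voxel spacing
print(shape * spacing_a)   # 450 mm along z
print(shape * spacing_b)   # only 225 mm along z, same voxel shape
```

Same image matrix, half the physical coverage along z: that is the stretching/compression described above.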
Neural networks have big difficulties learning patterns if the images do not share a common voxel spacing. Therefore, it is required to normalize all samples to a common voxel spacing. This can be achieved by resizing the images accordingly, computing a unique new shape for each image.
Let's have a look at the Python code for the MIScnn subfunction resampling:
```python
# Calculate spacing ratio
ratio = current_spacing / np.array(self.new_spacing)
# Calculate new shape
new_shape = tuple(np.floor(img_data.shape[0:-1] * ratio).astype(int))
# Resize imaging data
img_data, seg_data = augment_resize(img_data, seg_data, new_shape,
                                    order=3, order_seg=1, cval_seg=0)
```
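To make the arithmetic concrete, here is a small sketch of the same two lines applied to a hypothetical scan (all shapes and spacings are made up; the `augment_resize` call is omitted since it only performs the actual interpolation):

```python
import numpy as np

# Hypothetical current state of one scan (z, y, x)
current_spacing = np.array([5.0, 0.8, 0.8])   # mm per voxel
shape = np.array([90, 512, 512])              # voxel shape

# Target voxel spacing, e.g. the one from the question
new_spacing = np.array([3.22, 1.62, 1.62])
ratio = current_spacing / new_spacing
new_shape = tuple(np.floor(shape * ratio).astype(int))
print(new_shape)       # (139, 252, 252)

# Resampling the same scan to an isotropic (1,1,1) spacing instead
ratio_iso = current_spacing / np.array([1.0, 1.0, 1.0])
print(tuple(np.floor(shape * ratio_iso).astype(int)))  # (450, 409, 409)
```

Note how the isotropic target blows the volume up to 450x409x409 voxels, which already hints at the runtime observation discussed below.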
for example ones similar to my used dataset or (1,1,1) seem to fall out completely and greatly slow the preprocessing phase
We have to understand here that, depending on our desired voxel spacing, all images are resized, which leads to either an increase or a decrease of image sizes.
If the new voxel spacing is bigger than the current spacing -> the resampled image gets smaller (more compressed).
If the new voxel spacing is smaller than the current spacing -> the resampled image gets bigger (closer to the raw shape).
Therefore, a (1,1,1) resampling most probably leads to very big images, which increases runtime (and also the required GPU memory if a full-image analysis approach is selected).
Via experimenting... :D
The best way to go is to compute the median image size of your dataset after the resampling. Then you should aim for a patch size that is between 1/4 and 1/64 of this median image size of your dataset.
Therefore my personal approach is:
The perfect voxel spacing is highly dependent on your dataset and task. In my personal experience, a ratio of about 1/8 between patch shape and resampled median image shape works best.
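A minimal sketch of that rule of thumb (the dataset shapes and the patch size are made up, and I am interpreting the 1/8 as the ratio between patch volume and median image volume, which is an assumption, not how MIScnn itself computes it):

```python
import numpy as np

# Hypothetical resampled image shapes (z, y, x) of a small dataset
shapes = np.array([
    [139, 252, 252],
    [150, 240, 240],
    [128, 260, 260],
])

# Per-axis median shape of the resampled dataset
median_shape = np.median(shapes, axis=0)
print(median_shape)    # [139. 252. 252.]

# Candidate patch size; check how much of the median volume it covers
patch = np.array([80, 160, 160])
volume_ratio = np.prod(patch) / np.prod(median_shape)
print(round(volume_ratio, 3))   # somewhere in the recommended 1/4..1/64 band
```

If the ratio falls outside the recommended band, you would adjust either the target voxel spacing or the patch size and recompute.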
Check out here: https://github.com/frankkramer-lab/covid19.MIScnn/blob/master/scripts/utils/identify_resamplingShape.py
Yes and no. Given a fixed ratio like 1/8, it is possible to automatically compute a suitable voxel spacing. Is this approach always a good way to go for every dataset? No.
But I understand that the resampling concept can be quite difficult. An automatic computation of suited voxel spacings as some kind of preprocessing script is on my to-do agenda :)
Hope that I was able to help you.
Cheers, Dominik
Wow! Thank you for the effort put into providing such a detailed answer. Kudos.
Some topics from my side:
Hi!
Thank you so much for developing the MIScnn framework. The clean code and detailed documentation are very much appreciated :).
I'm not sure if I'm posting this question at the right location, but because it's directly linked to @muellerdo's previous answer, I'll give it a go.
From what I understood, to find the best voxel spacing, I must first choose a suitable patch size.
I increase my patch size to the maximum according to the GPU VRAM (160x160x80 for U-Net standard with 16GB VRAM)
What formula / rule do you use to infer your patch size from your GPU VRAM? In my case, I have an Nvidia RTX 2080 Super (with 8GB GDDR6 as VRAM) and I want to segment the trapezium bone from hand CT scans (original data = 512 x 512 with slices from 378 to 800) using U-Net. What would you recommend?
In addition, how do you find the correct patch overlap size? Is it only through experimentation? Is there any rule of thumb I could use?
Last question, how do you choose the batch size? I saw you used a size 2 in your examples and I know it depends on your GPU and model size, but how can I compute it precisely for my own dataset / model?
Thank you for sharing your expertise! Cheers,
Diane
Hello,
Thank you for the great framework that you have shared with the community.
I have a question regarding the resampling: sf_resample = Resampling((3.22, 1.62, 1.62)) seems to do wonders with regards to processing speed and results; other resampling values (for example, ones similar to my dataset, or (1,1,1)) seem to fall off completely and greatly slow down the preprocessing phase. I know that you briefly described it in https://github.com/frankkramer-lab/MIScnn/blob/master/examples/LCTSC_DICOMInterface.ipynb but it is still too cryptic for me. Could you please provide some pointers or at least some documentation that I can check?
Thanks a lot.
For the record, I am using your framework with a multi-class CT dataset (with moderate results currently, but I am still working on improving them).