UNETR using random 96x96x96 patches and non-overlapping 16x16x16 sub-patches?

Project-MONAI / tutorials

MONAI Tutorials

https://monai.io/started.html

Apache License 2.0

UNETR using random 96x96x96 patches and non-overlapping 16x16x16 sub-patches? #415

Closed rekalantar closed 2 years ago

rekalantar commented 2 years ago

Hi Ali, thank you for sharing this great work.

I have a question about the image and patch sizes. I noticed that UNETR takes patches of size 96, randomly selected from the image volume, and splits each one into non-overlapping sub-patches of size 16. In that case, is it fair to say that the network still cannot take the entire imaging volume into account? From the tutorial, my understanding is that 'RandCropByPosNegLabeld' picks out only one patch at a time. I would appreciate an explanation of this.

Thanks!

Nic-Ma commented 2 years ago

Hi @rekalantar ,

In the tutorial, we set num_samples=4 in RandCropByPosNegLabeld, so each call crops 4 patches rather than 1. @ahatamiz, could you please share more comments about the UNETR question?
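
To make the effect of num_samples concrete, here is a minimal standard-library sketch of drawing several random 96x96x96 crop origins from one volume. It is not MONAI's implementation: RandCropByPosNegLabeld chooses crop centers based on foreground and background label voxels, while this sketch samples uniformly to keep the idea visible. The function name `sample_crop_origins` and the volume shape are hypothetical, chosen only for illustration.

```python
import random


def sample_crop_origins(volume_shape, crop_size=(96, 96, 96), num_samples=4, seed=0):
    """Return `num_samples` random crop origins that keep each crop inside the volume.

    Simplification: origins are drawn uniformly, not weighted by label content.
    """
    if any(dim < crop for dim, crop in zip(volume_shape, crop_size)):
        raise ValueError("crop_size must fit inside volume_shape on every axis")

    rng = random.Random(seed)
    origins = []
    for _ in range(num_samples):
        # randint is inclusive, so the largest origin still leaves room for the crop.
        origin = tuple(
            rng.randint(0, dim - crop) for dim, crop in zip(volume_shape, crop_size)
        )
        origins.append(origin)
    return origins


# Hypothetical volume shape (an assumption, not taken from the tutorial).
volume_shape = (512, 512, 128)
origins = sample_crop_origins(volume_shape)
for origin in origins:
    print("crop origin:", origin)
```

Each training step therefore sees several sub-volumes of the same scan, but any single forward pass still covers only one 96x96x96 region.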

Thanks in advance.

ahatamiz commented 2 years ago

Hi @rekalantar

Thanks for your interest in our work. We first sample inputs of size (96,96,96) from the entire volume and then split each sample into non-overlapping (16,16,16) patches. This process is similar to general ImageNet practice in computer vision, wherein images of different sizes are first resized to (256,256) and then center-cropped.
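
The arithmetic behind these sizes can be sketched in a few lines. The helper name `count_tokens` is hypothetical; the only inputs are the (96,96,96) sample size and (16,16,16) patch size mentioned above.

```python
def count_tokens(input_size=(96, 96, 96), patch_size=(16, 16, 16)):
    """Return the patch grid and the number of non-overlapping patches (tokens)."""
    if any(size % patch for size, patch in zip(input_size, patch_size)):
        raise ValueError("input_size must be divisible by patch_size on every axis")

    # Number of patches along each axis.
    grid = tuple(size // patch for size, patch in zip(input_size, patch_size))

    # Total number of patches is the product of the per-axis counts.
    num_tokens = 1
    for count in grid:
        num_tokens *= count
    return grid, num_tokens


grid, num_tokens = count_tokens()
voxels_per_patch = 16 * 16 * 16

print(grid)              # (6, 6, 6)
print(num_tokens)        # 216
print(voxels_per_patch)  # 4096
```

So each (96,96,96) sample becomes a sequence of 216 patches, each covering 4096 voxels per input channel.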

On another note, using the entire imaging volume is not feasible in most cases due to memory constraints.
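
As a rough illustration of one source of that memory pressure: standard self-attention builds a score matrix with one entry per pair of tokens, so its size grows quadratically with the number of patches. The sketch below compares a (96,96,96) sample with a hypothetical full volume of (512,512,128). The full-volume shape, the float32 assumption, and the helper names are illustrative assumptions, and the estimate covers only one attention matrix for one head in one layer, not the whole network.

```python
def num_patches(shape, patch=16):
    """Number of non-overlapping cubic patches of side `patch` in `shape`."""
    total = 1
    for dim in shape:
        total *= dim // patch
    return total


def attention_matrix_bytes(tokens, bytes_per_value=4):
    """Bytes for one tokens-by-tokens attention score matrix (float32 assumed)."""
    return tokens * tokens * bytes_per_value


crop_tokens = num_patches((96, 96, 96))
# Hypothetical full-volume shape, chosen only for illustration.
full_tokens = num_patches((512, 512, 128))

crop_bytes = attention_matrix_bytes(crop_tokens)
full_bytes = attention_matrix_bytes(full_tokens)

print(f"crop: {crop_tokens} tokens, {crop_bytes / 2**20:.2f} MiB per head per layer")
print(f"full: {full_tokens} tokens, {full_bytes / 2**20:.2f} MiB per head per layer")
print(f"ratio: {full_bytes / crop_bytes:.0f}x")
```

Under these assumptions the full volume needs over a thousand times more memory for each attention matrix than a single crop, before counting activations, multiple heads, or multiple layers.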

Thanks

rekalantar commented 2 years ago

Great, thank you for your response.

I wonder if applying light embeddings and/or dilated convolutions across 96x96x96 patches would help at all. Perhaps attention modules could be made lighter or replaced by separable convolutions to avoid memory overshoot.

In any case, exciting project. Keep up the good work!

ahatamiz commented 2 years ago

Hi @rekalantar

I believe making the attention modules lighter/more efficient is a promising direction.

Thanks