Closed by rekalantar 2 years ago
Hi @rekalantar ,
In the tutorial, we set `num_samples=4` in `RandCropByPosNegLabeld`, so it will crop out 4 patches at a time.
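To make the effect of `num_samples` concrete, here is a minimal standard-library sketch of drawing several fixed-size crops from one volume in a single call. It is a simplified stand-in, not MONAI's implementation: the real `RandCropByPosNegLabeld` picks crop centers on foreground or background voxels according to its `pos`/`neg` ratio and returns a list of `num_samples` cropped dictionaries, whereas this sketch samples origins uniformly. The function name `random_crop_origins` and the volume shape are hypothetical.

```python
import random


def random_crop_origins(volume_shape, crop_size, num_samples, seed=None):
    """Draw `num_samples` crop origins such that every crop fits inside the volume.

    Simplified illustration: origins are sampled uniformly, without the
    label-aware foreground/background balancing that MONAI performs.
    """
    if any(crop > dim for dim, crop in zip(volume_shape, crop_size)):
        raise ValueError("crop_size must not exceed volume_shape along any axis")
    rng = random.Random(seed)
    origins = []
    for _ in range(num_samples):
        # randint is inclusive on both ends, so dim - crop is the last valid origin.
        origin = tuple(
            rng.randint(0, dim - crop) for dim, crop in zip(volume_shape, crop_size)
        )
        origins.append(origin)
    return origins


volume_shape = (314, 214, 234)  # hypothetical volume size, for illustration only
crop_size = (96, 96, 96)        # sample size mentioned in this thread

origins = random_crop_origins(volume_shape, crop_size, num_samples=4, seed=0)
for origin in origins:
    end = tuple(o + c for o, c in zip(origin, crop_size))
    print(f"crop from {origin} to {end}")
```

With `num_samples=4`, one pass over a volume yields four 96x96x96 samples, so over many iterations the crops cover different regions of the image.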
@ahatamiz Could you please help share more comments about the UNETR question?
Thanks in advance.
Hi @rekalantar
Thanks for your interest in our work. We first sample inputs of size (96,96,96) from the entire volume and then utilize (16,16,16) non-overlapping patches from each sample. This is similar to common ImageNet preprocessing in computer vision, where images of different sizes are first resized to (256,256) and then center-cropped.
On another note, using the entire imaging volume is not feasible in most cases due to memory constraints.
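As a rough illustration of the numbers above, the sketch below counts how many non-overlapping (16,16,16) patches tile a (96,96,96) sample, and compares that with a full volume. The helper name `patch_grid` and the full-volume size are assumptions made only for this example; they do not come from the UNETR code.

```python
import math


def patch_grid(sample_size, patch_size):
    """Return the per-axis patch count and the total number of non-overlapping patches."""
    if any(s % p != 0 for s, p in zip(sample_size, patch_size)):
        raise ValueError("each sample dimension must be divisible by the patch dimension")
    per_axis = tuple(s // p for s, p in zip(sample_size, patch_size))
    return per_axis, math.prod(per_axis)


grid, tokens = patch_grid((96, 96, 96), (16, 16, 16))
print(grid, tokens)  # (6, 6, 6) 216

# Hypothetical full-volume size, chosen only for illustration.
full_grid, full_tokens = patch_grid((512, 512, 256), (16, 16, 16))
print(full_grid, full_tokens)  # (32, 32, 16) 16384

# Self-attention compares every token with every other token, so the
# attention matrix grows with the square of the token count.
attention_ratio = (full_tokens / tokens) ** 2
print(f"attention matrix is about {attention_ratio:.0f}x larger for the full volume")
```

Each 96x96x96 sample becomes a sequence of 216 tokens, while the hypothetical full volume would produce 16384, which is why cropping keeps the attention cost manageable.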
Thanks
Great, thank you for your response.
I wonder if applying lighter embeddings and/or dilated convolutions across the 96x96x96 patches would help at all. Perhaps the attention modules could be made lighter or replaced by separable convolutions to avoid running out of memory.
In any case, exciting project. Keep up the good work!
Hi @rekalantar
I believe making the attention modules lighter/more efficient is a promising direction.
Thanks
Hi Ali, thank you for sharing this great work.
I had a question regarding the image and patch sizes. I noticed that UNETR uses non-overlapping sub-patches of size 16 from patches of size 96, which are randomly selected from the image volume. In this case, is it fair to say that the network is still not able to take into account the entire imaging volume? From the tutorial, my understanding is that 'RandCropByPosNegLabeld' only picks out one patch at a time. I would appreciate it if you could provide an explanation for this.
Thanks!