cwmok / C2FViT

This is the official PyTorch implementation of "Affine Medical Image Registration with Coarse-to-Fine Vision Transformer" (CVPR 2022), written by Tony C. W. Mok and Albert C. S. Chung.
MIT License

Downsample image before feeding into network #14

Closed GrigoriPlatonov closed 5 months ago

GrigoriPlatonov commented 7 months ago

Thanks for your great work!

I have a question about the code. I noticed that you feed the half-resolution image (128 x 128 x 128) into the model to predict the affine matrix for the original-size image (256 x 256 x 256). However, in the paper, you said the highest resolution of the image pyramid should be the same as the input image. Can you explain this? Thanks!

cwmok commented 7 months ago

Hi @GrigoriPlatonov,

We use half-resolution input for the network, which significantly accelerates training without sacrificing registration accuracy. The evaluation, however, is conducted on images at the original size (256 x 256 x 256).

> However, in the paper, you said the highest resolution of the image pyramid should be the same as the input image.

Yes. Here, "input image" refers to the half-resolution input to the network, so the highest level of the image pyramid is the half-resolution image.
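
For readers wondering how a matrix predicted from half-resolution input can be applied to the original-size image, here is a minimal sketch. It is not taken from the repository. It assumes the affine matrix is expressed in normalized [-1, 1] coordinates, as consumed by `torch.nn.functional.affine_grid`, in which case the same matrix is valid at any resolution. The names `register_with_half_res_input` and `identity_predictor` are hypothetical, and the predictor is only a stand-in for the trained network. A small 16 x 16 x 16 volume is used to keep the example cheap to run.

```python
import torch
import torch.nn.functional as F


def register_with_half_res_input(moving, fixed, predict_affine):
    """Predict an affine matrix from half-resolution volumes and apply it
    at the original resolution.

    moving, fixed: tensors of shape (N, 1, D, H, W) at the original size.
    predict_affine: hypothetical callable returning an (N, 3, 4) matrix in
        normalized [-1, 1] coordinates (an assumption of this sketch).
    """
    # Downsample by 2 along each axis (e.g. 256^3 -> 128^3 in the thread).
    moving_half = F.interpolate(
        moving, scale_factor=0.5, mode="trilinear", align_corners=True
    )
    fixed_half = F.interpolate(
        fixed, scale_factor=0.5, mode="trilinear", align_corners=True
    )

    # The network only ever sees the half-resolution pair.
    theta = predict_affine(moving_half, fixed_half)

    # Normalized coordinates do not depend on the voxel count, so the same
    # matrix can build a sampling grid at the original size.
    grid = F.affine_grid(theta, list(moving.shape), align_corners=True)
    # For 5-D input, mode="bilinear" performs trilinear interpolation.
    warped = F.grid_sample(moving, grid, mode="bilinear", align_corners=True)
    return warped, theta


def identity_predictor(moving_half, fixed_half):
    """Stand-in for the trained network: always returns the identity affine."""
    n = moving_half.shape[0]
    return torch.eye(3, 4).unsqueeze(0).repeat(n, 1, 1)


# Usage with a small random volume in place of a real scan.
moving = torch.rand(1, 1, 16, 16, 16)
fixed = torch.rand(1, 1, 16, 16, 16)
warped, theta = register_with_half_res_input(moving, fixed, identity_predictor)
```

If the matrix were instead expressed in voxel coordinates, its translation component would need to be rescaled by the downsampling factor before being applied at the original size.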