cwmok / C2FViT

This is the official PyTorch implementation of "Affine Medical Image Registration with Coarse-to-Fine Vision Transformer" (CVPR 2022), written by Tony C. W. Mok and Albert C. S. Chung.
MIT License

problems about affine registration and deformable registration #7

Closed celi7 closed 1 year ago

celi7 commented 1 year ago

Hi, thanks for this awesome work and your code! My registration task has two steps: a rough affine registration followed by a deformable registration, each implemented with an independent network. On the unprocessed test set, the Dice is 0.67. Training and testing the affine registration network alone gives a test set Dice of 0.76, and training and testing the deformable network alone gives a test set Dice of 0.82. I believe that training the deformable network on the affine-registered dataset should yield better results than using either network alone.

But I ran into a problem I cannot explain. After training my affine registration network, I applied it to the dataset; the images showed a visible improvement, and the test set Dice reached 0.76. However, when I used these affine-registered data as the training set for the deformable network, the deformable network failed to train: the test set Dice actually increased from the original Dice value of 0.67 instead of from 0.76. I have checked the training and testing sets of my deformable network and found no issues. My questions are:

  1. In your experience, why is this happening?
  2. Will affine registration of the dataset affect the subsequent training of the deformable registration network?
  3. How much does affine pre-registration improve the deformable registration when there is significant deformation between the image pairs?

cwmok commented 1 year ago

Hi @celi7,

Could you provide more detail about your question?

However, when I used these affine registered data as the training set for deformable network, deformable network failed to train.

What do you mean by "the network failed to train"? To my knowledge, most deformable networks are trained on affine/rigid pre-registered datasets.

  1. In your experience, why is this situation happening?

I suggest you first compute the initial Dice of both your original dataset and your affine pre-registered dataset. For the deformable network, the Dice score of the test set should increase from 0.76, not from 0.67. I suspect that the affine-registered dataset was not actually used to train your network.

  2. Will affine registration on the dataset affect the subsequent training of deformable registration network?

Yes.

  3. How much does pre affine registration improve the deformable registration effect when there is significant deformation between registered image pairs?

In short, pre-affine registration has a great impact on the registration when there is significant (linear/affine) deformation. This is because of the high degrees of freedom of the deformation field and the smoothness regularization integrated into the training of the deformable network.
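The initial-Dice check suggested in this reply can be sketched as follows. This is a minimal NumPy example, not code from the C2FViT repository; the function names `dice_score` and `mean_dice` and the single-label setup are assumptions made for illustration. Running it once on the original pairs and once on the affine-registered pairs shows which starting value (0.67 or 0.76 in this thread) the deformable network is really being fed.

```python
import numpy as np

def dice_score(seg_a, seg_b, label=1):
    """Dice overlap of one label between two segmentation maps (hypothetical helper)."""
    a = np.asarray(seg_a) == label
    b = np.asarray(seg_b) == label
    denom = a.sum() + b.sum()
    if denom == 0:
        # Convention chosen here: two empty masks count as a perfect match.
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / denom)

def mean_dice(pairs, label=1):
    """Average Dice over an iterable of (fixed_seg, moving_seg) pairs."""
    return float(np.mean([dice_score(f, m, label) for f, m in pairs]))

# Toy example: two 2x2 squares that overlap in half of their pixels.
fixed_seg = np.zeros((4, 4), dtype=np.int16)
fixed_seg[1:3, 1:3] = 1
moving_seg = np.zeros((4, 4), dtype=np.int16)
moving_seg[1:3, 2:4] = 1
print(dice_score(fixed_seg, moving_seg))
```

If the value measured on the data actually loaded by the deformable training script matches the unregistered baseline, the affine-registered files are probably not the ones being read.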

celi7 commented 1 year ago

@cwmok , Thank you very much for your timely reply!

  1. By "the network cannot be trained" I mean the following. I used my affine registration network to register the training and testing sets and saved the results as npz files for later use. However, when I used these registered data sets (npz files) as the training and test sets of the deformable registration network, the network could not be trained normally: the Dice actually decreased during training, starting from the original data Dice of 0.67. I checked the data and could not find the problem. Today I tried another approach: instead of saving the affine registration results as npz files, I fed the affine registration results directly into the deformable registration network for training. With this setup, the deformable registration network trains normally, and the Dice increases from 0.76 during training. So I suspect that something went wrong when saving the npz files, which corrupted the data.
  2. The biggest challenge of my task at present is that registration pairs differ significantly: in some pairs the ROI positions are far apart, while in others they are very close, which makes it difficult for my deformable registration network to handle pairs with long ROI distances. So I thought of doing affine registration first, but my affine registration network (a simpler convolutional neural network) cannot solve this problem during training either. I am currently trying to perform affine registration first and then deformable registration. I wonder if this can solve the problem for pairs with long ROI distances? My concern is that even if affine registration is performed first, the deformable registration network will still not be able to train effectively on pairs with long ROI distances. (I previously tried reducing the batch size once training reached a certain stage, but it did not help.)
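One way to test the suspicion about the npz files is a round-trip check: save a registered pair, load it back, and confirm that the arrays and dtypes survive. The sketch below is only an illustration; the helper names `save_pair` and `check_round_trip`, the array keys `image` and `label`, and the chosen dtypes are all assumptions, not part of any code shown in this thread.

```python
import os
import tempfile
import numpy as np

def save_pair(path, image, label):
    """Save a warped image and its warped label map (hypothetical helper)."""
    # Keep the label map as integers; casting labels to float or
    # interpolating them linearly is a common source of silent corruption.
    np.savez_compressed(path,
                        image=np.asarray(image, dtype=np.float32),
                        label=np.asarray(label, dtype=np.int16))

def check_round_trip(path, image, label):
    """Return True if the file at `path` reproduces the given arrays."""
    with np.load(path) as data:
        same_image = np.allclose(data["image"], image, atol=1e-6)
        same_label = np.array_equal(data["label"], label)
        int_label = np.issubdtype(data["label"].dtype, np.integer)
    return bool(same_image and same_label and int_label)

# Toy data standing in for one affine-registered sample.
rng = np.random.default_rng(0)
toy_image = rng.random((4, 4)).astype(np.float32)
toy_label = (toy_image > 0.5).astype(np.int16)
toy_path = os.path.join(tempfile.mkdtemp(), "pair.npz")
save_pair(toy_path, toy_image, toy_label)
print(check_round_trip(toy_path, toy_image, toy_label))
```

A check like this does not say where a real pipeline went wrong, but it narrows the search to either the saving step or the loading step.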

cwmok commented 1 year ago

 So I thought of affine registration first, but my affine registration network (a simpler Convolutional neural network) cannot solve this problem while training either.

For images with different ROI, you may need to define a mask to mask the similarity function during training. For more details, please refer to https://simpleelastix.readthedocs.io/Introduction.html#masks.
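As a rough illustration of masking the similarity function, the sketch below computes a mean squared error only over voxels inside a mask. It assumes NumPy arrays on a common grid and uses MSE purely as an example metric; `masked_mse` is a hypothetical name, and a real training loop would use the differentiable equivalent in its deep learning framework.

```python
import numpy as np

def masked_mse(fixed, warped, mask):
    """Mean squared error restricted to voxels where mask is True (hypothetical helper)."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("mask selects no voxels")
    diff = np.asarray(fixed, dtype=float)[mask] - np.asarray(warped, dtype=float)[mask]
    return float(np.mean(diff ** 2))

# Toy example: one bright voxel outside the region of interest.
fixed_img = np.zeros((4, 4))
warped_img = np.zeros((4, 4))
warped_img[0, 0] = 10.0
roi_mask = np.ones((4, 4), dtype=bool)
roi_mask[0, 0] = False  # exclude the outlier voxel from the similarity

print(masked_mse(fixed_img, warped_img, roi_mask))                   # ignores the outlier
print(masked_mse(fixed_img, warped_img, np.ones((4, 4), dtype=bool)))  # includes it
```

With the mask applied, content outside the region of interest no longer contributes to the loss.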

But my concern is that even if affine registration is performed first, the deformable registration network will still not be able to effectively train registration pairs with long ROI distances.

Yes. Therefore, after affine registration, you should crop the input images to a smaller mutual patch in order to reduce the input image size for the deformable network. Alternatively, you may consider downsampling your input images as well and upsampling the predicted deformation field to warp the original image.

celi7 commented 1 year ago

Thank you for your suggestions! I have some doubts:

Yes. Therefore, after affine registration, you should crop the input images to a smaller mutual patch in order to reduce the input image size for the deformable network.

  1. Does “crop the input images to a smaller mutual patch in order to reduce the input image size for the deformable network” mean cropping the images to maximize the ROI? The problem is that in my data, different images have ROIs of different sizes at different positions in the images. In this case, even if the image is cropped (for example, when I cropped a 256×256 image to 144×160), there are still significant differences in ROI size and location among different images.

Alternatively, you may consider downsampling your input images as well and upsampling the predicted deformation field to warp the original image.

  2. Does “downsampling your input images as well and upsampling the predicted deformation field to warp the original image” mean using the downsampling and upsampling layers built into deformable registration networks? I don't quite understand this. Deformable registration networks already have downsampling and upsampling, yet the network still cannot solve the problem of large-deformation image registration.

cwmok commented 1 year ago

Does “crop the input images to a smaller mutual patch in order to reduce the input image size for the deformable network” mean to crop the images to maximize ROI? The problem is that in my data, different images have different ROI sizes and are located at different positions in the images. In this case, even if the image is cropped (such as when I cropped a 256×256 image to 144×160), there are still significant differences in ROI size and location among different images.

Usually, when the images show different fields of view (FOV), I crop the moving image to match the FOV of the fixed image after affine registration. For example, if the moving image is a full-body CT scan and the fixed image is an abdominal CT scan, I crop the moving image to the abdominal region to match the FOV of the fixed image after affine registration.
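A minimal sketch of this kind of FOV cropping is given below. It assumes that the affine-registered moving image has already been resampled onto the fixed image's voxel grid and that the fixed image's content can be separated from background by a simple intensity threshold; both assumptions, and the helper names `bounding_box` and `crop_to_fixed_fov`, are illustrative only.

```python
import numpy as np

def bounding_box(mask):
    """Tuple of slices that tightly encloses the True voxels of mask."""
    coords = np.argwhere(mask)
    if coords.size == 0:
        raise ValueError("mask is empty")
    lo = coords.min(axis=0)
    hi = coords.max(axis=0) + 1
    return tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))

def crop_to_fixed_fov(fixed, moving_aligned, threshold=0.0):
    """Crop both images to the foreground extent of the fixed image."""
    box = bounding_box(np.asarray(fixed) > threshold)
    return fixed[box], moving_aligned[box]

# Toy example: the fixed image only covers a 3x3 window of the grid.
fixed_img = np.zeros((6, 6))
fixed_img[2:5, 1:4] = 1.0
moving_img = np.arange(36, dtype=float).reshape(6, 6)

fixed_crop, moving_crop = crop_to_fixed_fov(fixed_img, moving_img)
print(fixed_crop.shape, moving_crop.shape)
```

Both outputs share the same shape, so they can be fed to the deformable network as a smaller mutual patch.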

Does “downsampling your input images as well and upsampling the predicted deformation field to warp the original image” mean to use the downsampling and upsampling designed in deformable registration networks?

No. I mean downsampling the resolution of the input images (in mm). For example, a regular 1 mm^3 brain scan has an image size of (256, 256, 256), which is very large for a single-GPU machine. You might first downsample the 1 mm^3 scan to a 2 mm^3 scan so that the image size of the fixed and moving images becomes (128, 128, 128). The deformable network then learns to predict a (128, 128, 128, 3) displacement vector field, and you can upsample it to (256, 256, 256, 3) to warp the original image. Learning with (128, 128, 128) input uses roughly 8x less GPU memory than (256, 256, 256) input during training, since it has 8x fewer voxels.
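The downsample, predict, upsample idea can be sketched with plain NumPy. The example uses small toy shapes instead of (256, 256, 256), average pooling for downsampling, and nearest-neighbour repetition for upsampling; a real pipeline would more likely use trilinear interpolation. It also assumes the displacement field is expressed in voxel units, in which case the vectors must be multiplied by the upsampling factor. If the field is stored in normalized grid coordinates, that scaling would not apply. The names `downsample2` and `upsample_field2` are hypothetical.

```python
import numpy as np

def downsample2(volume):
    """Halve each spatial dimension by 2x2x2 average pooling (even sizes assumed)."""
    d, h, w = volume.shape
    return volume.reshape(d // 2, 2, h // 2, 2, w // 2, 2).mean(axis=(1, 3, 5))

def upsample_field2(field):
    """Upsample a (D, H, W, 3) displacement field by 2 along each spatial axis.

    Assumes displacements are in voxel units of the low-resolution grid,
    so the vectors are doubled to stay valid on the high-resolution grid.
    """
    up = field
    for axis in range(3):
        up = np.repeat(up, 2, axis=axis)  # nearest-neighbour upsampling
    return up * 2.0

full_res = np.ones((8, 8, 8))             # stand-in for a (256, 256, 256) scan
low_res = downsample2(full_res)            # stand-in for the (128, 128, 128) input
low_field = np.ones(low_res.shape + (3,))  # stand-in for the network's prediction
full_field = upsample_field2(low_field)    # field used to warp the original image

print(low_res.shape, full_field.shape, full_res.size // low_res.size)
```

The voxel-count ratio printed at the end is the 8x factor mentioned in the reply: halving each of the three dimensions divides the number of voxels by 2^3.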

celi7 commented 1 year ago

@cwmok Alright, thank you. I understand! I also have another question. As I mentioned earlier, I need to register T1-weighted cardiac images. However, my dataset is not very uniformly distributed: in some registration pairs the ROI positions are far apart, while in others they are close together. As a result, the affine registration network cannot effectively register pairs in which the ROI positions of the two images are far apart. So I am wondering whether it is possible to use the similarity of a registration pair as a criterion and design the affine registration network as two parallel subnets. When the similarity is high, the pair is fed to the first subnet for training; when the similarity is low, the pair is fed to the second subnet. In theory, could this method solve the registration problem for pairs whose ROI positions are far apart?

cwmok commented 1 year ago

Hi @celi7,

It depends on the initial misalignment of the input images. I think your method is reasonable and may solve the problem. Yet, if the input images initially have no overlapping region, gradient-based methods may fail. In that case, you may want to try a keypoint-based affine registration method such as KeyMorph (see its GitHub repository). Keypoint-based methods will be more robust to large initial misalignment.
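To illustrate why keypoint-based approaches do not depend on image overlap, the sketch below fits an affine transform to matched keypoints by linear least squares. This is not KeyMorph's code; it assumes the corresponding keypoints are already given (a learned method would predict them with a network), and `fit_affine` is a hypothetical helper name.

```python
import numpy as np

def fit_affine(moving_pts, fixed_pts):
    """Least-squares affine (homogeneous matrix) mapping moving_pts onto fixed_pts.

    Both inputs have shape (N, dim) with matched rows; N must be at least dim + 1
    and the points must not be degenerate (e.g. all on one plane in 3D).
    """
    moving_pts = np.asarray(moving_pts, dtype=float)
    fixed_pts = np.asarray(fixed_pts, dtype=float)
    n, dim = moving_pts.shape
    homog = np.hstack([moving_pts, np.ones((n, 1))])           # (N, dim + 1)
    params, *_ = np.linalg.lstsq(homog, fixed_pts, rcond=None)  # (dim + 1, dim)
    affine = np.eye(dim + 1)
    affine[:dim, :] = params.T
    return affine

# Toy example: recover a known transform with a large translation.
true_affine = np.array([[1.0, 0.1, 0.0, 20.0],
                        [0.0, 1.2, 0.0, -30.0],
                        [0.0, 0.0, 0.9, 5.0],
                        [0.0, 0.0, 0.0, 1.0]])
rng = np.random.default_rng(0)
pts_moving = rng.random((10, 3)) * 100.0
pts_fixed = pts_moving @ true_affine[:3, :3].T + true_affine[:3, 3]

est_affine = fit_affine(pts_moving, pts_fixed)
print(np.round(est_affine, 3))
```

The fit only uses point coordinates, so the size of the translation does not matter, whereas an intensity-based loss has little signal when the two images do not overlap.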

celi7 commented 1 year ago

Hi @cwmok, thank you for your suggestion! I am wondering: if I have a myocardial segmentation mask, can I use it to supervise the affine registration network? How do you think this segmentation-supervised affine registration method would perform compared to the keypoint-based affine registration method you mentioned?

cwmok commented 1 year ago

Hi @celi7,

Yes, you can use the segmentation mask to supervise the training. See https://github.com/cwmok/C2FViT/blob/main/Code/Train_C2FViT_pairwise_semi.py for more details.

It depends. If the initial segmentation masks of the input image pair have no overlapping region, keypoint-based affine registration may still be better. Otherwise, a supervised gradient-based affine registration method will be better.

celi7 commented 1 year ago

Hi @cwmok , Okay, I understand that. Thank you for your very detailed and patient answer! Wishing you success in your research!