Kaiseem / DAR-UNet

[JBHI2022] A novel 3D unsupervised domain adaptation framework for cross-modality medical image segmentation
Apache License 2.0

unpaired T1 to T2 translation #2

Closed rik030 closed 6 months ago

rik030 commented 2 years ago

Dear Sir, my dataset is in NIfTI format and unpaired. The T1 images have dimensions 512x512x120 while the T2 images are 384x384x40. How will the image-to-image translation take place when the source and target have different dimensions?

Kaiseem commented 2 years ago

Hi, for your question, I suggest first spatially normalizing T1 and T2 to the same spacing, and then padding the T2 images to the T1 images' size in the XY plane. In image-to-image translation, you need to ensure that the content is the same in both domains (i.e., domain-invariant features) so that the styles (i.e., domain-specific features) can be disentangled.
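
For illustration, here is a minimal sketch (not from this repository; file names and the target spacing are placeholders) of resampling both volumes to a common spacing with SimpleITK and then padding T2 to the T1 in-plane size:

```python
# Sketch only: resample T1/T2 to the same spacing, then pad T2 in the XY plane.
import SimpleITK as sitk
import numpy as np

def resample_to_spacing(img, new_spacing=(1.0, 1.0, 1.0)):
    """Resample an image to the given spacing with linear interpolation."""
    old_spacing = img.GetSpacing()
    old_size = img.GetSize()
    new_size = [int(round(osz * osp / nsp))
                for osz, osp, nsp in zip(old_size, old_spacing, new_spacing)]
    return sitk.Resample(img, new_size, sitk.Transform(), sitk.sitkLinear,
                         img.GetOrigin(), new_spacing, img.GetDirection(),
                         0.0, img.GetPixelID())

t1 = resample_to_spacing(sitk.ReadImage("t1.nii.gz"))   # placeholder path
t2 = resample_to_spacing(sitk.ReadImage("t2.nii.gz"))   # placeholder path

# Pad T2 symmetrically in the XY plane so its in-plane size matches T1.
t1_arr = sitk.GetArrayFromImage(t1)   # array shape is (Z, Y, X)
t2_arr = sitk.GetArrayFromImage(t2)
pad_y = max(t1_arr.shape[1] - t2_arr.shape[1], 0)
pad_x = max(t1_arr.shape[2] - t2_arr.shape[2], 0)
t2_arr = np.pad(t2_arr,
                ((0, 0),
                 (pad_y // 2, pad_y - pad_y // 2),
                 (pad_x // 2, pad_x - pad_x // 2)),
                mode="constant")
```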

rik030 commented 2 years ago

Dear Sir, spatial normalization and padding take care of the in-plane 2D dimensions. But since the number of slices along the third dimension differs between the two modalities, how would CycleGAN (image-to-image translation) work?

Kaiseem commented 2 years ago

Hi, you can simply stack the 3D volumes along the Z dimension (i.e., treat them as pools of 2D slices). It is suboptimal, but it is OK.
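
As a rough sketch (assumed, not the repo's actual data loader; paths and normalization are placeholders), treating each volume as a stack of independent 2D slices means the different slice counts of T1 (120) and T2 (40) do not matter for a 2D translation network:

```python
# Sketch only: turn each NIfTI volume into a pool of 2D slices along Z.
import numpy as np
import nibabel as nib

def volume_to_slices(path):
    """Load a NIfTI volume and return a list of 2D slices along the Z axis."""
    vol = nib.load(path).get_fdata()                              # shape (X, Y, Z)
    vol = (vol - vol.min()) / (vol.max() - vol.min() + 1e-8)      # simple intensity scaling
    return [vol[..., z] for z in range(vol.shape[-1])]

t1_slices = volume_to_slices("t1.nii.gz")   # e.g. 120 slices
t2_slices = volume_to_slices("t2.nii.gz")   # e.g. 40 slices
# An unpaired 2D translation model samples slices independently from each pool,
# so the two domains do not need the same number of slices.
```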

rik030 commented 2 years ago

Dear Sir, I see that you have submitted a paper to CrossModa 2021. I am also using the crossmoda dataset, which is unpaired: the T1 images have 120 channels and the T2 images have 40 channels. When training CycleGAN, is it trained with a 2D or a 3D architecture? Since each channel in both modalities has its importance, how are the 120 channels in T1 and the 40 channels in T2 mapped in the image-to-image translation architecture?

Kaiseem commented 2 years ago

Yeah, you are right. Basically, it is suboptimal to use a 2D image-to-image translation architecture to perform 3D volume style transfer. However, it is reasonable and practical considering the large GPU memory cost of 3D image-to-image translation on whole volumes. Rather than using 3D patch-based style transfer, we found that 2D style transfer on whole slices works better.

For your question, I will give three potential solutions:

1) For the T1 images with 120 channels, you can store 120 style vectors and then interpolate them into 40 channels, thus aligning the style variance across channels. Then you can style-transfer the T2 images with the corresponding style vectors, enabling style consistency between the two domains (see the sketch below).
2) You can try to train a 3D image-to-image translation network directly.
3) You can directly style-transfer the T2 images and treat the style inconsistency across channels as data augmentation.
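
A hedged sketch of option 1 (not from the repo): assuming the style encoder produces one style vector per slice, the 120 per-slice style codes from T1 can be linearly resampled to 40 so each T2 slice gets a matching code. `interpolate_styles`, the style dimension, and the random codes below are placeholders for illustration:

```python
# Sketch only: resample a (120, D) sequence of per-slice style vectors to (40, D).
import numpy as np

def interpolate_styles(style_vectors, target_len=40):
    """Linearly resample a (N, D) sequence of style vectors to (target_len, D)."""
    n, d = style_vectors.shape
    src_pos = np.linspace(0.0, 1.0, n)
    dst_pos = np.linspace(0.0, 1.0, target_len)
    out = np.empty((target_len, d), dtype=style_vectors.dtype)
    for j in range(d):
        out[:, j] = np.interp(dst_pos, src_pos, style_vectors[:, j])
    return out

t1_styles = np.random.randn(120, 8).astype(np.float32)   # placeholder style codes
t2_styles = interpolate_styles(t1_styles, target_len=40)
# Each of the 40 T2 slices is then translated with its interpolated style code,
# keeping the style variation along Z consistent across the two domains.
```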