chinmay5 / vit_ae_plus_plus

Code base for the paper ViT-AE++: Improving Vision Transformer Autoencoder for Self-supervised Medical Image Representations
Creative Commons Zero v1.0 Universal

3D extension of Moco-v3 #2

Closed JSGrondin closed 1 year ago

JSGrondin commented 1 year ago

Hi, thank you very much for contributing this repo and the accompanying paper. In the paper, you mention extending the ViT encoder to a 3D architecture. I assume you used that 3D ViT in both the vanilla ViT-AE and ViT-AE++ when evaluating on 3D datasets? I was curious how you adapted MoCo-v3 and SimSiam to also incorporate a 3D encoder: did you reuse the same 3D ViT? This is not part of the repo at the moment; is there any chance you will contribute that part at some point? Many thanks!

chinmay5 commented 1 year ago

Thank you so much for your interest in our work. Indeed, we use a 3D ViT as the encoder in our evaluation. The major difference is in the patch embedding layer, which must be modified to work with 3D data. We have uploaded the code snippet used for working with the MoCo-v3 model. You can execute it using

python -m other_baselines.mocov3.main_3d_moco_k_fold -a vit_3d

Please change the other parameters accordingly.
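For readers following along: the core of the 3D patch-embedding change is splitting a volume into non-overlapping 3D patches and flattening each one into a token before the linear projection. The snippet below is a minimal NumPy sketch of that idea, not the repo's actual layer; the function name `patchify_3d` and the toy shapes are hypothetical.

```python
import numpy as np

def patchify_3d(volume, patch_size):
    """Split a 3D volume of shape (C, D, H, W) into flattened,
    non-overlapping 3D patches.

    Returns an array of shape (num_patches, C * pd * ph * pw), i.e. the
    token sequence a 3D ViT patch-embedding layer would linearly project.
    (Hypothetical helper for illustration; not from the repo.)
    """
    c, d, h, w = volume.shape
    pd, ph, pw = patch_size
    assert d % pd == 0 and h % ph == 0 and w % pw == 0, \
        "volume dims must be divisible by the patch size"
    # carve each spatial axis into (grid, patch) pairs:
    # (C, D/pd, pd, H/ph, ph, W/pw, pw)
    x = volume.reshape(c, d // pd, pd, h // ph, ph, w // pw, pw)
    # move the patch-grid axes to the front, patch contents to the back
    x = x.transpose(1, 3, 5, 0, 2, 4, 6)
    # flatten: one row per patch, one column per voxel-channel value
    return x.reshape(-1, c * pd * ph * pw)

# toy volume: 1 channel, 8x8x8, with 4x4x4 patches -> 8 tokens of length 64
vol = np.arange(8 * 8 * 8, dtype=np.float32).reshape(1, 8, 8, 8)
tokens = patchify_3d(vol, (4, 4, 4))
print(tokens.shape)  # (8, 64)
```

In a real 3D ViT this patchify-plus-projection is typically fused into a single strided `Conv3d` (kernel size = stride = patch size), analogous to the `Conv2d` used in 2D patch embeddings.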

JSGrondin commented 1 year ago

Many thanks @chinmay5 !