jsadu826 opened 3 months ago
Hi, thanks for your attention to our work!
For CC-CCII data pre-processing, for each scan of each patient, are all the PNGs stacked to get a 3D npy volume with shape (n_slices, n_channels, height, width)? Do we need to select only the lesion slices? When training, are the volumes resized to (n_slices, n_channels, 256, 256) and sent directly to the model, without further cropping into fixed-size ROIs (e.g. 64x64x64) as in the segmentation tasks?
We don't need to select the lesion slices. Instead, we input the whole 3D volume to train a 3D network. Please refer to https://github.com/Luffy03/VoCo/blob/main/Finetune/CC-CCII/utils/data_utils.py for details.
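For intuition, the stacking step described above can be sketched as follows. This is only an illustrative sketch with plain NumPy (nearest-neighbor resizing via index maps); the repo's `data_utils.py` linked above is the authoritative pipeline, and in practice the slices would be read from PNG files with PIL or OpenCV rather than generated.

```python
import numpy as np

def stack_slices(slices, out_hw=(256, 256)):
    """Stack per-slice 2D arrays into one (n_slices, H, W) volume,
    resizing each slice with nearest-neighbor indexing."""
    oh, ow = out_hw
    resized = []
    for img in slices:
        h, w = img.shape
        # nearest-neighbor resample via precomputed index maps
        rows = np.arange(oh) * h // oh
        cols = np.arange(ow) * w // ow
        resized.append(img[np.ix_(rows, cols)])
    return np.stack(resized, axis=0).astype(np.float32)

# Example: a fake 20-slice scan of 512x512 slices
# (in practice, each slice is loaded from a PNG)
scan = [np.random.rand(512, 512) for _ in range(20)]
volume = stack_slices(scan)
print(volume.shape)  # (20, 256, 256)
```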
How is the BraTS 2021 dataset preprocessed for training and testing, and which modality (t1, t1ce, t2, flair) is used?
Hi, we used all four modalities.
So the model input has 4 channels?
Yes, exactly. And we don't load the first layer of our pre-trained models.
Great thanks!
Also, could you please share the train-valid-test json for BraTS21? Thank you~
Hi, the json file of BraTS21 is copied from https://drive.google.com/file/d/1i-BXYe-wZ8R9Vp3GXoajGyqaJ65Jybg1/view?usp=sharing. We will also provide our implementation of BraTS21 soon.
So VoCo used 5-fold cross-validation for BraTS21, as in https://arxiv.org/pdf/2201.01266?
Yes, you can have a try. In my experiments, the results of different folds are close to each other, since BraTS21 already contains an adequate number of cases. I notice that some previous works report the results of the first fold only, so 5-fold may not be an essential setting.
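The 5-fold protocol discussed above can be sketched in a few lines of plain Python. This is only an illustrative split (round-robin over shuffled case IDs), not the official VoCo or SwinUNETR fold assignment, and the `BraTS2021_*` ID format is an assumption.

```python
import random

def five_fold_splits(case_ids, seed=0):
    """Split case IDs into 5 (train, val) folds."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::5] for i in range(5)]  # round-robin assignment
    splits = []
    for i in range(5):
        val = folds[i]
        train = [c for j, f in enumerate(folds) if j != i for c in f]
        splits.append((train, val))
    return splits

# e.g. the 1251 BraTS21 training cases (hypothetical ID format)
splits = five_fold_splits([f"BraTS2021_{i:05d}" for i in range(1251)])
print([len(v) for _, v in splits])  # fold sizes: [251, 250, 250, 250, 250]
```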
> Yes, exactly. And we don't load the first layer of our pre-trained models.
Does it mean not loading swinViT.patch_embed.proj.weight, encoder1.layer.conv1.conv.weight, and encoder1.layer.conv3.conv.weight?
Hi, you can use this code to check.

```python
def load(model, model_dict):
    # Unwrap the checkpoint: different training frameworks nest the
    # weights under different keys.
    if "state_dict" in model_dict.keys():
        state_dict = model_dict["state_dict"]
    elif "network_weights" in model_dict.keys():
        state_dict = model_dict["network_weights"]
    elif "net" in model_dict.keys():
        state_dict = model_dict["net"]
    else:
        state_dict = model_dict

    # Strip or rename prefixes left over from DataParallel or the
    # pre-training wrapper so keys match the fine-tuning model.
    if "module." in list(state_dict.keys())[0]:
        print("Tag 'module.' found in state dict - fixing!")
        for key in list(state_dict.keys()):
            state_dict[key.replace("module.", "")] = state_dict.pop(key)
    if "backbone." in list(state_dict.keys())[0]:
        print("Tag 'backbone.' found in state dict - fixing!")
        for key in list(state_dict.keys()):
            state_dict[key.replace("backbone.", "")] = state_dict.pop(key)
    if "swin_vit" in list(state_dict.keys())[0]:
        print("Tag 'swin_vit' found in state dict - fixing!")
        for key in list(state_dict.keys()):
            state_dict[key.replace("swin_vit", "swinViT")] = state_dict.pop(key)

    # Keep a pretrained weight only if its key exists and its shape matches;
    # otherwise fall back to the model's own (randomly initialized) weight.
    # This is what skips the first layer when the input channels differ.
    current_model_dict = model.state_dict()
    new_state_dict = {
        k: state_dict[k]
        if (k in state_dict.keys()) and (state_dict[k].size() == current_model_dict[k].size())
        else current_model_dict[k]
        for k in current_model_dict.keys()
    }
    model.load_state_dict(new_state_dict, strict=True)
    print("Using VoCo pretrained backbone weights !!!!!!!")
    return model
```
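The essential behavior is the per-key size check: a first-layer weight whose channel count differs from the checkpoint is silently kept at its random initialization. Here is a minimal NumPy illustration of that merge, with arrays standing in for torch tensors (`.shape` in place of `.size()`); the key names are hypothetical stand-ins for real SwinUNETR parameter names.

```python
import numpy as np

# A 4-channel fine-tuning model vs. a 1-channel pre-trained checkpoint.
current = {
    "swinViT.patch_embed.proj.weight": np.zeros((48, 4, 2, 2, 2)),  # 4-ch model
    "encoder2.layer.conv1.conv.weight": np.zeros((48, 48, 3, 3, 3)),
}
checkpoint = {
    "swinViT.patch_embed.proj.weight": np.ones((48, 1, 2, 2, 2)),   # 1-ch pretrain
    "encoder2.layer.conv1.conv.weight": np.ones((48, 48, 3, 3, 3)),
}

# Same merge rule as load() above: take the checkpoint weight only on a
# key-and-shape match, otherwise keep the model's own weight.
merged = {
    k: checkpoint[k]
    if k in checkpoint and checkpoint[k].shape == current[k].shape
    else current[k]
    for k in current
}

print(merged["swinViT.patch_embed.proj.weight"].sum())   # 0.0 -> kept, not loaded
print(merged["encoder2.layer.conv1.conv.weight"].sum())  # loaded from checkpoint
```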
> Hi, the json file of BraTS21 is copied from https://drive.google.com/file/d/1i-BXYe-wZ8R9Vp3GXoajGyqaJ65Jybg1/view?usp=sharing. We will also provide our implementation of BraTS21 soon.
Hi, I'd like to know why the BraTS21 results obtained by training SwinUNETR from scratch are much higher in the SwinUNETR paper than in the VoCo paper, especially for enhancing tumor. These two works seem to have used the same data split.
In the SwinUNETR paper: [screenshot of reported results]
In the VoCo paper: [screenshot of reported results]
Hi, it was a mistake in our previous implementation :worried: since we inherited SwinUNETR's official code in the CVPR version. Our current version achieves higher performance and we will release it soon. By the way, can you reproduce the results reported by SwinUNETR? We cannot reproduce them.
I copied the data split from this IEEE TMI paper, which splits the 1251 BraTS21 training cases into train/valid/test = 833/209/209. The training code is based on this. The Dice scores on the test set were approximately TC = 90, WT = 93, ET = 86 (almost the same for fine-tuning from VoCo and training from scratch, but fine-tuning from VoCo indeed converged much faster).
Thanks for sharing, it is encouraging to hear that. Our reproduced result is about 91% Dice, but it is not based on this split. It seems your results are also good. We also find that VoCo does not improve significantly on this dataset (less than 2%). Maybe it is caused by the modality gap.
Dear researchers, our work is now available at Large-Scale-Medical, if you are still interested in this topic. Thank you very much for your attention to our work; it encourages me a lot!
Hello! I'm trying to reproduce the results on CC-CCII and MM-WHS.
Thank you!