jsadu826 opened 3 months ago
Hi, thanks for your attention to our work!
For CC-CCII data pre-processing, for each scan of each patient, are all the PNGs stacked to get a 3D npy volume with shape (n_slices, n_channels, height, width)? Do we need to select only the lesion slices? When training, are the volumes resized to (n_slices, n_channels, 256, 256) and sent directly to the model, without further cropping into fixed-size ROIs (e.g. 64x64x64) as in the segmentation tasks?
We don't need to select the lesion slices. Instead, we input the whole 3D volume to train a 3D network. Please refer to https://github.com/Luffy03/VoCo/blob/main/Finetune/CC-CCII/utils/data_utils.py for details.
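For intuition, the stacking step described above can be sketched as follows. This is only an illustrative sketch with plain NumPy (nearest-neighbor resizing via index maps); the repo's `data_utils.py` linked above is the authoritative pipeline, and in practice the slices would be read from PNG files with PIL or OpenCV rather than generated.

```python
import numpy as np

def stack_slices(slices, out_hw=(256, 256)):
    """Stack per-slice 2D arrays into one (n_slices, H, W) volume,
    resizing each slice with nearest-neighbor indexing."""
    oh, ow = out_hw
    resized = []
    for img in slices:
        h, w = img.shape
        # nearest-neighbor resample via precomputed index maps
        rows = np.arange(oh) * h // oh
        cols = np.arange(ow) * w // ow
        resized.append(img[np.ix_(rows, cols)])
    return np.stack(resized, axis=0).astype(np.float32)

# Example: a fake 20-slice scan of 512x512 slices
# (in practice, each slice is loaded from a PNG)
scan = [np.random.rand(512, 512) for _ in range(20)]
volume = stack_slices(scan)
print(volume.shape)  # (20, 256, 256)
```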
How is the BraTS 2021 dataset preprocessed for training and testing, and which modality (t1, t1ce, t2, flair) is used?
Hi, we used all four modalities.
So the model input has 4 channels?
Yes, exactly. And we don't load the first layer of our pre-trained models.
Great thanks!
Also, could you please share the train-valid-test json for BraTS21? Thank you~
Hi, the json file of BraTS21 is copied from https://drive.google.com/file/d/1i-BXYe-wZ8R9Vp3GXoajGyqaJ65Jybg1/view?usp=sharing. We will also provide our implementation of BraTS21 soon.
So VoCo used 5-fold cross-validation for BraTS21, as in https://arxiv.org/pdf/2201.01266?
Yes, you can have a try. In my experiments, the results of different folds are close to each other, since BraTS21 already contains an adequate number of cases. I notice that some previous works report the results of the first fold only, so 5-fold may not be an essential setting.
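The 5-fold protocol discussed above can be sketched in a few lines of plain Python. This is only an illustrative split (round-robin over shuffled case IDs), not the official VoCo or SwinUNETR fold assignment, and the `BraTS2021_*` ID format is an assumption.

```python
import random

def five_fold_splits(case_ids, seed=0):
    """Split case IDs into 5 (train, val) folds."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::5] for i in range(5)]  # round-robin assignment
    splits = []
    for i in range(5):
        val = folds[i]
        train = [c for j, f in enumerate(folds) if j != i for c in f]
        splits.append((train, val))
    return splits

# e.g. the 1251 BraTS21 training cases (hypothetical ID format)
splits = five_fold_splits([f"BraTS2021_{i:05d}" for i in range(1251)])
print([len(v) for _, v in splits])  # fold sizes: [251, 250, 250, 250, 250]
```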
> Yes, exactly. And we don't load the first layer of our pre-trained models.
Does it mean not loading swinViT.patch_embed.proj.weight, encoder1.layer.conv1.conv.weight, and encoder1.layer.conv3.conv.weight?
Hi, you can use this code to check.

```python
def load(model, model_dict):
    # Unwrap the checkpoint: different training frameworks nest the
    # weights under different keys.
    if "state_dict" in model_dict.keys():
        state_dict = model_dict["state_dict"]
    elif "network_weights" in model_dict.keys():
        state_dict = model_dict["network_weights"]
    elif "net" in model_dict.keys():
        state_dict = model_dict["net"]
    else:
        state_dict = model_dict

    # Strip or rename prefixes left over from DataParallel or the
    # pre-training wrapper so keys match the fine-tuning model.
    if "module." in list(state_dict.keys())[0]:
        print("Tag 'module.' found in state dict - fixing!")
        for key in list(state_dict.keys()):
            state_dict[key.replace("module.", "")] = state_dict.pop(key)
    if "backbone." in list(state_dict.keys())[0]:
        print("Tag 'backbone.' found in state dict - fixing!")
        for key in list(state_dict.keys()):
            state_dict[key.replace("backbone.", "")] = state_dict.pop(key)
    if "swin_vit" in list(state_dict.keys())[0]:
        print("Tag 'swin_vit' found in state dict - fixing!")
        for key in list(state_dict.keys()):
            state_dict[key.replace("swin_vit", "swinViT")] = state_dict.pop(key)

    # Keep a pretrained weight only if its key exists and its shape matches;
    # otherwise fall back to the model's own (randomly initialized) weight.
    # This is what skips the first layer when the input channels differ.
    current_model_dict = model.state_dict()
    new_state_dict = {
        k: state_dict[k]
        if (k in state_dict.keys()) and (state_dict[k].size() == current_model_dict[k].size())
        else current_model_dict[k]
        for k in current_model_dict.keys()
    }
    model.load_state_dict(new_state_dict, strict=True)
    print("Using VoCo pretrained backbone weights !!!!!!!")
    return model
```
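The essential behavior is the per-key size check: a first-layer weight whose channel count differs from the checkpoint is silently kept at its random initialization. Here is a minimal NumPy illustration of that merge, with arrays standing in for torch tensors (`.shape` in place of `.size()`); the key names are hypothetical stand-ins for real SwinUNETR parameter names.

```python
import numpy as np

# A 4-channel fine-tuning model vs. a 1-channel pre-trained checkpoint.
current = {
    "swinViT.patch_embed.proj.weight": np.zeros((48, 4, 2, 2, 2)),  # 4-ch model
    "encoder2.layer.conv1.conv.weight": np.zeros((48, 48, 3, 3, 3)),
}
checkpoint = {
    "swinViT.patch_embed.proj.weight": np.ones((48, 1, 2, 2, 2)),   # 1-ch pretrain
    "encoder2.layer.conv1.conv.weight": np.ones((48, 48, 3, 3, 3)),
}

# Same merge rule as load() above: take the checkpoint weight only on a
# key-and-shape match, otherwise keep the model's own weight.
merged = {
    k: checkpoint[k]
    if k in checkpoint and checkpoint[k].shape == current[k].shape
    else current[k]
    for k in current
}

print(merged["swinViT.patch_embed.proj.weight"].sum())   # 0.0 -> kept, not loaded
print(merged["encoder2.layer.conv1.conv.weight"].sum())  # loaded from checkpoint
```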
> Hi, the json file of BraTS21 is copied from https://drive.google.com/file/d/1i-BXYe-wZ8R9Vp3GXoajGyqaJ65Jybg1/view?usp=sharing. We will also provide our implementation of BraTS21 soon.
Hi, I'd like to know why the BraTS21 results obtained by training SwinUNETR from scratch are much higher in the SwinUNETR paper than in the VoCo paper, especially for enhancing tumor. These two works seem to have used the same data split.
In the SwinUNETR paper: [screenshot of reported results]
In the VoCo paper: [screenshot of reported results]
Hi, it was a mistake in our previous implementation :worried: since we inherited SwinUNETR's official code in the CVPR version. Our current version achieves higher performance and we will release it soon. By the way, can you reproduce the results reported by SwinUNETR? We cannot reproduce them.
I copied the data split from this IEEE TMI paper, which splits the 1251 BraTS21 training cases into train/valid/test = 833/209/209. The training code is based on this. The Dice scores on the test set were approximately TC = 90, WT = 93, ET = 86 (almost the same for fine-tuning from VoCo and training from scratch, but fine-tuning from VoCo indeed converged much faster).
Thanks for sharing, it is encouraging to hear that. Our reproduced result is about 91% Dice, but it is not based on this split. It seems your results are also good. We also find that VoCo does not improve significantly on this dataset (less than 2%). Maybe it is caused by the modality gap.
Dear researchers, our work is now available at Large-Scale-Medical, if you are still interested in this topic. Thank you very much for your attention to our work; it encourages me a lot!
Hello! I'm trying to reproduce the results on CC-CCII and MM-WHS.
Thank you!