Luffy03 / Large-Scale-Medical

[CVPR 2024 Extension] 160K volumes (42M slices) datasets, new segmentation datasets, 31M-1.2B pre-trained models, various pre-training recipes, 50+ downstream tasks implementation
Apache License 2.0

nnUNet: plan.json for downstream trainings #15

Open lorinczszabolcs opened 3 weeks ago

lorinczszabolcs commented 3 weeks ago

Hi!

First of all, awesome work, thanks for the contribution!

I wanted to ask if you would be able to provide the plans.json, dataset.json, and the other configuration files that nnUNet generated during the pre-training step. The weights by themselves are not directly usable for fine-tuning on downstream tasks; if I'm not mistaken, we would currently need to run the plan-and-preprocess step on the pre-training dataset ourselves, which requires quite a lot of disk space and computation.

Thanks again, and looking forward to your response!

Luffy03 commented 3 weeks ago

Hi, many thanks for your attention to our work! For SSL pre-training we use the MONAI framework to train the nnUNet backbone, so plans.json and dataset.json are not required. If you want to try fully supervised training on VoComni, you can check the files at https://github.com/Luffy03/Large-Scale-Medical/tree/main/VoComni/nnUNet_preprocessed. Hope they help.

lorinczszabolcs commented 3 weeks ago

Thanks for the quick answer!

I'm mainly interested in using the SSL-pre-trained weights for fine-tuning on downstream tasks (using the nnUNet codebase). For that, the plans and dataset.json are required to instantiate the model before the shared weights can be loaded into it. Or is there a class in the repository with a hard-coded architecture that we could use for loading the pre-trained weights? Maybe it's this one?

Luffy03 commented 3 weeks ago

Oh, you mean the downstream tasks. Please refer to Downstream/nnUNet, where I provide usage instructions; the detailed code is in nnUNetTrainer_pretrain.py. You only need to use the original preprocessing implementation and then run "nnUNetv2_train xxx 3d_fullres all -tr nnUNetTrainer_pre" instead of "nnUNetv2_train xxx 3d_fullres all -tr nnUNetTrainer".

cd nnUNet
source activate YOUR-CONDA-ENVIRONMENT
nnUNetv2_plan_and_preprocess -d xxx -c 3d_fullres --verbose --verify_dataset_integrity
nnUNetv2_train xxx 3d_fullres all -tr nnUNetTrainer_pre
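To make the mechanism concrete, here is a hedged sketch of the weight-transfer step a pre-training trainer like this typically performs: copy every checkpoint tensor whose key and shape match the freshly built network, and leave the rest (e.g. the segmentation head, whose channel count depends on the downstream label set) randomly initialized. The function name and structure are illustrative assumptions, not the repository's exact code; it operates on plain dicts so it works with any tensor type exposing `.shape`.

```python
def transfer_matching_weights(model_state, pretrained_state):
    """Merge pre-trained tensors into a model state dict.

    A checkpoint entry is transferred only if its key exists in the
    model's state dict AND its shape matches; everything else keeps
    the model's own (randomly initialized) value.

    Returns (merged_state, loaded_keys, skipped_keys).
    """
    loaded, skipped = [], []
    merged = dict(model_state)
    for key, tensor in pretrained_state.items():
        if key in model_state and tuple(tensor.shape) == tuple(model_state[key].shape):
            merged[key] = tensor  # shape-compatible: take pre-trained value
            loaded.append(key)
        else:
            skipped.append(key)  # e.g. head with a different label count
    return merged, loaded, skipped
```

Printing `loaded` and `skipped` after the merge is a quick way to verify how much of the backbone was actually inherited.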

lorinczszabolcs commented 3 weeks ago

Thank you for getting back!

I saw the instructions there, but I was wondering whether we could skip the nnUNetv2_plan_and_preprocess step, since in the original nnUNet repository the ExperimentPlanner creates a new architecture specific to each dataset being used.

I have now checked and realized that the architecture is fixed in this repository, so I assume I can just run plan_and_preprocess on my downstream dataset and then load the shared weights, because the architecture won't differ :). But please feel free to correct me if I'm wrong.

The only downside is that, with the ExperimentPlanner overridden, it is no longer compatible with the original nnUNet implementation.

Luffy03 commented 3 weeks ago

Hi, as far as I know the default ExperimentPlanner will not change the architecture of PlainConvUNet, so you can still inherit the pre-trained weights. The only difference is in the preprocessing strategies. You mentioned that "ExperimentPlanner creates a new architecture that is specific to each data set"; is this a recent update?

lorinczszabolcs commented 3 weeks ago

Hi!

It has always been like this as far as I know; that's what sets nnUNet apart from other models: it is self-configuring in terms of both preprocessing and network architecture, given some GPU constraint. Here is a short quote from their README:

Rule-based parameters use the dataset fingerprint to adapt certain segmentation pipeline properties by following hard-coded heuristic rules. For example, the network topology (pooling behavior and depth of the network architecture) are adapted to the patch size; the patch size, network topology and batch size are optimized jointly given some GPU memory constraint.

And the relevant part of their code that does the network topology adaptation is here. I just realized that this repository also implements that logic, here, in a somewhat different place, possibly because it is based on a different version of the original codebase.
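To illustrate the rule the README quote describes, here is a minimal sketch of a patch-size-driven topology heuristic: each axis is pooled (halved) until the feature map would drop below a minimum size or a global pooling cap is hit. The constants and the function name are assumptions for illustration, not nnUNet's exact values or code.

```python
def n_poolings_per_axis(patch_size, min_size=4, max_poolings=5):
    """For each axis, count how many 2x poolings fit before the
    feature map would shrink below `min_size`, capped at `max_poolings`.

    This is the rule-based adaptation of "pooling behavior and depth"
    to the patch size mentioned in the nnUNet README (simplified)."""
    pools = []
    for dim in patch_size:
        n = 0
        while dim // 2 >= min_size and n < max_poolings:
            dim //= 2
            n += 1
        pools.append(n)
    return pools
```

This also shows why a changed patch size alters preprocessing and pooling counts without necessarily changing the convolutional parameter shapes, which is what allows the pre-trained weights to be inherited.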

Luffy03 commented 3 weeks ago

Yes, some parameters such as patch_size will change, but the network parameters will not. If you try our code, you will find that most of the network parameters are loaded. The data processing will be different; we are still exploring this part.

lorinczszabolcs commented 3 weeks ago

Ok, thanks! I will get back to you once I've been able to try it out.

Luffy03 commented 3 weeks ago

Hi, FYI, the code at https://github.com/Luffy03/Large-Scale-Medical/blob/ab6829393c6a7e7623213406abc7340e0b8f45a8/Downstream/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer_pretrain.py#L135 will print the loaded network parameters.