MIC-DKFZ / nnUNet


configuration after early stopping with nnunetv2 #2459

Closed: bkonk closed this issue 2 months ago

bkonk commented 2 months ago

I'm a former nnUNetv1 user now using v2 to train a 2d and a 3d_fullres segmentation model. I stopped training early and am trying to figure out how to find the best ensemble configuration. In v1, I believe this was just a matter of copying checkpoint_best.pth to checkpoint_final.pth in each fold directory. It looks like that has changed (or I'm forgetting a step): there is now a validation folder expected within each fold_X folder. Can you provide a quick explanation of what needs to be done to enable find_best_config if training is stopped early?
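For context, the v1-style workaround I remember was roughly this one-liner, run from the trained model's results folder (filenames and fold naming from memory, so treat it as a sketch rather than a recommendation):

for f in fold_*; do cp "$f/checkpoint_best.pth" "$f/checkpoint_final.pth"; done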

I used the following command to train the folds:

CUDA_VISIBLE_DEVICES=0 nnUNetv2_train 999 3d_fullres 0 --npz -device cuda

The validation folders within each of the folds were not automatically generated, and I'm not seeing any npz files saved even though I passed the --npz flag. According to the documentation, it appears these aren't generated until after the "final validation". Maybe I need to resume training on each fold and point it at a new trainer (not sure if I can change trainers mid-stream?). Something like this if I stopped at 299 epochs:

CUDA_VISIBLE_DEVICES=0 nnUNetv2_train 999 3d_fullres 0 --npz -device cuda -c -tr nnUNetTrainer_300epochs
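In case it's relevant, I assume such a trainer would just be a small subclass that shortens training, roughly like the sketch below (untested on my end; the exact __init__ signature may differ between nnUNet versions, and I think nnUNet already ships some fixed-length variants under nnunetv2/training/nnUNetTrainer/variants/training_length that could be a better starting point):

    from nnunetv2.training.nnUNetTrainer.nnUNetTrainer import nnUNetTrainer

    # Hypothetical variant that only changes the number of training epochs;
    # everything else is inherited unchanged from nnUNetTrainer.
    class nnUNetTrainer_300epochs(nnUNetTrainer):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.num_epochs = 300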

ykirchhoff commented 2 months ago

Hi @bkonk,

you can run your training command with the options --val and --val_best. --val skips the training and goes directly to the validation, and --val_best ensures that checkpoint_best is used for the validation instead of checkpoint_final. This should work, and you should then be able to use find_best_config. However, keep in mind that this is not the recommended use of nnUNet!
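For your example, that would look something like the following (dataset id and configuration taken from your command above; repeat the first command for every fold and also for your 2d model, then run find_best_configuration over both configurations):

CUDA_VISIBLE_DEVICES=0 nnUNetv2_train 999 3d_fullres 0 --npz --val --val_best
nnUNetv2_find_best_configuration 999 -c 2d 3d_fullres

The validation run should then produce the validation folder and the npz files that nnUNetv2_find_best_configuration is looking for.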

Best, Yannick

ykirchhoff commented 2 months ago

Hey,

I assume this worked for you; otherwise feel free to reopen this issue, or create a new one if there is anything else.

Best, Yannick