MIC-DKFZ / nnUNet

Apache License 2.0

Can not change the number of epochs #1925

Closed: N0oNam3 closed this issue 8 months ago

N0oNam3 commented 8 months ago

Hi, I have already read issues #322 and #587. In September everything worked: I set self.num_epochs = 150 and added the following class:

    class nnUNetTrainer_150epochs(nnUNetTrainer):
        def __init__(self, plans: dict, configuration: str, fold: int, dataset_json: dict,
                     unpack_dataset: bool = True, device: torch.device = torch.device('cuda')):
            super().__init__(plans, configuration, fold, dataset_json, unpack_dataset, device)
            self.num_epochs = 150

But now, after not using it for a few months, training continues for over 600 epochs (I stopped it manually). I reinstalled nnUNetv2 and cleared the pycache, but I haven't found the source of the issue. I would be really grateful for help.

TaWald commented 8 months ago

Are you sure you are calling the correct nnunet version? Do you have the correct conda/virtualenv loaded when doing pip install -e . and the run command after?

You should probably verify that you installed nnunet correctly, and provide more information:

  1. Provide the environment name
  2. Provide the pip freeze output for the current env
  3. Provide your training call / bash script that you use to start the training
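One way to collect part of this information is to ask the Python interpreter itself which installation it sees. The following is only a minimal sketch using the standard library; the helper name describe_install is made up for illustration, and it assumes both the distribution and the importable module are called nnunetv2.

```python
import importlib.metadata
import importlib.util
import sys


def describe_install(dist_name: str, module_name: str) -> dict:
    """Report which interpreter is running and which install of a package it sees.

    describe_install is a hypothetical helper, not part of nnU-Net.
    """
    try:
        # Version recorded in the installed distribution's metadata.
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        version = None

    # Locate the top-level module without importing it.
    spec = importlib.util.find_spec(module_name)
    location = spec.origin if spec is not None else None

    return {"interpreter": sys.executable, "version": version, "location": location}


if __name__ == "__main__":
    # Assumption: distribution name and module name are both "nnunetv2".
    print(describe_install("nnunetv2", "nnunetv2"))
```

If the reported location is not inside the cloned repository you edited, the interpreter is most likely picking up a different copy than the one containing your changes.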

Also you should verify you are calling the right instance of your trainer class. Maybe you have multiple classes with the same name and you default to another class in the dir search that nnunet does?
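A quick check for duplicate class names could look like the sketch below. It only scans source files for class definitions with a given name and does not reproduce how nnunet itself searches for trainers; the function name find_class_definitions and the directory you pass in are assumptions for illustration.

```python
import ast
from pathlib import Path


def find_class_definitions(root, class_name):
    """Return (file, line) pairs for every class named class_name under root.

    find_class_definitions is a hypothetical helper, not part of nnU-Net.
    """
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (SyntaxError, UnicodeDecodeError):
            # Skip files that cannot be parsed instead of aborting the scan.
            continue
        for node in ast.walk(tree):
            if isinstance(node, ast.ClassDef) and node.name == class_name:
                hits.append((str(path), node.lineno))
    return hits
```

Calling it on the directory of your nnunet checkout with "nnUNetTrainer_150epochs" as the class name should return exactly one hit; more than one hit means two classes share the name.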

Overall it is really difficult to help without details.

N0oNam3 commented 8 months ago

EDIT: To test whether my changes take effect, I changed self.num_epochs to 1 (the class was already in the code).

I hope I got everything you need. I run my code with:

    CUDA_VISIBLE_DEVICES=2 nnUNetv2_train 123 3d_fullres 0 --npz

My dataset name is "Dataset123_G2G".

I did the following:

  - uninstalled nnunetv2 (and, just in case, also nnunet, which was not installed)
  - installed with pip install -e .
  - got the warning "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. dg-tta 1.0.12 requires nnunetv2<3.0.0,>=2.2.1, but you have nnunetv2 2.1.1 which is incompatible."

TaWald commented 8 months ago

As of right now, it seems to me that you have a bunch of pre-existing packages installed that block you from cleanly installing nnunetv2. As you can see, your install with pip install -e . is not going through due to a dependency issue with the pre-existing dg-tta package. It seems to me that you are not using any form of virtual environment (virtualenv or conda environment) and have some pre-existing garbage installed that blocks a clean installation of nnunet.

Your subsequent call of pip install nnunetv2 installs nnunet from pip, and that version does not contain your local modifications. Moreover, you do not specify your trainer class in your training call. You should probably call nnUNetv2_train --help to check out how to properly specify which trainer to use. (You need to specify it with <your previous call> -tr nnUNetTrainer_150epochs.)
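To make the difference between the two calls concrete, here is a small sketch that assembles the training command with and without the -tr option. The helper build_train_command is made up for illustration; the arguments are the ones from this thread, and nnUNetv2_train --help remains the reference for the actual options.

```python
import shlex


def build_train_command(dataset_id, configuration, fold, trainer=None, extra_flags=()):
    """Assemble the argument list for a training call.

    build_train_command is a hypothetical helper, not part of nnU-Net.
    """
    argv = ["nnUNetv2_train", str(dataset_id), configuration, str(fold)]
    argv.extend(extra_flags)
    if trainer is not None:
        # Without -tr, the custom trainer class is never requested.
        argv.extend(["-tr", trainer])
    return argv


cmd = build_train_command(
    123, "3d_fullres", 0, trainer="nnUNetTrainer_150epochs", extra_flags=["--npz"]
)
print(shlex.join(cmd))
# prints: nnUNetv2_train 123 3d_fullres 0 --npz -tr nnUNetTrainer_150epochs
```

The printed line can be pasted into a shell, prefixed with CUDA_VISIBLE_DEVICES=2 as in the original call.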

Personally I would advise you to clean up your python environments by installing conda and then creating environments you use so stuff like this does not happen anymore. https://conda.io/projects/conda/en/latest/user-guide/getting-started.html#managing-python

N0oNam3 commented 8 months ago

Thank you, it works now :D