Can not change the number of epochs

N0oNam3 commented 8 months ago

Hi, I already read issues #322 and #587. In September everything worked: I set self.num_epochs = 150 and added

    class nnUNetTrainer_150epochs(nnUNetTrainer):
        def __init__(self, plans: dict, configuration: str, fold: int, dataset_json: dict,
                     unpack_dataset: bool = True, device: torch.device = torch.device('cuda')):
            super().__init__(plans, configuration, fold, dataset_json, unpack_dataset, device)
            self.num_epochs = 150

But now, after not using it for a few months, training continues for over 600 epochs (I stopped it manually). I reinstalled nnUNetv2 and cleared the pycache, but I haven't found the source of the issue. I would be really grateful for help.

TaWald commented 8 months ago

Are you sure you are calling the correct nnunet version? Do you have the correct conda env / virtualenv activated both when running pip install -e . and when running the training command afterwards?

You should probably verify that you installed nnunet correctly, and provide more information:

- Provide the environment name
- Provide the pip freeze output of the current env
- Provide the training call / bash script that you use to call the trainer
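
A minimal sketch of how the information above could be collected from inside the environment in question. It only uses the Python standard library; the helper name `describe_install` is made up for illustration and is not part of nnU-Net.

```python
import importlib.util
import sys
from importlib import metadata
from typing import Optional


def describe_install(package: str) -> dict:
    """Collect interpreter, environment, installed version and import location of `package`."""
    try:
        version: Optional[str] = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # no distribution with that name in this environment

    # find_spec returns None when the top-level module cannot be imported at all
    spec = importlib.util.find_spec(package)
    location = spec.origin if spec is not None else None

    return {
        "python": sys.executable,   # interpreter that is actually running
        "environment": sys.prefix,  # root of the active conda env / virtualenv
        "version": version,         # version recorded in the package metadata
        "location": location,       # file that `import <package>` would load
    }


if __name__ == "__main__":
    for key, value in describe_install("nnunetv2").items():
        print(f"{key}: {value}")
```

If `location` points into site-packages rather than into your local checkout, the import is not picking up your modified source tree.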

Also, you should verify that you are calling the right instance of your trainer class. Maybe you have multiple classes with the same name, and the directory search that nnunet does picks up a different one?
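
One way to look for duplicate class names is a static scan of the source tree. The sketch below is an assumption-laden helper (`find_class_definitions` is a hypothetical name), not nnU-Net's own lookup: it simply parses every .py file under a folder and reports those that define a class with the given name.

```python
import ast
from pathlib import Path
from typing import List


def find_class_definitions(root: str, class_name: str) -> List[Path]:
    """Return every .py file under `root` that defines a class named `class_name`."""
    matches = []
    for py_file in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(py_file.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be read or parsed
        # ast.walk also visits nested classes, not only top-level ones
        if any(isinstance(node, ast.ClassDef) and node.name == class_name
               for node in ast.walk(tree)):
            matches.append(py_file)
    return matches


# Example call; the path is a placeholder for your local checkout:
# find_class_definitions("path/to/nnUNet/nnunetv2", "nnUNetTrainer_150epochs")
```

More than one hit means the name is ambiguous, and which definition gets used then depends on the search order.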

Overall it is really difficult to help without details.

N0oNam3 commented 8 months ago

EDIT: To test whether my changes take effect, I changed self.num_epochs to 1 (the class was already in the code).

I hope I got everything you need. I run my code with:

    CUDA_VISIBLE_DEVICES=2 nnUNetv2_train 123 3d_fullres 0 --npz

My dataset name is "Dataset123_G2G".

I did:

- uninstalled nnunetv2 (and, just in case, also nnunet, which was not installed)
- installed with pip install -e .
- got the warning "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. dg-tta 1.0.12 requires nnunetv2<3.0.0,>=2.2.1, but you have nnunetv2 2.1.1 which is incompatible."

installed again: pip install "nnunetv2>=2.2.1,<3.0.0"

output on pip freeze:

attrs @ file:///home/conda/feedstock_root/build_artifacts/attrs_1683424013410/work
batchgenerators==0.25
blinker @ file:///home/conda/feedstock_root/build_artifacts/blinker_1681349778161/work
brotlipy @ file:///home/conda/feedstock_root/build_artifacts/brotlipy_1648854167867/work
cachetools @ file:///home/conda/feedstock_root/build_artifacts/cachetools_1688227447901/work
certifi==2023.7.22
cffi @ file:///home/conda/feedstock_root/build_artifacts/cffi_1636046050867/work
chardet==5.2.0
charset-normalizer==3.3.2
click @ file:///home/conda/feedstock_root/build_artifacts/click_1692311806742/work
cmake==3.26.3
comm==0.2.1
connected-components-3d==3.11.0
contourpy==1.0.7
cryptography @ file:///home/conda/feedstock_root/build_artifacts/cryptography_1637687018854/work
cycler==0.11.0
debugpy==1.8.0
decorator==5.1.1
dg-tta==1.0.12
dicom2nifti==2.4.8
docker-pycreds==0.4.0
dynamic-network-architectures==0.2
exceptiongroup==1.2.0
executing==2.0.1
filelock==3.12.0
fire==0.5.0
fonttools==4.39.4
frozenlist @ file:///croot/frozenlist_1670004507010/work
future==0.18.3
gitdb==4.0.11
GitPython==3.1.41
google-auth @ file:///home/conda/feedstock_root/build_artifacts/google-auth_1694495354132/work
google-auth-oauthlib @ file:///home/conda/feedstock_root/build_artifacts/google-auth-oauthlib_1688235217226/work
graphviz==0.20.1
grpcio @ file:///croot/grpc-suite_1681912592597/work
hiddenlayer @ git+https://github.com/FabianIsensee/hiddenlayer.git@4b98f9e5cccebac67368f02b95f4700b522345b1
idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1663625384323/work
imagecodecs==2023.3.16
imageio==2.31.0
importlib-metadata @ file:///home/conda/feedstock_root/build_artifacts/importlib-metadata_1688754491823/work
ipykernel==6.29.0
ipython==8.20.0
jedi==0.19.1
Jinja2==3.1.2
joblib==1.2.0
jupyter_client==8.6.0
jupyter_core==5.7.1
kiwisolver==1.4.4
lazy_loader==0.2
linecache2==1.0.0
lit==16.0.5.post0
Markdown @ file:///home/conda/feedstock_root/build_artifacts/markdown_1690307387991/work
MarkupSafe==2.1.3
matplotlib==3.7.1
matplotlib-inline==0.1.6
MedPy==0.4.0
mpmath==1.3.0
multidict @ file:///croot/multidict_1665674239670/work
nest-asyncio==1.6.0
networkx==3.1
nibabel==5.1.0
nnunetv2==2.2.1
numpy==1.24.3
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
oauthlib @ file:///home/conda/feedstock_root/build_artifacts/oauthlib_1666056362788/work
packaging==23.1
pandas==2.0.2
parso==0.8.3
pexpect==4.9.0
Pillow==9.5.0
platformdirs==4.1.0
prompt-toolkit==3.0.43
protobuf==3.20.3
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
pyasn1==0.4.8
pyasn1-modules==0.2.7
pycparser @ file:///home/conda/feedstock_root/build_artifacts/pycparser_1636257122734/work
pydicom==2.3.1
Pygments==2.17.2
PyJWT @ file:///home/conda/feedstock_root/build_artifacts/pyjwt_1689721553971/work
pyOpenSSL @ file:///home/conda/feedstock_root/build_artifacts/pyopenssl_1608055815057/work
pyparsing==3.0.9
PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1661604839144/work
python-dateutil==2.8.2
python-gdcm==3.0.22
pytz==2023.3
pyu2f @ file:///home/conda/feedstock_root/build_artifacts/pyu2f_1604248910016/work
PyWavelets==1.4.1
PyYAML==6.0
pyzmq==25.1.2
randomname==0.2.1
requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1684774241324/work
requests-oauthlib @ file:///home/conda/feedstock_root/build_artifacts/requests-oauthlib_1643557462909/work
rsa @ file:///home/conda/feedstock_root/build_artifacts/rsa_1658328885051/work
scikit-image==0.21.0
scikit-learn==1.2.2
scipy==1.10.1
seaborn==0.12.2
sentry-sdk==1.39.2
setproctitle==1.3.3
SimpleITK==2.2.1
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
smmap==5.0.1
stack-data==0.6.3
sympy==1.12
tensorboard @ file:///home/conda/feedstock_root/build_artifacts/tensorboard_1691595541663/work/tensorboard-2.14.0-py3-none-any.whl#sha256=3667f9745d99280836ad673022362c840f60ed8fefd5a3e30bf071f5a8fd0017
tensorboard-data-server @ file:///croot/tensorboard-data-server_1681498183723/work/tensorboard_data_server-0.7.0-py3-none-manylinux2014_x86_64.whl
termcolor==2.4.0
threadpoolctl==3.1.0
tifffile==2023.4.12
torch==2.0.1
tornado==6.4
tqdm==4.65.0
traceback2==1.4.0
traitlets==5.14.1
triton==2.0.0
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1688315532570/work
tzdata==2023.3
unittest2==1.1.0
urllib3==2.0.2
wandb==0.16.2
wcwidth==0.2.13
Werkzeug @ file:///home/conda/feedstock_root/build_artifacts/werkzeug_1651670883478/work
yacs==0.1.8
yarl @ file:///home/conda/feedstock_root/build_artifacts/yarl_1648966516552/work
zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1689374466814/work

TaWald commented 8 months ago

As of right now, it seems to me that you have a bunch of pre-existing packages installed that block you from cleanly installing nnunetv2. As you can see, your install with pip install -e . is not going through due to a dependency issue with the pre-existing dg-tta package. It looks like you are not using a virtualenv or a conda environment, and the leftover packages are blocking a clean installation of nnunet.
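
To check whether the installed package really is an editable install of a local checkout, one option is to read the direct_url.json record that pip writes according to PEP 610. This is a sketch with hypothetical helper names. A False result is not conclusive: legacy develop-mode installs done with older pip/setuptools may not write this record at all.

```python
import json
from importlib import metadata
from typing import Optional


def is_editable_record(direct_url_text: Optional[str]) -> bool:
    """Interpret the contents of a PEP 610 direct_url.json file."""
    if not direct_url_text:
        return False  # no record: typically installed from an index such as PyPI
    record = json.loads(direct_url_text)
    return bool(record.get("dir_info", {}).get("editable", False))


def is_editable_install(package: str) -> Optional[bool]:
    """Return True/False for an installed package, or None if it is not installed."""
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        return None
    # read_text returns None when the distribution has no such metadata file
    return is_editable_record(dist.read_text("direct_url.json"))


if __name__ == "__main__":
    print("nnunetv2 editable:", is_editable_install("nnunetv2"))
```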

Your subsequent call of pip install nnunetv2 installs nnunet from PyPI, and that version does not contain your local modifications. Moreover, you do not specify your trainer class in your train call. You should probably call nnUNetv2_train --help to check how to properly specify which trainer to use. (You need to specify it with <your previous call> -tr nnUNetTrainer_150epochs.)
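
For scripted runs, the corrected call can be assembled as follows. This is only a sketch: `build_train_command` is a made-up helper, and the arguments are the ones from the call posted earlier in this thread plus the -tr option.

```python
import shlex
from typing import List


def build_train_command(dataset_id: int, configuration: str, fold: int, trainer: str) -> List[str]:
    """Assemble the nnUNetv2_train call with an explicit trainer class (-tr)."""
    return [
        "nnUNetv2_train",
        str(dataset_id),
        configuration,
        str(fold),
        "--npz",
        "-tr", trainer,  # without -tr, the default trainer is used
    ]


if __name__ == "__main__":
    cmd = build_train_command(123, "3d_fullres", 0, "nnUNetTrainer_150epochs")
    print(shlex.join(cmd))
    # To launch it from Python (needs `import os, subprocess`):
    # subprocess.run(cmd, env={**os.environ, "CUDA_VISIBLE_DEVICES": "2"}, check=True)
```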

Personally I would advise you to clean up your python environments by installing conda and then creating environments you use so stuff like this does not happen anymore. https://conda.io/projects/conda/en/latest/user-guide/getting-started.html#managing-python

N0oNam3 commented 8 months ago

Thank you, it works now :D

MIC-DKFZ / nnUNet

Can not change the number of epochs #1925