Closed. maggie2209 closed this issue 6 months ago.
Hey, this is an issue you should raise on the PyTorch GitHub.
Until this is fixed on their end (or there is an official workaround), you can circumvent the problem by doing one of these:
`export nnUNet_compile=F`
(or prefix the training command: `nnUNet_compile=F nnUNetv2_train [...]`)
to disable compilation. This will reduce training speed by 15-30%.
Best,
Fabian
See also here: https://github.com/pytorch/pytorch/issues/120233
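For example, applied to the training command from this issue, either of the following disables compilation (illustrative only; the dataset ID, configuration and plans name are just the ones used further down in this thread):

```bash
# Option 1: disable torch.compile for the whole shell session
export nnUNet_compile=F
nnUNetv2_train 731 2d all -p newPlans

# Option 2: disable it only for this single invocation
nnUNet_compile=F nnUNetv2_train 731 2d all -p newPlans
```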
Thank you!
Hello,
I have used nnU-Net v1 many times before, but I just switched to v2 because I want to use pretraining for some of my datasets. I followed the instructions given here for preparing the datasets for pretraining. The error this issue refers to occurs when I run the actual training on the source dataset with the following command:
`nnUNetv2_train 731 2d all -p newPlans`
This is the training log featuring the error message:
```
Using device: cuda:0

#######################################################################
Please cite the following paper when using nnU-Net:
Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211.
#######################################################################

2024-04-30 15:49:13.318915: do_dummy_2d_data_aug: False
using pin_memory on device 0
using pin_memory on device 0
2024-04-30 15:53:01.118280: Using torch.compile...
Traceback (most recent call last):
  File "miniconda3/envs/nnunetv2/bin/nnUNetv2_train", line 8, in <module>
    sys.exit(run_training_entry())
             ^^^^^^^^^^^^^^^^^^^^
  File "miniconda3/envs/nnunetv2/lib/python3.12/site-packages/nnunetv2/run/run_training.py", line 274, in run_training_entry
    run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
  File "miniconda3/envs/nnunetv2/lib/python3.12/site-packages/nnunetv2/run/run_training.py", line 210, in run_training
    nnunet_trainer.run_training()
  File "miniconda3/envs/nnunetv2/lib/python3.12/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1287, in run_training
    self.on_train_start()
  File "miniconda3/envs/nnunetv2/lib/python3.12/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 834, in on_train_start
    self.initialize()
  File "miniconda3/envs/nnunetv2/lib/python3.12/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 218, in initialize
    self.network = torch.compile(self.network)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "miniconda3/envs/nnunetv2/lib/python3.12/site-packages/torch/__init__.py", line 1866, in compile
    raise RuntimeError("Dynamo is not supported on Python 3.12+")
RuntimeError: Dynamo is not supported on Python 3.12+
Exception in thread Thread-1 (results_loop):
Traceback (most recent call last):
  File "miniconda3/envs/nnunetv2/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "miniconda3/envs/nnunetv2/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "miniconda3/envs/nnunetv2/lib/python3.12/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 125, in results_loop
    raise e
  File "miniconda3/envs/nnunetv2/lib/python3.12/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 103, in results_loop
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
Exception in thread Thread-2 (results_loop):
Traceback (most recent call last):
  File "miniconda3/envs/nnunetv2/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "miniconda3/envs/nnunetv2/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "miniconda3/envs/nnunetv2/lib/python3.12/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 125, in results_loop
    raise e
  File "miniconda3/envs/nnunetv2/lib/python3.12/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 103, in results_loop
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
```
So it seems like there is a problem with Dynamo and Python. Unfortunately, I am not very familiar with PyTorch. Could anyone please advise on how to resolve this? Do I need to change my Python version?
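In case it is useful, these are generic commands to check which Python and PyTorch versions the environment uses (shown only for reference, no output included):

```bash
# Python interpreter version used by the active environment
python --version
# Installed PyTorch version
python -c "import torch; print(torch.__version__)"
```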
Many thanks in advance!