MIC-DKFZ / nnUNet


RuntimeError: Some background workers are no longer alive #2398

Closed: Saul62 closed this issue 2 weeks ago

Saul62 commented 1 month ago

```
Traceback (most recent call last):
  File "/root/miniconda3/envs/umamba/bin/nnUNetv2_train", line 33, in <module>
    sys.exit(load_entry_point('nnunetv2', 'console_scripts', 'nnUNetv2_train')())
  File "/root/autodl-tmp/nnunetv2/run/run_training.py", line 268, in run_training_entry
    run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
  File "/root/autodl-tmp/nnunetv2/run/run_training.py", line 208, in run_training
    nnunet_trainer.perform_actual_validation(export_validation_probabilities)
  File "/root/autodl-tmp/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1154, in perform_actual_validation
    proceed = not check_workers_alive_and_busy(segmentation_export_pool, worker_list, results,
  File "/root/autodl-tmp/nnunetv2/utilities/file_path_utilities.py", line 103, in check_workers_alive_and_busy
    raise RuntimeError('Some background workers are no longer alive')
RuntimeError: Some background workers are no longer alive
```
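For context: the error itself only reports that a child process of the segmentation export pool died; the real failure happened inside that worker (often it was killed by the OS, e.g. out of memory). A simplified paraphrase of the check in `nnunetv2/utilities/file_path_utilities.py` (verify against your checkout):

```python
# Simplified paraphrase of nnUNet's worker-liveness check. It raises as soon
# as any export worker process has died, and otherwise reports whether the
# pool is still too busy to accept more work.
def check_workers_alive_and_busy(export_pool, worker_list, results_list,
                                 allowed_num_queued=0):
    alive = [w.is_alive() for w in worker_list]
    if not all(alive):
        # A worker died for an external reason (OOM kill, crash, ...);
        # the traceback above is only the symptom.
        raise RuntimeError('Some background workers are no longer alive')

    not_ready = [not r.ready() for r in results_list]
    if sum(not_ready) >= (len(export_pool._pool) + allowed_num_queued):
        return True   # still busy, caller should wait before submitting more
    return False
```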

pqu2 commented 1 month ago

I also got this issue.

But at the beginning, I got some other error info:

```
2024-07-26 21:14:56.131274: Epoch 0
2024-07-26 21:14:56.131622: Current learning rate: 0.01
In file included from /usr/include/python3.10/Python.h:8,
                 from /tmp/tmp86kg66hp/main.c:5:
/usr/include/python3.10/pyconfig.h:3:12: fatal error: x86_64-linux-gnu/python3.10/pyconfig.h: No such file or directory
    3 | #  include <x86_64-linux-gnu/python3.10/pyconfig.h>
      |            ^~~~~~~~~~~~
compilation terminated.
In file included from /usr/include/python3.10/Python.h:8,
                 from /tmp/tmp7ihyzp2o/main.c:5:
/usr/include/python3.10/pyconfig.h:3:12: fatal error: x86_64-linux-gnu/python3.10/pyconfig.h: No such file or directory
    3 | #  include <x86_64-linux-gnu/python3.10/pyconfig.h>
      |            ^~~~~~~~~~~~
compilation terminated.
```
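Note on this log: the `/tmp/tmp.../main.c` files are small C probes that `torch.compile` builds against the Python headers, and the missing `x86_64-linux-gnu/python3.10/pyconfig.h` suggests the Debian/Ubuntu development package (`python3.10-dev`) is not installed. A quick, hedged way to check from Python (the hardcoded `python3.10` path is taken from the error above):

```python
# Checks whether the generic and the Debian/Ubuntu multiarch pyconfig.h
# headers are present. The generic header exists here but includes an
# architecture-specific one that is missing in the log above.
import os
import sysconfig

include_dir = sysconfig.get_paths()["include"]        # e.g. /usr/include/python3.10
multiarch = sysconfig.get_config_var("MULTIARCH")     # e.g. x86_64-linux-gnu (None on non-Debian builds)
candidates = [os.path.join(include_dir, "pyconfig.h")]
if multiarch:
    candidates.append(f"/usr/include/{multiarch}/python3.10/pyconfig.h")

for path in candidates:
    print(path, "exists" if os.path.exists(path) else "MISSING")
```

On Ubuntu, installing `python3.10-dev` typically supplies the missing multiarch header; alternatively, disabling `torch.compile` (see below) sidesteps the compilation entirely.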

pqu2 commented 1 month ago

It seems to occur only with 2D training.

Saul62 commented 1 month ago

Yes, I hit this problem when training 2D. How can it be solved?

constantinulrich commented 1 month ago

torch.compile is sometimes prone to errors. Do you get the error if you are not using it?

pqu2 commented 1 month ago

> torch.compile is sometimes prone to errors. Do you get the error if you are not using it?

How to stop using it?
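For anyone landing here: as far as I can tell, nnUNetv2's trainer decides whether to call `torch.compile` from the `nnUNet_compile` environment variable (see `_do_i_compile` in `nnUNetTrainer`; worth verifying against your installed version). A minimal sketch:

```python
# Hedged sketch: disable torch.compile before launching training. nnUNetv2
# treats any value other than "true"/"1"/"t" as off (assumption based on
# nnUNetTrainer._do_i_compile; check your version).
import os

os.environ["nnUNet_compile"] = "false"

# Then launch training as usual, e.g. via the entry point:
#   nnUNetv2_train DATASET_ID 2d FOLD
# The shell equivalent is: export nnUNet_compile=false
```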

pqu2 commented 1 month ago

> Yes, I hit this problem when training 2D. How can it be solved?

I couldn't solve it. Lots of people are asking about this, and the causes are all over the place.

diangu001 commented 1 month ago

> Yes, I hit this problem when training 2D. How can it be solved?
>
> I couldn't solve it. Lots of people are asking about this, and the causes are all over the place.

I had no problems during 3D training, but at the inference stage, around the tenth case, the error appeared and the process was killed.