MIC-DKFZ / nnUNet


RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message #2514

Open TraffickerCH opened 2 months ago

TraffickerCH commented 2 months ago

I encountered some problems during training.

(brats) (base) user1@5e374944978b:~/BraTS$ OMP_NUM_THREADS=1 nnUNetv2_train 1 3d_fullres 0 --npz

############################
INFO: You are using the old nnU-Net default plans. We have updated our recommendations. Please consider using those instead! Read more here: https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/resenc_presets.md
############################

Using device: cuda:0
/home/user1/BraTS/graphs/models/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py:164: FutureWarning: torch.cuda.amp.GradScaler(args...) is deprecated. Please use torch.amp.GradScaler('cuda', args...) instead.
  self.grad_scaler = GradScaler() if self.device.type == 'cuda' else None

#######################################################################
Please cite the following paper when using nnU-Net:
Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211.
#######################################################################

2024-09-21 02:47:26.391587: do_dummy_2d_data_aug: False
2024-09-21 02:47:26.395703: Using splits from existing split file: /home/user1/BraTS/graphs/models/nnUNet/nnUNet_preprocessed/Dataset001_BraTS/splits_final.json
2024-09-21 02:47:26.396373: The split file contains 5 splits.
2024-09-21 02:47:26.396436: Desired fold for training: 0
2024-09-21 02:47:26.396486: This split has 433 training and 109 validation cases.
using pin_memory on device 0
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/user1/miniconda3/envs/macau/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/home/user1/miniconda3/envs/macau/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user1/miniconda3/envs/macau/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 125, in results_loop
    raise e
  File "/home/user1/miniconda3/envs/macau/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 103, in results_loop
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
Traceback (most recent call last):
  File "/home/user1/miniconda3/envs/macau/bin/nnUNetv2_train", line 9, in <module>
    sys.exit(run_training_entry())
  File "/home/user1/BraTS/graphs/models/nnUNet/nnunetv2/run/run_training.py", line 275, in run_training_entry
    run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
  File "/home/user1/BraTS/graphs/models/nnUNet/nnunetv2/run/run_training.py", line 211, in run_training
    nnunet_trainer.run_training()
  File "/home/user1/BraTS/graphs/models/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1362, in run_training
    self.on_train_start()
  File "/home/user1/BraTS/graphs/models/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 903, in on_train_start
    self.dataloader_train, self.dataloader_val = self.get_dataloaders()
  File "/home/user1/BraTS/graphs/models/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 696, in get_dataloaders
    _ = next(mt_gen_train)
  File "/home/user1/miniconda3/envs/macau/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 196, in __next__
    item = self.__get_next_item()
  File "/home/user1/miniconda3/envs/macau/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 181, in __get_next_item
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message

I tried setting "OMP_NUM_THREADS=1" to work around this, but it made no difference. Can anyone give me some advice? Please!
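
Note for anyone hitting the same message: the log above shows no earlier error from the workers themselves, so the actual cause is not visible here. When a worker dies silently like this, it can help to check the resources the data-augmentation processes depend on before retrying. Below is a minimal diagnostic sketch, not a fix. It assumes a Linux machine (the hostname in the prompt looks like a Docker container ID, but that is a guess), and it assumes that low shared memory or low RAM is a possible cause, which this log does not confirm. The helper names `shm_free_gib`, `mem_available_gib`, and `report` are made up for illustration; `nnUNet_n_proc_DA` is, as far as I know, the environment variable nnU-Net v2 reads for the number of augmentation workers.

```python
import os
import shutil


def shm_free_gib(path="/dev/shm"):
    """Return free space at `path` in GiB, or None if the path does not exist."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).free / 2**30


def mem_available_gib(meminfo="/proc/meminfo"):
    """Return MemAvailable from a /proc/meminfo-style file in GiB, or None."""
    try:
        with open(meminfo) as fh:
            for line in fh:
                if line.startswith("MemAvailable:"):
                    # The value is reported in kB, so divide by 2**20 for GiB.
                    return int(line.split()[1]) / 2**20
    except OSError:
        pass
    return None


def report():
    """Print the values worth checking before restarting training."""
    print("free /dev/shm (GiB):", shm_free_gib())
    print("available RAM (GiB):", mem_available_gib())
    # Assumption: nnU-Net v2 uses this variable for the worker count.
    print("nnUNet_n_proc_DA:", os.environ.get("nnUNet_n_proc_DA", "<unset>"))


if __name__ == "__main__":
    report()
```

If either number is very small, it may be worth rerunning with fewer workers, for example `nnUNet_n_proc_DA=2 nnUNetv2_train 1 3d_fullres 0 --npz`. To my understanding, setting the variable to 0 makes nnU-Net fall back to single-threaded augmentation, in which case the underlying exception should be raised in the main process instead of being hidden behind this generic message. In a Docker container, increasing shared memory with `--shm-size` or `--ipc=host` on `docker run` is another thing to try.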