MIC-DKFZ / nnUNet

Apache License 2.0

RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message #2238

Open Huijin13 opened 3 months ago

Huijin13 commented 3 months ago

I ran into the following problem:

2024-05-29 01:06:15.304361: do_dummy_2d_data_aug: False
2024-05-29 01:06:15.304548: Using splits from existing split file: nnUNet_preprocessed/Dataset059_10%HRF/splits_final.json
2024-05-29 01:06:15.304609: The split file contains 5 splits.
2024-05-29 01:06:15.304630: Desired fold for training: 1
2024-05-29 01:06:15.304647: This split has 4 training and 1 validation cases.
using pin_memory on device 0
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/xly/anaconda3/envs/Sammed/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/home/xly/anaconda3/envs/Sammed/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/home/xly/anaconda3/envs/Sammed/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 125, in results_loop
    raise e
  File "/home/xly/anaconda3/envs/Sammed/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 110, in results_loop
    item = pin_memory_of_all_eligible_items_in_dict(item)
  File "/home/xly/anaconda3/envs/Sammed/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 80, in pin_memory_of_all_eligible_items_in_dict
    result_dict[k] = result_dict[k].pin_memory()
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Traceback (most recent call last):
  File "/home/xly/anaconda3/envs/Sammed/bin/nnUNetv2_train", line 8, in <module>
    sys.exit(run_training_entry())
  File "/home/xly/anaconda3/envs/Sammed/lib/python3.9/site-packages/nnunetv2/run/run_training.py", line 274, in run_training_entry
    run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
  File "/home/xly/anaconda3/envs/Sammed/lib/python3.9/site-packages/nnunetv2/run/run_training.py", line 210, in run_training
    nnunet_trainer.run_training()
  File "/home/xly/anaconda3/envs/Sammed/lib/python3.9/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1287, in run_training
    self.on_train_start()
  File "/home/xly/anaconda3/envs/Sammed/lib/python3.9/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 831, in on_train_start
    self.dataloader_train, self.dataloader_val = self.get_dataloaders()
  File "/home/xly/anaconda3/envs/Sammed/lib/python3.9/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 655, in get_dataloaders
    _ = next(mt_gen_train)
  File "/home/xly/anaconda3/envs/Sammed/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 196, in __next__
    item = self.__get_next_item()
  File "/home/xly/anaconda3/envs/Sammed/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 181, in __get_next_item
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message

I tried to change the self.num_epochs variable in nnUNetTrainer but failed, so I changed it back. Now I get the error above, even though the training ran fine before I touched that variable.
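For reference, the usual way to change the epoch count without editing the installed package is to subclass the trainer and select it with the -tr flag. The sketch below only illustrates that pattern: the class name nnUNetTrainer_50epochs is made up, the constructor arguments are forwarded untouched because the exact signature can differ between nnU-Net versions, and the custom class typically has to live under nnunetv2.training.nnUNetTrainer so the CLI can discover it.

```python
from nnunetv2.training.nnUNetTrainer.nnUNetTrainer import nnUNetTrainer


class nnUNetTrainer_50epochs(nnUNetTrainer):
    """Hypothetical custom trainer that only changes the epoch count."""

    def __init__(self, *args, **kwargs):
        # Forward all constructor arguments unchanged; nothing about the
        # version-specific signature is assumed here.
        super().__init__(*args, **kwargs)
        # Override the default training schedule (1000 epochs in nnU-Net v2).
        self.num_epochs = 50
```

It would then be selected at the command line with something like `nnUNetv2_train DATASET_ID CONFIG FOLD -tr nnUNetTrainer_50epochs` (placeholders, not the values from this issue).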

FabianIsensee commented 3 months ago

Seems like you are simply out of memory on your GPU.

RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

If you are training on the GPU in your workstation that is also driving your displays, other tasks may be taking up memory and causing the training to crash. Check nvidia-smi.
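As a quick check, the snippet below (a minimal sketch, not part of nnU-Net) prints the free and total memory of every GPU visible to PyTorch via torch.cuda.mem_get_info, which gives roughly the same picture as the memory column of nvidia-smi:

```python
# Minimal sketch: print free vs. total memory for every visible GPU so you can
# see whether other processes (e.g. the desktop environment) already occupy a
# large chunk of the card before training starts.
import torch

if not torch.cuda.is_available():
    print("No CUDA device visible to PyTorch.")
else:
    for idx in range(torch.cuda.device_count()):
        free_b, total_b = torch.cuda.mem_get_info(idx)  # bytes, from the CUDA driver
        print(f"GPU {idx} ({torch.cuda.get_device_name(idx)}): "
              f"{free_b / 1024**3:.1f} GiB free / {total_b / 1024**3:.1f} GiB total")
```

If the free memory is already low before nnUNetv2_train even starts, close the other GPU consumers (or monitor them with `watch -n 1 nvidia-smi`) and try again.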