MIC-DKFZ / nnUNet


training error (nnunet 1.7.0) #2583

Closed: liweizhong666 closed this issue 1 week ago

liweizhong666 commented 1 week ago

2024-11-05 14:36:15.651493: epoch: 12
selected_keys: ['1.2.840.113619.2.205.114374075381108.7661.1501133543188.2055' 'USC.3DDSA.000142'] bbox: 0 307 0 268 0 366 valid: 72 200 233 361 172 268
Traceback (most recent call last):
  File "/opt/conda/envs/usal/bin/nnUNet_train", line 33, in <module>
    sys.exit(load_entry_point('nnunet', 'console_scripts', 'nnUNet_train')())
  File "/home/data/LHT/nnUNet/nnunet/run/run_training.py", line 182, in main
    trainer.run_training()
  File "/home/data/LHT/nnUNet/nnunet/training/network_training/nnUNetTrainerV2.py", line 440, in run_training
    ret = super().run_training()
  File "/home/data/LHT/nnUNet/nnunet/training/network_training/nnUNetTrainer.py", line 318, in run_training
    super(nnUNetTrainer, self).run_training()
  File "/home/data/LHT/nnUNet/nnunet/training/network_training/network_trainer.py", line 456, in run_training
    l = self.run_iteration(self.tr_gen, True)
  File "/home/data/LHT/nnUNet/nnunet/training/network_training/nnUNetTrainerV2.py", line 232, in run_iteration
    data_dict = next(data_generator)
  File "/opt/conda/envs/usal/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 206, in __next__
    item = self.__get_next_item()
  File "/opt/conda/envs/usal/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 190, in __get_next_item
    raise RuntimeError("MultiThreadedAugmenter.abort_event was set, something went wrong. Maybe one of "
RuntimeError: MultiThreadedAugmenter.abort_event was set, something went wrong. Maybe one of your workers crashed. This is not the actual error message! Look further up your stdout to see what caused the error. Please also check whether your RAM was full

Exception in thread Thread-5:
Traceback (most recent call last):
  File "/opt/conda/envs/usal/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/opt/conda/envs/usal/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/usal/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 102, in results_loop
    item = current_queue.get()
  File "/opt/conda/envs/usal/lib/python3.8/multiprocessing/queues.py", line 116, in get
    return _ForkingPickler.loads(res)
  File "/opt/conda/envs/usal/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 305, in rebuild_storage_fd
    fd = df.detach()
  File "/opt/conda/envs/usal/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/opt/conda/envs/usal/lib/python3.8/multiprocessing/resource_sharer.py", line 87, in get_connection
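Since the RuntimeError itself says the real error happened in a worker and asks whether RAM was full, I am running a small memory watchdog next to nnUNet_train to see whether RAM or shared memory fills up right before the crash. This is just a rough sketch for my setup; psutil and the /dev/shm mount point are my own assumptions, not anything nnU-Net provides:

```python
# Rough memory watchdog run alongside nnUNet_train.
# psutil and the /dev/shm mount point are assumptions about my machine,
# not part of nnU-Net itself.
import shutil
import time

import psutil


def log_memory(interval_s: float = 30.0) -> None:
    """Print RAM and shared-memory usage so an OOM near the crash time shows up in the log."""
    while True:
        vm = psutil.virtual_memory()            # system RAM
        shm = shutil.disk_usage("/dev/shm")     # shared memory used to pass batches between processes
        print(f"RAM used: {vm.percent:.1f}%  "
              f"/dev/shm used: {100 * shm.used / shm.total:.1f}%", flush=True)
        time.sleep(interval_s)


if __name__ == "__main__":
    log_memory()
```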

Is this a data issue? The dataset was already checked during the preprocessing stage (nnUNet_plan_and_preprocess -t 1 --verify_dataset_integrity) and no problems were reported.
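To rule out the data side myself, I also tried reloading every preprocessed case, since (as far as I understand) --verify_dataset_integrity checks the raw images and labels before preprocessing, not the .npz/.pkl files written afterwards that the training dataloader actually reads. This is only a sketch; the folder path, plans identifier, and the "data" key are assumptions based on how my nnUNet_preprocessed directory looks:

```python
# Sanity check for the preprocessed training data.
# The folder path, plans identifier, and the "data" key are assumptions
# based on my own nnUNet_preprocessed layout, not a documented API.
import pickle
from pathlib import Path

import numpy as np

preprocessed_dir = Path(
    "/path/to/nnUNet_preprocessed/Task001_XXX/nnUNetData_plans_v2.1_stage1"  # placeholder path
)

for npz_file in sorted(preprocessed_dir.glob("*.npz")):
    try:
        data = np.load(npz_file)["data"]                    # stacked image channels + segmentation
        with open(npz_file.with_suffix(".pkl"), "rb") as f:
            properties = pickle.load(f)                     # per-case metadata from preprocessing
        print(npz_file.stem, data.shape, "OK")
    except Exception as exc:                                # a corrupted case would crash a worker mid-epoch
        print(npz_file.stem, "FAILED:", exc)
```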