MIC-DKFZ / nnUNet


RuntimeError: Some background workers are no longer alive #2412

Open antoniaaaaaaaa opened 3 months ago

antoniaaaaaaaa commented 3 months ago

When I run `nnUNetv2_train 219 3d_fullres all` / `nnUNetv2_train 219 3d_fullres all --c`, this error occurs.

This is the configuration used by this training:

```
Configuration name: 3d_fullres
{'data_identifier': 'nnUNetPlans_3d_fullres',
 'preprocessor_name': 'DefaultPreprocessor',
 'batch_size': 4,
 'patch_size': [64, 160, 160],
 'median_image_size_in_voxels': [228.0, 513.0, 513.0],
 'spacing': [2.0, 0.7801485061645508, 0.7801485061645508],
 'normalization_schemes': ['CTNormalization'],
 'use_mask_for_norm': [False],
 'UNet_class_name': 'PlainConvUNet',
 'UNet_base_num_features': 32,
 'n_conv_per_stage_encoder': [2, 2, 2, 2, 2, 2],
 'n_conv_per_stage_decoder': [2, 2, 2, 2, 2],
 'num_pool_per_axis': [4, 5, 5],
 'pool_op_kernel_sizes': [[1, 1, 1], [1, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]],
 'conv_kernel_sizes': [[1, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]],
 'unet_max_num_features': 320,
 'resampling_fn_data': 'resample_data_or_seg_to_shape',
 'resampling_fn_seg': 'resample_data_or_seg_to_shape',
 'resampling_fn_data_kwargs': {'is_seg': False, 'order': 3, 'order_z': 0, 'force_separate_z': None},
 'resampling_fn_seg_kwargs': {'is_seg': True, 'order': 1, 'order_z': 0, 'force_separate_z': None},
 'resampling_fn_probabilities': 'resample_data_or_seg_to_shape',
 'resampling_fn_probabilities_kwargs': {'is_seg': False, 'order': 1, 'order_z': 0, 'force_separate_z': None},
 'batch_dice': True}
```

These are the global plan.json settings:

```
{'dataset_name': 'Dataset219_Amos2022_task2',
 'plans_name': 'nnUNetPlans',
 'original_median_spacing_after_transp': [5.0, 0.7801485061645508, 0.7801485061645508],
 'original_median_shape_after_transp': [100, 512, 512],
 'image_reader_writer': 'SimpleITKIO',
 'transpose_forward': [0, 1, 2],
 'transpose_backward': [0, 1, 2],
 'experiment_planner_used': 'ExperimentPlanner',
 'label_manager': 'LabelManager',
 'foreground_intensity_properties_per_channel': {'0': {'max': 3284530.75, 'mean': 6204.70361328125, 'median': 65.0, 'min': -2048.0, 'percentile_00_5': -980.0, 'percentile_99_5': 68545.8046875, 'std': 90272.9140625}}}
```

```
2024-07-31 08:53:06.376566: unpacking dataset...
2024-07-31 08:53:07.813382: unpacking done...
2024-07-31 08:53:07.813927: do_dummy_2d_data_aug: False
/home/huyao/anaconda3/envs/nnUNet/lib/python3.9/site-packages/torch/onnx/symbolic_helper.py:1515: UserWarning: ONNX export mode is set to TrainingMode.EVAL, but operator 'instance_norm' is set to train=True. Exporting with train=True.
  warnings.warn(
2024-07-31 08:53:09.415558: Unable to plot network architecture:
2024-07-31 08:53:09.415628: failed to execute PosixPath('dot'), make sure the Graphviz executables are on your systems' PATH
2024-07-31 08:53:09.688189: Training done.
2024-07-31 08:53:09.755372: predicting amos_0001
2024-07-31 08:53:09.759605: amos_0001, shape torch.Size([1, 225, 561, 561]), rank 0
2024-07-31 08:54:22.454152: predicting amos_0004
2024-07-31 08:54:22.469496: amos_0004, shape torch.Size([1, 195, 513, 513]), rank 0
2024-07-31 08:55:07.713235: predicting amos_0005
2024-07-31 08:55:07.726948: amos_0005, shape torch.Size([1, 200, 560, 560]), rank 0
Traceback (most recent call last):
  File "/home/huyao/anaconda3/envs/nnUNet/bin/nnUNetv2_train", line 8, in <module>
    sys.exit(run_training_entry())
  File "/home/huyao/nnUNet/nnunetv2/run/run_training.py", line 268, in run_training_entry
    run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
  File "/home/huyao/nnUNet/nnunetv2/run/run_training.py", line 208, in run_training
    nnunet_trainer.perform_actual_validation(export_validation_probabilities)
  File "/home/huyao/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1175, in perform_actual_validation
    proceed = not check_workers_alive_and_busy(segmentation_export_pool, worker_list, results,
  File "/home/huyao/nnUNet/nnunetv2/utilities/file_path_utilities.py", line 103, in check_workers_alive_and_busy
    raise RuntimeError('Some background workers are no longer alive')
RuntimeError: Some background workers are no longer alive
```
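The traceback points at `check_workers_alive_and_busy` in `nnunetv2/utilities/file_path_utilities.py`, which is polled during `perform_actual_validation` while validation predictions are exported through a background multiprocessing pool. If one of those worker processes has died (commonly because the machine ran out of RAM and the OOM killer removed it), the check raises this RuntimeError rather than letting the main process hang. Below is a minimal sketch of that kind of liveness check, assuming a plain `multiprocessing.Pool`; it is an illustration of the failure mode, not the actual nnU-Net source:

```python
# Illustration only: a liveness check in the spirit of check_workers_alive_and_busy.
# Assumes a plain multiprocessing.Pool; the real nnU-Net code is passed the pool's
# worker list and the pending export results explicitly.
from multiprocessing import Pool


def workers_all_alive(pool: Pool) -> bool:
    # Pool._pool holds the worker Process objects backing the pool.
    return all(worker.is_alive() for worker in pool._pool)


if __name__ == '__main__':
    with Pool(4) as export_pool:
        if not workers_all_alive(export_pool):
            # This is what the validation loop does once a worker has died,
            # e.g. after being killed for lack of memory.
            raise RuntimeError('Some background workers are no longer alive')
        print('all export workers alive')
```

In practice this error usually means the export or data-augmentation workers were killed, most often due to insufficient system RAM, so freeing memory or reducing the number of worker processes is the typical remedy.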

antoniaaaaaaaa commented 3 months ago

I added `os.environ['OMP_NUM_THREADS'] = "1"` to `__init__.py`, but the problem persists.
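One possible reason this has no effect is that setting `OMP_NUM_THREADS` inside `__init__.py` happens only after the process has already started and other libraries may already be initialized. A hedged alternative is to set the environment before anything from nnU-Net or PyTorch is imported, for example in a small launcher script; the sketch below assumes your nnU-Net version reads the `nnUNet_n_proc_DA` environment variable to size the augmentation worker pool, and the value of 4 is only an example:

```python
# Sketch of a launcher that configures the environment *before* importing nnU-Net,
# then hands over to the regular training entry point.
import os

os.environ['OMP_NUM_THREADS'] = '1'    # limit OpenMP threads per process
os.environ['nnUNet_n_proc_DA'] = '4'   # assumption: this nnU-Net version uses this variable
                                       # to set the number of data-augmentation workers

# import only after the environment is configured
from nnunetv2.run.run_training import run_training_entry

if __name__ == '__main__':
    # forwards the usual CLI arguments, e.g.:
    #   python launch_train.py 219 3d_fullres all --c
    run_training_entry()
```

The same variables can also simply be exported in the shell before calling `nnUNetv2_train`. If the workers are dying from lack of RAM rather than thread oversubscription, lowering the worker count (or freeing memory on the machine) is usually what helps.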