MIC-DKFZ / nnUNet

Encountered: "corrupted size vs. prev_size" error #2599

Open Wayyuanyuan opened 5 days ago

Wayyuanyuan commented 5 days ago

When I run this command, I get an error:

nnUNetv2_train 125 3d_fullres 4

The full log is:

2024-11-13 11:10:19.750743: do_dummy_2d_data_aug: True
2024-11-13 11:10:19.751406: Creating new 5-fold cross-validation split...
2024-11-13 11:10:19.752586: Desired fold for training: 4
2024-11-13 11:10:19.752651: This split has 32 training and 8 validation cases.
using pin_memory on device 0
using pin_memory on device 0
2024-11-13 11:10:27.887467: Using torch.compile...
/home/xxx/miniconda3/envs/nnunet/lib/python3.12/site-packages/torch/optim/lr_scheduler.py:62: UserWarning: The verbose parameter is deprecated. Please use get_last_lr() to access the learning rate.
  warnings.warn(

This is the configuration used by this training:
Configuration name: 3d_fullres
 {'data_identifier': 'nnUNetPlans_3d_fullres', 'preprocessor_name': 'DefaultPreprocessor', 'batch_size': 2, 'patch_size': [6, 512, 512], 'median_image_size_in_voxels': [11.0, 1024.0, 1024.0], 'spacing': [8.0, 0.1953125, 0.1953125], 'normalization_schemes': ['NoNormalization', 'NoNormalization', 'NoNormalization', 'NoNormalization'], 'use_mask_for_norm': [False, False, False, False], 'resampling_fn_data': 'resample_data_or_seg_to_shape', 'resampling_fn_seg': 'resample_data_or_seg_to_shape', 'resampling_fn_data_kwargs': {'is_seg': False, 'order': 3, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_seg_kwargs': {'is_seg': True, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_probabilities': 'resample_data_or_seg_to_shape', 'resampling_fn_probabilities_kwargs': {'is_seg': False, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'architecture': {'network_class_name': 'dynamic_network_architectures.architectures.unet.PlainConvUNet', 'arch_kwargs': {'n_stages': 8, 'features_per_stage': [32, 64, 128, 256, 320, 320, 320, 320], 'conv_op': 'torch.nn.modules.conv.Conv3d', 'kernel_sizes': [[1, 3, 3], [1, 3, 3], [1, 3, 3], [1, 3, 3], [1, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]], 'strides': [[1, 1, 1], [1, 2, 2], [1, 2, 2], [1, 2, 2], [1, 2, 2], [1, 2, 2], [1, 2, 2], [1, 2, 2]], 'n_conv_per_stage': [2, 2, 2, 2, 2, 2, 2, 2], 'n_conv_per_stage_decoder': [2, 2, 2, 2, 2, 2, 2], 'conv_bias': True, 'norm_op': 'torch.nn.modules.instancenorm.InstanceNorm3d', 'norm_op_kwargs': {'eps': 1e-05, 'affine': True}, 'dropout_op': None, 'dropout_op_kwargs': None, 'nonlin': 'torch.nn.LeakyReLU', 'nonlin_kwargs': {'inplace': True}, 'deep_supervision': True}, '_kw_requires_import': ['conv_op', 'norm_op', 'dropout_op', 'nonlin']}, 'batch_dice': False}

These are the global plan.json settings:
 {'dataset_name': 'Dataset125_StandardScar', 'plans_name': 'nnUNetPlans', 'original_median_spacing_after_transp': [8.0, 0.1953125, 0.1953125], 'original_median_shape_after_transp': [11, 1024, 1024], 'image_reader_writer': 'SimpleITKIO', 'transpose_forward': [0, 1, 2], 'transpose_backward': [0, 1, 2], 'experiment_planner_used': 'ExperimentPlanner', 'label_manager': 'LabelManager', 'foreground_intensity_properties_per_channel': {'0': {'max': 1348.0, 'mean': 138.0654296875, 'median': 128.0, 'min': -392.0, 'percentile_00_5': -142.0, 'percentile_99_5': 583.0, 'std': 117.80979919433594}, '1': {'max': 870.0, 'mean': 108.35521697998047, 'median': 102.0, 'min': -202.0, 'percentile_00_5': -73.0, 'percentile_99_5': 397.0, 'std': 75.5691909790039}, '2': {'max': 209.0, 'mean': 57.79533767700195, 'median': 59.0, 'min': -140.0, 'percentile_00_5': -26.0, 'percentile_99_5': 124.0, 'std': 23.255491256713867}, '3': {'max': 437.0, 'mean': 75.72803497314453, 'median': 74.0, 'min': -118.0, 'percentile_00_5': -29.0, 'percentile_99_5': 220.0, 'std': 35.11865234375}}}

2024-11-13 11:10:28.660186: unpacking dataset...
2024-11-13 11:10:38.950119: unpacking done...
2024-11-13 11:10:38.953328: Unable to plot network architecture: nnUNet_compile is enabled!
2024-11-13 11:10:39.002148:
2024-11-13 11:10:39.002385: Epoch 0
2024-11-13 11:10:39.002838: Current learning rate: 0.01
corrupted size vs. prev_size
corrupted size vs. prev_size
Exception in thread Thread-1 (results_loop):
Traceback (most recent call last):
  File "/home/XXX/miniconda3/envs/nnunet/lib/python3.12/threading.py", line 1052, in _bootstrap_inner
    self.run()
  File "/home/XXX/miniconda3/envs/nnunet/lib/python3.12/threading.py", line 989, in run
    self._target(*self._args, **self._kwargs)
  File "/home/XXX/miniconda3/envs/nnunet/lib/python3.12/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 125, in results_loop
    raise e
  File "/home/XXX/miniconda3/envs/nnunet/lib/python3.12/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 103, in results_loop
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
Traceback (most recent call last):
  File "/home/XXX/miniconda3/envs/nnunet/bin/nnUNetv2_train", line 8, in <module>
    sys.exit(run_training_entry())
             ^^^^^^^^^^^^^^^^^^^^
  File "/home/XXX/miniconda3/envs/nnunet/lib/python3.12/site-packages/nnunetv2/run/run_training.py", line 275, in run_training_entry
    run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
  File "/home/XXX/miniconda3/envs/nnunet/lib/python3.12/site-packages/nnunetv2/run/run_training.py", line 211, in run_training
    nnunet_trainer.run_training()
  File "/home/XXX/miniconda3/envs/nnunet/lib/python3.12/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1370, in run_training
    train_outputs.append(self.train_step(next(self.dataloader_train)))
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/XXX/miniconda3/envs/nnunet/lib/python3.12/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 196, in __next__
    item = self.__get_next_item()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/XXX/miniconda3/envs/nnunet/lib/python3.12/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 181, in __get_next_item
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
Exception in thread Thread-2 (results_loop):
Traceback (most recent call last):
  File "/home/XXX/miniconda3/envs/nnunet/lib/python3.12/threading.py", line 1052, in _bootstrap_inner
    self.run()
  File "/home/XXX/miniconda3/envs/nnunet/lib/python3.12/threading.py", line 989, in run
    self._target(*self._args, **self._kwargs)
  File "/home/XXX/miniconda3/envs/nnunet/lib/python3.12/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 125, in results_loop
    raise e
  File "/home/XXX/miniconda3/envs/nnunet/lib/python3.12/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 103, in results_loop
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message

I don't know why it keeps raising “RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message”. I have searched the log for the actual error message it refers to, and the only candidate seems to be:

corrupted size vs. prev_size
corrupted size vs. prev_size

This is possibly related, but I have not been able to find a solution for it online.
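
In case it is useful for debugging, I could also rerun the training with the background data augmentation workers disabled, so that the real worker error (if there is one) is raised in the main process instead of only printing “corrupted size vs. prev_size”. This assumes nnUNet still reads the nnUNet_n_proc_DA environment variable to control the number of augmentation workers:

# assumption: nnUNet_n_proc_DA=0 disables the background data augmentation workers
nnUNet_n_proc_DA=0 nnUNetv2_train 125 3d_fullres 4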

As a matter of fact, I have already run this command:
 nnUNetv2_plan_and_preprocess -d 124 --verify_dataset_integrity
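
To rule out a corrupted preprocessed file as the cause of the crash, I could also try to load every preprocessed array once in a single process. This is only a minimal sketch: it assumes the default nnUNet_preprocessed folder layout and the nnUNetPlans_3d_fullres data identifier from the plan printed above, so the paths may need to be adapted.

import os
import numpy as np

# Assumed location of the preprocessed data (default nnUNet layout); adapt if needed.
preprocessed_dir = os.path.join(
    os.environ["nnUNet_preprocessed"],
    "Dataset125_StandardScar",
    "nnUNetPlans_3d_fullres",
)

for fname in sorted(os.listdir(preprocessed_dir)):
    path = os.path.join(preprocessed_dir, fname)
    try:
        if fname.endswith(".npz"):
            with np.load(path) as npz:
                for key in npz.files:
                    _ = npz[key].shape  # forces decompression of every stored array
        elif fname.endswith(".npy"):
            _ = np.load(path, mmap_mode="r").shape  # unpacked arrays
        else:
            continue  # skip .pkl property files etc.
        print("OK:", fname)
    except Exception as e:
        print("FAILED:", fname, repr(e))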

Below is part of the log from running the experiment planning and preprocessing command; I don't know whether it is related to the problem encountered during this training run.

$ nnUNetv2_plan_and_preprocess -d 124 --verify_dataset_integrity

Fingerprint extraction...
Dataset124_resampledScar
Using <class 'nnunetv2.imageio.simpleitk_reader_writer.SimpleITKIO'> as reader/writer
WARNING! Not all input images have the same origin!
Origins:
[(-4.470895767211914, -303.8560791015625, -432.8706359863281), (-6.175111770629883, -303.84234619140625, -432.80615234375), (-4.4064788818359375, -303.6878967285156, -432.8159484863281), (-4.558997631072998, -303.8019104003906, -432.84918212890625)]
Image files:
['/home/XXX/Code/Python/nnUNet_raw_data/Dataset124_resampledScar/imagesTr/patient_11_0000.nii.gz', '/home/XXX/Code/Python/nnUNet_raw_data/Dataset124_resampledScar/imagesTr/patient_11_0001.nii.gz', '/home/XXX/Code/Python/nnUNet_raw_data/Dataset124_resampledScar/imagesTr/patient_11_0002.nii.gz', '/home/XXX/Code/Python/nnUNet_raw_data/Dataset124_resampledScar/imagesTr/patient_11_0003.nii.gz']
It is up to you to decide whether that's a problem. You should run nnUNetv2_plot_overlay_pngs to verify that segmentations and data overlap.
WARNING! Not all input images have the same direction!
Directions:
[(0.7249377510466453, -0.033225640262165304, -0.6880125829552542, 0.6597619691757571, 0.32051894188074836, 0.6796923707965171, 0.1979378719112628, -0.9466592125600505, 0.25427734223923526), (0.7347161775984044, -0.03322565455148758, -0.6775604557270991, 0.649960700722706, 0.3205189417284067, 0.6890708876138939, 0.19427615883385993, -0.9466592121101064, 0.2570858624621083), (0.7235732790899242, -0.033225640262165304, -0.6894474592383942, 0.6611072319789659, 0.32051894188074836, 0.6783839431516916, 0.1984412699318479, -0.9466592125600505, 0.25388467189589264), (0.7251306920448843, -0.033225640262165304, -0.6878092259202788, 0.6595713047911503, 0.32051894188074836, 0.6798773974557673, 0.19786655440120882, -0.9466592125600505, 0.25433283934191325)]
Image files:
['/home/XXX/Code/Python/nnUNet_raw_data/Dataset124_resampledScar/imagesTr/patient_11_0000.nii.gz', '/home/XXX/Code/Python/nnUNet_raw_data/Dataset124_resampledScar/imagesTr/patient_11_0001.nii.gz', '/home/XXX/Code/Python/nnUNet_raw_data/Dataset124_resampledScar/imagesTr/patient_11_0002.nii.gz', '/home/XXX/Code/Python/nnUNet_raw_data/Dataset124_resampledScar/imagesTr/patient_11_0003.nii.gz']
It is up to you to decide whether that's a problem. You should run nnUNetv2_plot_overlay_pngs to verify that segmentations and data overlap.
WARNING! Not all input images have the same origin!
Origins:
[(39.85212326049805, -303.416748046875, -467.0427551269531), (39.64960861206055, -303.2052917480469, -466.9926452636719), (39.68174743652344, -303.1330871582031, -466.23077392578125), (39.66739273071289, -303.0367431640625, -465.6351318359375)]
Image files:
['/home/XXX/Code/Python/nnUNet_raw_data/Dataset124_resampledScar/imagesTr/patient_19_0000.nii.gz', '/home/XXX/Code/Python/nnUNet_raw_data/Dataset124_resampledScar/imagesTr/patient_19_0001.nii.gz', '/home/XXX/Code/Python/nnUNet_raw_data/Dataset124_resampledScar/imagesTr/patient_19_0002.nii.gz', '/home/XXX/Code/Python/nnUNet_raw_data/Dataset124_resampledScar/imagesTr/patient_19_0003.nii.gz']
It is up to you to decide whether that's a problem. You should run nnUNetv2_plot_overlay_pngs to verify that segmentations and data overlap.
WARNING! Not all input images have the same direction!
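
(The preprocessing log is truncated here.) For reference, this is a small check I could run with SimpleITK to compare origin, spacing, and direction across the four channels of each case; again only a minimal sketch, assuming the imagesTr layout shown above (patient_XX_0000.nii.gz to patient_XX_0003.nii.gz), with the path to be adapted.

import glob
import os
from collections import defaultdict

import numpy as np
import SimpleITK as sitk

# Assumed raw data location (taken from the log above); adapt if needed.
images_dir = "/home/XXX/Code/Python/nnUNet_raw_data/Dataset124_resampledScar/imagesTr"

# Group the channel files of each case by stripping the _0000-style channel suffix.
cases = defaultdict(list)
for f in sorted(glob.glob(os.path.join(images_dir, "*.nii.gz"))):
    case_id = os.path.basename(f)[:-len("_0000.nii.gz")]
    cases[case_id].append(f)

for case_id, files in sorted(cases.items()):
    headers = [sitk.ReadImage(f) for f in files]
    origins = np.array([h.GetOrigin() for h in headers])
    directions = np.array([h.GetDirection() for h in headers])
    spacings = np.array([h.GetSpacing() for h in headers])
    # Report the largest deviation of each channel from the first channel.
    print(case_id,
          "max origin diff:", np.abs(origins - origins[0]).max(),
          "max direction diff:", np.abs(directions - directions[0]).max(),
          "max spacing diff:", np.abs(spacings - spacings[0]).max())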

I hope the contents of these logs help you improve the program and that you will be able to solve my problem.