MIC-DKFZ / nnUNet

RuntimeError #1879

Closed · WANGDUNDUN2 closed this issue 9 months ago

WANGDUNDUN2 commented 9 months ago

Hello,

I am running training, but it stops with the error shown below.

```
#######################################################################
Please cite the following paper when using nnU-Net:
Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring
method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211.
#######################################################################
```

This is the configuration used by this training:

```
Configuration name: 3d_fullres
{'data_identifier': 'nnUNetPlans_3d_fullres', 'preprocessor_name': 'DefaultPreprocessor', 'batch_size': 2,
 'patch_size': [192, 112, 112], 'median_image_size_in_voxels': [403.0, 249.0, 259.0], 'spacing': [1.5, 1.5, 1.5],
 'normalization_schemes': ['CTNormalization'], 'use_mask_for_norm': [False], 'UNet_class_name': 'PlainConvUNet',
 'UNet_base_num_features': 32, 'n_conv_per_stage_encoder': [2, 2, 2, 2, 2, 2], 'n_conv_per_stage_decoder': [2, 2, 2, 2, 2],
 'num_pool_per_axis': [5, 4, 4], 'pool_op_kernel_sizes': [[1, 1, 1], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 1, 1]],
 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]], 'unet_max_num_features': 320,
 'resampling_fn_data': 'resample_data_or_seg_to_shape', 'resampling_fn_seg': 'resample_data_or_seg_to_shape',
 'resampling_fn_data_kwargs': {'is_seg': False, 'order': 3, 'order_z': 0, 'force_separate_z': None},
 'resampling_fn_seg_kwargs': {'is_seg': True, 'order': 1, 'order_z': 0, 'force_separate_z': None},
 'resampling_fn_probabilities': 'resample_data_or_seg_to_shape',
 'resampling_fn_probabilities_kwargs': {'is_seg': False, 'order': 1, 'order_z': 0, 'force_separate_z': None},
 'batch_dice': True}
```

These are the global plans.json settings:

```
{'dataset_name': 'Dataset666_Humerus', 'plans_name': 'nnUNetPlans',
 'original_median_spacing_after_transp': [1.5, 1.5, 1.5], 'original_median_shape_after_transp': [403, 249, 259],
 'image_reader_writer': 'SimpleITKIO', 'transpose_forward': [0, 1, 2], 'transpose_backward': [0, 1, 2],
 'experiment_planner_used': 'ExperimentPlanner', 'label_manager': 'LabelManager',
 'foreground_intensity_properties_per_channel': {'0': {'max': 4255.0, 'mean': 227.2852937742962, 'median': 148.0,
 'min': -9060.0, 'percentile_00_5': -201.0, 'percentile_99_5': 1647.0, 'std': 329.54451477036383}}}
```

```
2024-01-07 22:01:30.358813: unpacking dataset...
2024-01-07 22:01:32.971546: unpacking done...
2024-01-07 22:01:32.972132: do_dummy_2d_data_aug: False
2024-01-07 22:01:32.974254: Using splits from existing split file: /home/wangchunhui/nnUNet-master/nnUNetFrame/DATASET/nnUNet_preprocessed/Dataset666_Humerus/splits_final.json
2024-01-07 22:01:32.974600: The split file contains 5 splits.
2024-01-07 22:01:32.974645: Desired fold for training: 0
2024-01-07 22:01:32.974678: This split has 304 training and 77 validation cases.
/home/wangchunhui/.local/lib/python3.10/site-packages/torch/onnx/symbolic_helper.py:1513: UserWarning: ONNX export mode is set to TrainingMode.EVAL, but operator 'instance_norm' is set to train=True. Exporting with train=True.
  warnings.warn(
2024-01-07 22:01:38.349205: Unable to plot network architecture:
2024-01-07 22:01:38.349380: 'torch._C.Node' object is not subscriptable
2024-01-07 22:01:38.384679:
2024-01-07 22:01:38.384748: Epoch 0
2024-01-07 22:01:38.384867: Current learning rate: 0.01
Exception in background worker 0:
 No data left in file
Traceback (most recent call last):
  File "/home/wangchunhui/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer
    item = next(data_loader)
  File "/home/wangchunhui/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in __next__
    return self.generate_train_batch()
  File "/home/wangchunhui/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/data_loader_3d.py", line 19, in generate_train_batch
    data, seg, properties = self._data.load_case(i)
  File "/home/wangchunhui/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/nnunet_dataset.py", line 86, in load_case
    data = np.load(entry['data_file'][:-4] + ".npy", 'r')
  File "/home/wangchunhui/.local/lib/python3.10/site-packages/numpy/lib/npyio.py", line 436, in load
    raise EOFError("No data left in file")
EOFError: No data left in file
using pin_memory on device 0
Traceback (most recent call last):
  File "/home/wangchunhui/.local/bin/nnUNetv2_train", line 8, in <module>
    sys.exit(run_training_entry())
  File "/home/wangchunhui/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 252, in run_training_entry
    run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
  File "/home/wangchunhui/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 195, in run_training
    nnunet_trainer.run_training()
  File "/home/wangchunhui/.local/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1211, in run_training
    train_outputs.append(self.train_step(next(self.dataloader_train)))
  File "/home/wangchunhui/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 196, in __next__
    item = self.__get_next_item()
  File "/home/wangchunhui/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 181, in __get_next_item
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
```

Then I tried `OMP_NUM_THREADS=1 nnUNetv2_train 137 3d_fullres 0 --npz` and still got the same error.
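
The failing call is the memory-mapped `np.load` inside `load_case`. The snippet below is not part of nnU-Net; it is only a sketch that isolates that read on a single preprocessed case, to check whether the file itself is truncated. The folder is taken from the log above, the `nnUNetPlans_3d_fullres` subfolder name follows the `data_identifier` in the configuration, and `CASE_IDENTIFIER` is a placeholder.

```python
import numpy as np

# Placeholder case file inside the preprocessed dataset; substitute a real case name.
data_file = ("/home/wangchunhui/nnUNet-master/nnUNetFrame/DATASET/nnUNet_preprocessed/"
             "Dataset666_Humerus/nnUNetPlans_3d_fullres/CASE_IDENTIFIER.npy")

try:
    arr = np.load(data_file, mmap_mode="r")  # same memory-mapped read the dataloader performs
    print(data_file, "->", arr.shape)
except EOFError as exc:  # a truncated file fails here with "No data left in file"
    print(data_file, "-> corrupted:", exc)
```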

Karol-G commented 9 months ago

Hey @WANGDUNDUN2,

The relevant part of the stack trace is this one:

```
EOFError: No data left in file
```

It seems some of the preprocessed files are corrupted. Please delete the affected dataset from your nnUNet_preprocessed folder and rerun nnUNetv2_plan_and_preprocess. This should fix it.
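
If it helps, a quick sanity check (not part of nnU-Net, just a sketch using the paths from the log in this issue) is to walk the preprocessed dataset folder and report every `.npy` file that cannot be opened the way the dataloader opens it, both to confirm the diagnosis and to verify the folder again after re-running preprocessing:

```python
import glob
import os

import numpy as np

# Sketch only: scan a preprocessed nnU-Net dataset for .npy files that cannot be
# memory-mapped. A truncated/corrupted file raises EOFError ("No data left in file")
# here, exactly as it does inside load_case. Adjust the path to your setup.
preprocessed_dir = ("/home/wangchunhui/nnUNet-master/nnUNetFrame/DATASET/"
                    "nnUNet_preprocessed/Dataset666_Humerus")

bad_files = []
for npy_path in sorted(glob.glob(os.path.join(preprocessed_dir, "**", "*.npy"), recursive=True)):
    try:
        np.load(npy_path, mmap_mode="r")  # same memory-mapped read the dataloader uses
    except Exception as exc:
        bad_files.append((npy_path, exc))

if bad_files:
    print("Corrupted preprocessed files:")
    for path, exc in bad_files:
        print(f"  {path}: {exc}")
else:
    print(f"All .npy files under {preprocessed_dir} could be opened.")
```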

Best, Karol