MIC-DKFZ / nnUNet

Unpacking dataset... #2481

Closed: clarkbab closed this issue 2 months ago

clarkbab commented 2 months ago

Hi there,

I'm running nnUNet for the first time on a CT dataset, using the new ResEnc preset overridden for 80 GB training on an A100. Here's the generated configuration:

```
Configuration name: 3d_fullres
{'data_identifier': 'nnUNetPlans_3d_fullres',
 'preprocessor_name': 'DefaultPreprocessor',
 'batch_size': 2,
 'patch_size': [192, 320, 256],
 'median_image_size_in_voxels': [206.0, 380.0, 330.0],
 'spacing': [2.0, 1.0, 1.0],
 'normalization_schemes': ['CTNormalization'],
 'use_mask_for_norm': [False],
 'resampling_fn_data': 'resample_data_or_seg_to_shape',
 'resampling_fn_seg': 'resample_data_or_seg_to_shape',
 'resampling_fn_data_kwargs': {'is_seg': False, 'order': 3, 'order_z': 0, 'force_separate_z': None},
 'resampling_fn_seg_kwargs': {'is_seg': True, 'order': 1, 'order_z': 0, 'force_separate_z': None},
 'resampling_fn_probabilities': 'resample_data_or_seg_to_shape',
 'resampling_fn_probabilities_kwargs': {'is_seg': False, 'order': 1, 'order_z': 0, 'force_separate_z': None},
 'architecture': {'network_class_name': 'dynamic_network_architectures.architectures.unet.ResidualEncoderUNet',
                  'arch_kwargs': {'n_stages': 7,
                                  'features_per_stage': [32, 64, 128, 256, 320, 320, 320],
                                  'conv_op': 'torch.nn.modules.conv.Conv3d',
                                  'kernel_sizes': [[1, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]],
                                  'strides': [[1, 1, 1], [1, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]],
                                  'n_blocks_per_stage': [1, 3, 4, 6, 6, 6, 6],
                                  'n_conv_per_stage_decoder': [1, 1, 1, 1, 1, 1],
                                  'conv_bias': True,
                                  'norm_op': 'torch.nn.modules.instancenorm.InstanceNorm3d',
                                  'norm_op_kwargs': {'eps': 1e-05, 'affine': True},
                                  'dropout_op': None,
                                  'dropout_op_kwargs': None,
                                  'nonlin': 'torch.nn.LeakyReLU',
                                  'nonlin_kwargs': {'inplace': True},
                                  'deep_supervision': True},
                  '_kw_requires_import': ['conv_op', 'norm_op', 'dropout_op', 'nonlin']},
 'batch_dice': False}
```

These are the global plan.json settings:
```
{'dataset_name': 'Dataset011_REF_MODEL_FOLD_0',
 'plans_name': 'nnUNetResEncUNetPlansXXL',
 'original_median_spacing_after_transp': [2.0, 1.0, 1.0],
 'original_median_shape_after_transp': [206, 380, 330],
 'image_reader_writer': 'SimpleITKIO',
 'transpose_forward': [0, 1, 2],
 'transpose_backward': [0, 1, 2],
 'experiment_planner_used': 'nnUNetPlannerResEncXL',
 'label_manager': 'LabelManager',
 'foreground_intensity_properties_per_channel': {'0': {'max': 30276.78125,
                                                       'mean': 86.20446014404297,
                                                       'median': 39.30843734741211,
                                                       'min': -1024.0,
                                                       'percentile_00_5': -947.2280883789062,
                                                       'percentile_99_5': 1264.784423828125,
                                                       'std': 261.6113586425781}}}
```
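For reference, the issue doesn't show the exact commands used, so the following is my hedged reconstruction of how a plan like this (XL ResEnc planner with an 80 GB memory target and a custom plans name) is typically generated, following the flag names in nnUNet's ResEnc preset documentation. Dataset ID 11 is taken from `Dataset011` above:

```bash
# Sketch (assumed commands, not copied from the issue): plan with the ResEnc XL
# planner targeting 80 GB VRAM, then preprocess with the resulting plans file.
nnUNetv2_plan_experiment -d 11 -pl nnUNetPlannerResEncXL \
    -gpu_memory_target 80 -overwrite_plans_name nnUNetResEncUNetPlansXXL
nnUNetv2_preprocess -d 11 -plans_name nnUNetResEncUNetPlansXXL -c 3d_fullres
```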

It has been sitting on the `unpacking dataset...` step for about 24 hours, which I find surprising.

```
$ pgrep -U baclark | xargs -I {} ls -l /proc/{}/fd | egrep -o ".(home|data).*" | sort | uniq
/data/gpfs/projects/punim1413/mymi/datasets/nnunet/preprocessed/Dataset011_REF_MODEL_FOLD_0/nnUNetPlans_3d_fullres/25-1.npy
/data/gpfs/projects/punim1413/mymi/datasets/nnunet/preprocessed/Dataset011_REF_MODEL_FOLD_0/nnUNetPlans_3d_fullres/25-1_seg.npy
/data/gpfs/projects/punim1413/mymi/datasets/nnunet/preprocessed/Dataset011_REF_MODEL_FOLD_0/nnUNetPlans_3d_fullres/29-0.npy
/data/gpfs/projects/punim1413/mymi/datasets/nnunet/preprocessed/Dataset011_REF_MODEL_FOLD_0/nnUNetPlans_3d_fullres/29-0_seg.npy
/data/gpfs/projects/punim1413/mymi/datasets/nnunet/preprocessed/Dataset011_REF_MODEL_FOLD_0/nnUNetPlans_3d_fullres/3-1.npy
/data/gpfs/projects/punim1413/mymi/datasets/nnunet/preprocessed/Dataset011_REF_MODEL_FOLD_0/nnUNetPlans_3d_fullres/3-1_seg.npy
/data/gpfs/projects/punim1413/mymi/datasets/nnunet/preprocessed/Dataset011_REF_MODEL_FOLD_0/nnUNetPlans_3d_fullres/36-1.npy
/data/gpfs/projects/punim1413/mymi/datasets/nnunet/preprocessed/Dataset011_REF_MODEL_FOLD_0/nnUNetPlans_3d_fullres/36-1_seg.npy
/data/gpfs/projects/punim1413/mymi/datasets/nnunet/preprocessed/Dataset011_REF_MODEL_FOLD_0/nnUNetPlans_3d_fullres/48-1.npy
/data/gpfs/projects/punim1413/mymi/datasets/nnunet/preprocessed/Dataset011_REF_MODEL_FOLD_0/nnUNetPlans_3d_fullres/48-1_seg.npy
/data/gpfs/projects/punim1413/mymi/datasets/nnunet/preprocessed/Dataset011_REF_MODEL_FOLD_0/nnUNetPlans_3d_fullres/49-1.npy
/data/gpfs/projects/punim1413/mymi/datasets/nnunet/preprocessed/Dataset011_REF_MODEL_FOLD_0/nnUNetPlans_3d_fullres/49-1_seg.npy
...
```
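One way to sanity-check whether unpacking is actually progressing, rather than stuck (a sketch; it assumes unpacking expands each preprocessed `.npz` case into a `<case>.npy` data file plus a `<case>_seg.npy` file, which matches the filenames listed above):

```bash
# Hypothetical progress check: if each .npz archive unpacks into <case>.npy
# and <case>_seg.npy, the .npy count should grow toward roughly twice the
# .npz count as unpacking proceeds.
cd /data/gpfs/projects/punim1413/mymi/datasets/nnunet/preprocessed/Dataset011_REF_MODEL_FOLD_0/nnUNetPlans_3d_fullres
ls *.npz | wc -l
ls *.npy | wc -l
```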

Any ideas what is happening here? Also, what does the unpacking step achieve? Is it loading the dataset into RAM?

Thanks, Brett

clarkbab commented 2 months ago

An update on this one... it turns out the model was actually training, but the output wasn't being flushed to STDOUT. In particular, the line `self.print_to_log_file('unpacking done...')` never appeared, which confused me. I was able to force the output through with `sys.stdout.flush()`.
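In case anyone hits the same symptom: this looks like Python's standard block buffering when STDOUT is redirected to a file (common in cluster batch jobs) rather than anything nnUNet-specific. A minimal workaround, assuming the job is launched from a shell script; the dataset ID, fold, and plans name below are inferred from the plans shown above, not from the original command, so treat them as an assumption:

```bash
# Force unbuffered Python output so log lines appear immediately instead of
# sitting in the interpreter's stdout buffer until it fills or the run ends.
export PYTHONUNBUFFERED=1   # equivalently, run python with -u
nnUNetv2_train 11 3d_fullres 0 -p nnUNetResEncUNetPlansXXL
```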