MrGiovanni / UNetPlusPlus

[IEEE TMI] Official Implementation for UNet++

how to train on my designated GPU? #68

Closed ChibisukeDragon closed 3 years ago

ChibisukeDragon commented 3 years ago

I want to train this model on GPU 4... I used this command:

for FOLD in 0 1 2 3 4; do CUDA_VISIBLE_DEVICES=4 nnUNet_train 3d_fullres nnUNetPlusPlusTrainerV2 Task003_Liver $FOLD; done

But it always tried to allocate space on GPU 0.
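Note: CUDA_VISIBLE_DEVICES restricts which physical cards the process can see, but CUDA renumbers the visible cards starting from 0, so a correctly pinned job will still report "device 0" in its own logs. A minimal sketch to confirm which physical GPU the process actually sees (plain PyTorch, not part of nnUNet):

import os

# Must be set before CUDA is initialized, i.e. before importing torch;
# prefixing the shell command as above has the same effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "4"

import torch

# Only physical GPU 4 is visible, and it is renumbered as cuda:0.
print(torch.cuda.device_count())      # 1
print(torch.cuda.current_device())    # 0 (this is physical GPU 4)
print(torch.cuda.get_device_name(0))  # e.g. GeForce GTX 1080 Ti

The full session: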

heyupeng_2020@irip-114:~$ for FOLD in 0 1 2 3 4
> do CUDA_VISIBLE_DEVICES=4 nnUNet_train 3d_fullres nnUNetPlusPlusTrainerV2 Task003_Liver $FOLD
> done

Please cite the following paper when using nnUNet: Fabian Isensee, Paul F. Jäger, Simon A. A. Kohl, Jens Petersen, Klaus H. Maier-Hein "Automated Design of Deep Learning Methods for Biomedical Image Segmentation" arXiv preprint arXiv:1904.08128 (2020). If you have questions or suggestions, feel free to open an issue at https://github.com/MIC-DKFZ/nnUNet

###############################################
I am running the following nnUNet: 3d_fullres
My trainer class is: <class 'nnunet.training.network_training.nnUNetPlusPlusTrainerV2.nnUNetPlusPlusTrainerV2'>
For that I will be using the following configuration:
num_classes: 2
modalities: {0: 'CT'}
use_mask_for_norm OrderedDict([(0, False)])
keep_only_largest_region None
min_region_size_per_class None
min_size_per_class None
normalization_schemes OrderedDict([(0, 'CT')])
stages...

stage: 0 {'batch_size': 2, 'num_pool_per_axis': [5, 5, 5], 'patch_size': array([128, 128, 128]), 'median_patient_size_in_voxels': array([195, 207, 207]), 'current_spacing': array([2.473119 , 1.89831205, 1.89831205]), 'original_spacing': array([1. , 0.76757812, 0.76757812]), 'do_dummy_2D_data_aug': False, 'pool_op_kernel_sizes': [[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]}

stage: 1 {'batch_size': 2, 'num_pool_per_axis': [5, 5, 5], 'patch_size': array([128, 128, 128]), 'median_patient_size_in_voxels': array([482, 512, 512]), 'current_spacing': array([1. , 0.76757812, 0.76757812]), 'original_spacing': array([1. , 0.76757812, 0.76757812]), 'do_dummy_2D_data_aug': False, 'pool_op_kernel_sizes': [[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]}

I am using stage 1 from these plans
I am using batch dice + CE loss

I am using data from this folder: /mnt2/heyupeng_2020/environment_variables/nnUNet_preprocessed/Task003_Liver/nnUNetData_plans_v2.1
###############################################
loading dataset
loading all case properties
unpacking dataset
done
weight_decay: 3e-05
2021-08-20 17:30:57.956403: lr: 0.01
using pin_memory on device 0
using pin_memory on device 0
2021-08-20 17:32:23.117584: Unable to plot network architecture:
2021-08-20 17:32:31.011488: CUDA out of memory. Tried to allocate 1.25 GiB (GPU 0; 10.92 GiB total capacity; 8.80 GiB already allocated; 401.00 MiB free; 9.75 GiB reserved in total by PyTorch)
2021-08-20 17:32:31.011869: printing the network instead:

2021-08-20 17:32:31.012049: Generic_UNetPlusPlus( (loc0): ModuleList( (0): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(640, 320, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(320, 320, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) (1): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(768, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) (2): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(512, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) (3): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(320, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) (4): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(192, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) ) (loc1): ModuleList( (0): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(512, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(256, 
eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) (1): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(384, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) (2): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(256, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) (3): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(160, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) ) (loc2): ModuleList( (0): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(256, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) (1): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(192, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) 
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) (2): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(128, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) ) (loc3): ModuleList( (0): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(128, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) (1): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(96, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) ) (loc4): ModuleList( (0): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(64, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) ) (conv_blocks_context): ModuleList( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(1, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) (1): ConvDropoutNormNonlin( (conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(32, 64, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1)) 
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) (1): ConvDropoutNormNonlin( (conv): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (2): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(64, 128, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) (1): ConvDropoutNormNonlin( (conv): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (3): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(128, 256, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) (1): ConvDropoutNormNonlin( (conv): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (4): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(256, 320, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) (1): ConvDropoutNormNonlin( (conv): Conv3d(320, 320, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (5): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(320, 320, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(320, 320, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) ) (td): ModuleList() (up0): ModuleList( (0): ConvTranspose3d(320, 320, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) (1): ConvTranspose3d(320, 256, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) (2): ConvTranspose3d(256, 128, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) (3): ConvTranspose3d(128, 64, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) (4): ConvTranspose3d(64, 32, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) ) (up1): ModuleList( (0): ConvTranspose3d(320, 256, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) (1): ConvTranspose3d(256, 128, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) (2): ConvTranspose3d(128, 64, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) (3): 
ConvTranspose3d(64, 32, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) ) (up2): ModuleList( (0): ConvTranspose3d(256, 128, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) (1): ConvTranspose3d(128, 64, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) (2): ConvTranspose3d(64, 32, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) ) (up3): ModuleList( (0): ConvTranspose3d(128, 64, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) (1): ConvTranspose3d(64, 32, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) ) (up4): ModuleList( (0): ConvTranspose3d(64, 32, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) ) (seg_outputs): ModuleList( (0): Conv3d(32, 3, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False) (1): Conv3d(32, 3, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False) (2): Conv3d(32, 3, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False) (3): Conv3d(32, 3, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False) (4): Conv3d(32, 3, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False) ) ) 2021-08-20 17:32:31.022252:

2021-08-20 17:32:31.371178: epoch: 0
Traceback (most recent call last):
  File "/home/heyupeng_2020/anaconda3/bin/nnUNet_train", line 33, in <module>
    sys.exit(load_entry_point('nnunet', 'console_scripts', 'nnUNet_train')())
  File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/run/run_training.py", line 148, in main
    trainer.run_training()
  File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/nnUNetPlusPlusTrainerV2.py", line 422, in run_training
    ret = super().run_training()
  File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/nnUNetTrainer.py", line 316, in run_training
    super(nnUNetTrainer, self).run_training()
  File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/network_trainer.py", line 491, in run_training
    l = self.run_iteration(self.tr_gen, True)
  File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/nnUNetPlusPlusTrainerV2.py", line 240, in run_iteration
    output = self.network(data)
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/network_architecture/generic_UNetPlusPlus.py", line 417, in forward
    x0_4 = self.loc1[3](torch.cat([x0_0, x0_1, x0_2, x0_3, self.up1[3](x1_3)], 1))
RuntimeError: CUDA out of memory. Tried to allocate 1.25 GiB (GPU 0; 10.92 GiB total capacity; 8.69 GiB already allocated; 487.00 MiB free; 9.66 GiB reserved in total by PyTorch)
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 99, in results_loop
    raise RuntimeError("Someone died. Better end this madness. This is not the actual error message! Look "
RuntimeError: Someone died. Better end this madness. This is not the actual error message! Look further up your stdout to see what caused the error. Please also check whether your RAM was full
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 99, in results_loop
    raise RuntimeError("Someone died. Better end this madness. This is not the actual error message! Look "
RuntimeError: Someone died. Better end this madness. This is not the actual error message! Look further up your stdout to see what caused the error. Please also check whether your RAM was full

Please cite the following paper when using nnUNet: Fabian Isensee, Paul F. Jäger, Simon A. A. Kohl, Jens Petersen, Klaus H. Maier-Hein "Automated Design of Deep Learning Methods for Biomedical Image Segmentation" arXiv preprint arXiv:1904.08128 (2020). If you have questions or suggestions, feel free to open an issue at https://github.com/MIC-DKFZ/nnUNet

###############################################
I am running the following nnUNet: 3d_fullres
My trainer class is: <class 'nnunet.training.network_training.nnUNetPlusPlusTrainerV2.nnUNetPlusPlusTrainerV2'>
For that I will be using the following configuration:
num_classes: 2
modalities: {0: 'CT'}
use_mask_for_norm OrderedDict([(0, False)])
keep_only_largest_region None
min_region_size_per_class None
min_size_per_class None
normalization_schemes OrderedDict([(0, 'CT')])
stages...

stage: 0 {'batch_size': 2, 'num_pool_per_axis': [5, 5, 5], 'patch_size': array([128, 128, 128]), 'median_patient_size_in_voxels': array([195, 207, 207]), 'current_spacing': array([2.473119 , 1.89831205, 1.89831205]), 'original_spacing': array([1. , 0.76757812, 0.76757812]), 'do_dummy_2D_data_aug': False, 'pool_op_kernel_sizes': [[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]}

stage: 1 {'batch_size': 2, 'num_pool_per_axis': [5, 5, 5], 'patch_size': array([128, 128, 128]), 'median_patient_size_in_voxels': array([482, 512, 512]), 'current_spacing': array([1. , 0.76757812, 0.76757812]), 'original_spacing': array([1. , 0.76757812, 0.76757812]), 'do_dummy_2D_data_aug': False, 'pool_op_kernel_sizes': [[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]}

I am using stage 1 from these plans
I am using batch dice + CE loss

I am using data from this folder: /mnt2/heyupeng_2020/environment_variables/nnUNet_preprocessed/Task003_Liver/nnUNetData_plans_v2.1
###############################################
loading dataset
loading all case properties
unpacking dataset
done
weight_decay: 3e-05
2021-08-20 17:33:23.698772: lr: 0.01
^CTraceback (most recent call last):
  File "/home/heyupeng_2020/anaconda3/bin/nnUNet_train", line 33, in <module>
    sys.exit(load_entry_point('nnunet', 'console_scripts', 'nnUNet_train')())
  File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/run/run_training.py", line 148, in main
    trainer.run_training()
  File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/nnUNetPlusPlusTrainerV2.py", line 422, in run_training
    ret = super().run_training()
  File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/nnUNetTrainer.py", line 316, in run_training
    super(nnUNetTrainer, self).run_training()
  File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/network_trainer.py", line 453, in run_training
    _ = self.tr_gen.next()
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 190, in next
    return self.__next__()
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 211, in __next__
    self._start()
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 246, in _start
    with threadpool_limits(limits=1, user_api="blas"):
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 171, in __init__
    self._original_info = self._set_threadpool_limits()
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 268, in _set_threadpool_limits
    modules = _ThreadpoolInfo(prefixes=self._prefixes,
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 340, in __init__
    self._load_modules()
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 375, in _load_modules
    self._find_modules_with_dl_iterate_phdr()
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 387, in _find_modules_with_dl_iterate_phdr
    libc = self._get_libc()
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 553, in _get_libc
    libc_name = find_library("c")
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/ctypes/util.py", line 350, in find_library
    _findSoname_ldconfig(name) or \
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/ctypes/util.py", line 290, in _findSoname_ldconfig
    with subprocess.Popen(['/sbin/ldconfig', '-p'],
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/subprocess.py", line 1662, in _execute_child
    part = os.read(errpipe_read, 50000)
KeyboardInterrupt
^C
heyupeng_2020@irip-114:~$


heyupeng_2020@irip-114:~$ nvidia-smi
Fri Aug 20 17:37:16 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:04:00.0 Off |                  N/A |
| 38%   59C    P2    76W / 250W |  11085MiB / 11178MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:06:00.0 Off |                  N/A |
| 30%   65C    P2    84W / 250W |   6691MiB / 11178MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:07:00.0 Off |                  N/A |
| 33%   67C    P2    86W / 250W |   6691MiB / 11178MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:0B:00.0 Off |                  N/A |
| 34%   55C    P2    76W / 250W |   6691MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 108...  Off  | 00000000:0C:00.0 Off |                  N/A |
| 18%   37C    P0    60W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce GTX 108...  Off  | 00000000:0D:00.0 Off |                  N/A |
| 14%   42C    P0    63W / 250W |      0MiB / 11178MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce GTX 108...  Off  | 00000000:0E:00.0 Off |                  N/A |
| 51%   79C    P2    95W / 250W |   6809MiB / 11178MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+
|   7  GeForce GTX 108...  Off  | 00000000:0F:00.0 Off |                  N/A |
| 54%   84C    P2   267W / 250W |   9581MiB / 11178MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      8106      C   python                                     11075MiB |
|    1      8106      C   python                                      6681MiB |
|    2      8106      C   python                                      6681MiB |
|    3      8106      C   python                                      6681MiB |
|    6     30886      C   python                                      6115MiB |
|    7     20708      C   python                                      9571MiB |
+-----------------------------------------------------------------------------+
heyupeng_2020@irip-114:~$

ChibisukeDragon commented 3 years ago

I think the "GPU 0" in the log is actually my GPU 4: with CUDA_VISIBLE_DEVICES=4 only that card is visible to the process, so PyTorch renumbers it as device 0. The command did select the right GPU; it just ran out of memory. So I should not use a 1080 Ti to train this model...
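That reading matches the numbers: the OOM message reports "GPU 0; 10.92 GiB total capacity", which is exactly the 11178 MiB of a 1080 Ti, i.e. physical GPU 4 renumbered to 0. A minimal sketch for checking the visible card's capacity before launching a long run (plain PyTorch, not part of nnUNet; the variable names are mine):

import torch

# With CUDA_VISIBLE_DEVICES=4, device 0 below is physical GPU 4.
props = torch.cuda.get_device_properties(0)
total_gib = props.total_memory / 2**30
print(f"{props.name}: {total_gib:.2f} GiB total")  # ~10.92 GiB on a 1080 Ti

# The failed allocation needed 1.25 GiB on top of ~9.7 GiB already reserved,
# so an 11 GiB card is too small for this 3d_fullres configuration.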