MrGiovanni / UNetPlusPlus

[IEEE TMI] Official Implementation for UNet++

how to train on my designated GPU? #68

Closed ChibisukeDragon closed 3 years ago

ChibisukeDragon commented 3 years ago

I want to train this model on GPU 4... I used this command:

for FOLD in 0 1 2 3 4; do CUDA_VISIBLE_DEVICES=4 nnUNet_train 3d_fullres nnUNetPlusPlusTrainerV2 Task003_Liver $FOLD; done

But it always tried to allocate space on GPU 0.
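Note: CUDA_VISIBLE_DEVICES restricts which physical cards the process can see, but CUDA renumbers the visible cards starting from 0, so a correctly pinned job will still report "device 0" in its own logs. A minimal sketch to confirm which physical GPU the process actually sees (plain PyTorch, not part of nnUNet):

import os

# Must be set before CUDA is initialized, i.e. before importing torch;
# prefixing the shell command as above has the same effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "4"

import torch

# Only physical GPU 4 is visible, and it is renumbered as cuda:0.
print(torch.cuda.device_count())      # 1
print(torch.cuda.current_device())    # 0 (this is physical GPU 4)
print(torch.cuda.get_device_name(0))  # e.g. GeForce GTX 1080 Ti

The full session: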

heyupeng_2020@irip-114:~$ for FOLD in 0 1 2 3 4
> do CUDA_VISIBLE_DEVICES=4 nnUNet_train 3d_fullres nnUNetPlusPlusTrainerV2 Task003_Liver $FOLD
> done

Please cite the following paper when using nnUNet: Fabian Isensee, Paul F. Jäger, Simon A. A. Kohl, Jens Petersen, Klaus H. Maier-Hein "Automated Design of Deep Learning Methods for Biomedical Image Segmentation" arXiv preprint arXiv:1904.08128 (2020). If you have questions or suggestions, feel free to open an issue at https://github.com/MIC-DKFZ/nnUNet

###############################################
I am running the following nnUNet: 3d_fullres
My trainer class is: <class 'nnunet.training.network_training.nnUNetPlusPlusTrainerV2.nnUNetPlusPlusTrainerV2'>
For that I will be using the following configuration:
num_classes: 2
modalities: {0: 'CT'}
use_mask_for_norm OrderedDict([(0, False)])
keep_only_largest_region None
min_region_size_per_class None
min_size_per_class None
normalization_schemes OrderedDict([(0, 'CT')])
stages...

stage: 0 {'batch_size': 2, 'num_pool_per_axis': [5, 5, 5], 'patch_size': array([128, 128, 128]), 'median_patient_size_in_voxels': array([195, 207, 207]), 'current_spacing': array([2.473119 , 1.89831205, 1.89831205]), 'original_spacing': array([1. , 0.76757812, 0.76757812]), 'do_dummy_2D_data_aug': False, 'pool_op_kernel_sizes': [[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]}

stage: 1 {'batch_size': 2, 'num_pool_per_axis': [5, 5, 5], 'patch_size': array([128, 128, 128]), 'median_patient_size_in_voxels': array([482, 512, 512]), 'current_spacing': array([1. , 0.76757812, 0.76757812]), 'original_spacing': array([1. , 0.76757812, 0.76757812]), 'do_dummy_2D_data_aug': False, 'pool_op_kernel_sizes': [[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]}

I am using stage 1 from these plans
I am using batch dice + CE loss

I am using data from this folder: /mnt2/heyupeng_2020/environment_variables/nnUNet_preprocessed/Task003_Liver/nnUNetData_plans_v2.1
###############################################
loading dataset
loading all case properties
unpacking dataset
done
weight_decay: 3e-05
2021-08-20 17:30:57.956403: lr: 0.01
using pin_memory on device 0
using pin_memory on device 0
2021-08-20 17:32:23.117584: Unable to plot network architecture:
2021-08-20 17:32:31.011488: CUDA out of memory. Tried to allocate 1.25 GiB (GPU 0; 10.92 GiB total capacity; 8.80 GiB already allocated; 401.00 MiB free; 9.75 GiB reserved in total by PyTorch)
2021-08-20 17:32:31.011869: printing the network instead:

2021-08-20 17:32:31.012049: Generic_UNetPlusPlus( (loc0): ModuleList( (0): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(640, 320, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(320, 320, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) (1): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(768, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) (2): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(512, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) (3): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(320, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) (4): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(192, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) ) (loc1): ModuleList( (0): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(512, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(256, 
eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) (1): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(384, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) (2): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(256, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) (3): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(160, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) ) (loc2): ModuleList( (0): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(256, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) (1): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(192, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) 
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) (2): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(128, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) ) (loc3): ModuleList( (0): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(128, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) (1): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(96, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) ) (loc4): ModuleList( (0): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(64, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) ) (conv_blocks_context): ModuleList( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(1, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) (1): ConvDropoutNormNonlin( (conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(32, 64, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1)) 
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) (1): ConvDropoutNormNonlin( (conv): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (2): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(64, 128, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) (1): ConvDropoutNormNonlin( (conv): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (3): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(128, 256, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) (1): ConvDropoutNormNonlin( (conv): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (4): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(256, 320, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) (1): ConvDropoutNormNonlin( (conv): Conv3d(320, 320, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (5): Sequential( (0): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(320, 320, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) (1): StackedConvLayers( (blocks): Sequential( (0): ConvDropoutNormNonlin( (conv): Conv3d(320, 320, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)) (instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (lrelu): LeakyReLU(negative_slope=0.01, inplace=True) ) ) ) ) ) (td): ModuleList() (up0): ModuleList( (0): ConvTranspose3d(320, 320, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) (1): ConvTranspose3d(320, 256, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) (2): ConvTranspose3d(256, 128, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) (3): ConvTranspose3d(128, 64, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) (4): ConvTranspose3d(64, 32, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) ) (up1): ModuleList( (0): ConvTranspose3d(320, 256, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) (1): ConvTranspose3d(256, 128, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) (2): ConvTranspose3d(128, 64, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) (3): 
ConvTranspose3d(64, 32, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) ) (up2): ModuleList( (0): ConvTranspose3d(256, 128, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) (1): ConvTranspose3d(128, 64, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) (2): ConvTranspose3d(64, 32, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) ) (up3): ModuleList( (0): ConvTranspose3d(128, 64, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) (1): ConvTranspose3d(64, 32, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) ) (up4): ModuleList( (0): ConvTranspose3d(64, 32, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False) ) (seg_outputs): ModuleList( (0): Conv3d(32, 3, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False) (1): Conv3d(32, 3, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False) (2): Conv3d(32, 3, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False) (3): Conv3d(32, 3, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False) (4): Conv3d(32, 3, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False) ) ) 2021-08-20 17:32:31.022252:

2021-08-20 17:32:31.371178: epoch: 0
Traceback (most recent call last):
  File "/home/heyupeng_2020/anaconda3/bin/nnUNet_train", line 33, in <module>
    sys.exit(load_entry_point('nnunet', 'console_scripts', 'nnUNet_train')())
  File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/run/run_training.py", line 148, in main
    trainer.run_training()
  File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/nnUNetPlusPlusTrainerV2.py", line 422, in run_training
    ret = super().run_training()
  File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/nnUNetTrainer.py", line 316, in run_training
    super(nnUNetTrainer, self).run_training()
  File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/network_trainer.py", line 491, in run_training
    l = self.run_iteration(self.tr_gen, True)
  File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/nnUNetPlusPlusTrainerV2.py", line 240, in run_iteration
    output = self.network(data)
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/network_architecture/generic_UNetPlusPlus.py", line 417, in forward
    x0_4 = self.loc1[3](torch.cat([x0_0, x0_1, x0_2, x0_3, self.up1[3](x1_3)], 1))
RuntimeError: CUDA out of memory. Tried to allocate 1.25 GiB (GPU 0; 10.92 GiB total capacity; 8.69 GiB already allocated; 487.00 MiB free; 9.66 GiB reserved in total by PyTorch)
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 99, in results_loop
    raise RuntimeError("Someone died. Better end this madness. This is not the actual error message! Look "
RuntimeError: Someone died. Better end this madness. This is not the actual error message! Look further up your stdout to see what caused the error. Please also check whether your RAM was full
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 99, in results_loop
    raise RuntimeError("Someone died. Better end this madness. This is not the actual error message! Look "
RuntimeError: Someone died. Better end this madness. This is not the actual error message! Look further up your stdout to see what caused the error. Please also check whether your RAM was full

Please cite the following paper when using nnUNet: Fabian Isensee, Paul F. Jäger, Simon A. A. Kohl, Jens Petersen, Klaus H. Maier-Hein "Automated Design of Deep Learning Methods for Biomedical Image Segmentation" arXiv preprint arXiv:1904.08128 (2020). If you have questions or suggestions, feel free to open an issue at https://github.com/MIC-DKFZ/nnUNet

###############################################
I am running the following nnUNet: 3d_fullres
My trainer class is: <class 'nnunet.training.network_training.nnUNetPlusPlusTrainerV2.nnUNetPlusPlusTrainerV2'>
For that I will be using the following configuration:
num_classes: 2
modalities: {0: 'CT'}
use_mask_for_norm OrderedDict([(0, False)])
keep_only_largest_region None
min_region_size_per_class None
min_size_per_class None
normalization_schemes OrderedDict([(0, 'CT')])
stages...

stage: 0 {'batch_size': 2, 'num_pool_per_axis': [5, 5, 5], 'patch_size': array([128, 128, 128]), 'median_patient_size_in_voxels': array([195, 207, 207]), 'current_spacing': array([2.473119 , 1.89831205, 1.89831205]), 'original_spacing': array([1. , 0.76757812, 0.76757812]), 'do_dummy_2D_data_aug': False, 'pool_op_kernel_sizes': [[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]}

stage: 1 {'batch_size': 2, 'num_pool_per_axis': [5, 5, 5], 'patch_size': array([128, 128, 128]), 'median_patient_size_in_voxels': array([482, 512, 512]), 'current_spacing': array([1. , 0.76757812, 0.76757812]), 'original_spacing': array([1. , 0.76757812, 0.76757812]), 'do_dummy_2D_data_aug': False, 'pool_op_kernel_sizes': [[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]}

I am using stage 1 from these plans
I am using batch dice + CE loss

I am using data from this folder: /mnt2/heyupeng_2020/environment_variables/nnUNet_preprocessed/Task003_Liver/nnUNetData_plans_v2.1
###############################################
loading dataset
loading all case properties
unpacking dataset
done
weight_decay: 3e-05
2021-08-20 17:33:23.698772: lr: 0.01
^CTraceback (most recent call last):
  File "/home/heyupeng_2020/anaconda3/bin/nnUNet_train", line 33, in <module>
    sys.exit(load_entry_point('nnunet', 'console_scripts', 'nnUNet_train')())
  File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/run/run_training.py", line 148, in main
    trainer.run_training()
  File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/nnUNetPlusPlusTrainerV2.py", line 422, in run_training
    ret = super().run_training()
  File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/nnUNetTrainer.py", line 316, in run_training
    super(nnUNetTrainer, self).run_training()
  File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/network_trainer.py", line 453, in run_training
    _ = self.tr_gen.next()
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 190, in next
    return self.__next__()
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 211, in __next__
    self._start()
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 246, in _start
    with threadpool_limits(limits=1, user_api="blas"):
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 171, in __init__
    self._original_info = self._set_threadpool_limits()
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 268, in _set_threadpool_limits
    modules = _ThreadpoolInfo(prefixes=self._prefixes,
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 340, in __init__
    self._load_modules()
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 375, in _load_modules
    self._find_modules_with_dl_iterate_phdr()
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 387, in _find_modules_with_dl_iterate_phdr
    libc = self._get_libc()
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 553, in _get_libc
    libc_name = find_library("c")
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/ctypes/util.py", line 350, in find_library
    _findSoname_ldconfig(name) or \
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/ctypes/util.py", line 290, in _findSoname_ldconfig
    with subprocess.Popen(['/sbin/ldconfig', '-p'],
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/heyupeng_2020/anaconda3/lib/python3.8/subprocess.py", line 1662, in _execute_child
    part = os.read(errpipe_read, 50000)
KeyboardInterrupt
^C
heyupeng_2020@irip-114:~$


heyupeng_2020@irip-114:~$ nvidia-smi
Fri Aug 20 17:37:16 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:04:00.0 Off |                  N/A |
| 38%   59C    P2    76W / 250W |  11085MiB / 11178MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:06:00.0 Off |                  N/A |
| 30%   65C    P2    84W / 250W |   6691MiB / 11178MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:07:00.0 Off |                  N/A |
| 33%   67C    P2    86W / 250W |   6691MiB / 11178MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:0B:00.0 Off |                  N/A |
| 34%   55C    P2    76W / 250W |   6691MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 108...  Off  | 00000000:0C:00.0 Off |                  N/A |
| 18%   37C    P0    60W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce GTX 108...  Off  | 00000000:0D:00.0 Off |                  N/A |
| 14%   42C    P0    63W / 250W |      0MiB / 11178MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce GTX 108...  Off  | 00000000:0E:00.0 Off |                  N/A |
| 51%   79C    P2    95W / 250W |   6809MiB / 11178MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+
|   7  GeForce GTX 108...  Off  | 00000000:0F:00.0 Off |                  N/A |
| 54%   84C    P2   267W / 250W |   9581MiB / 11178MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      8106      C   python                                     11075MiB |
|    1      8106      C   python                                      6681MiB |
|    2      8106      C   python                                      6681MiB |
|    3      8106      C   python                                      6681MiB |
|    6     30886      C   python                                      6115MiB |
|    7     20708      C   python                                      9571MiB |
+-----------------------------------------------------------------------------+
heyupeng_2020@irip-114:~$

ChibisukeDragon commented 3 years ago

I think the "GPU 0" in the log is actually my GPU 4: with CUDA_VISIBLE_DEVICES=4 only that card is visible to the process, so PyTorch renumbers it as device 0. The command did select the right GPU; it just ran out of memory. So I should not use a 1080 Ti to train this model...
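That reading matches the numbers: the OOM message reports "GPU 0; 10.92 GiB total capacity", which is exactly the 11178 MiB of a 1080 Ti, i.e. physical GPU 4 renumbered to 0. A minimal sketch for checking the visible card's capacity before launching a long run (plain PyTorch, not part of nnUNet; the variable names are mine):

import torch

# With CUDA_VISIBLE_DEVICES=4, device 0 below is physical GPU 4.
props = torch.cuda.get_device_properties(0)
total_gib = props.total_memory / 2**30
print(f"{props.name}: {total_gib:.2f} GiB total")  # ~10.92 GiB on a 1080 Ti

# The failed allocation needed 1.25 GiB on top of ~9.7 GiB already reserved,
# so an 11 GiB card is too small for this 3d_fullres configuration.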