[PyTorch/Segmentation/nnUNet] If multiple GPUs requested code will not run

NVIDIA / DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

13.56k stars 3.23k forks source link

Related to Model/Framework(s) PyTorch/Segmentation/nnUNet

Describe the bug I am trying to run the example provided on the nnUNet. The code works fine when I use single GPU. However, if I request for 2 GPU it will not work. Following command works: python scripts/benchmark.py --mode train --gpus 1 --dim 3 --batch_size 2 --amp

Following command gets stuck python scripts/benchmark.py --mode train --gpus 2 --dim 3 --batch_size 2 --amp

387 training, 97 validation, 484 test examples Filters: [32, 64, 128, 256, 320, 320], Kernels: [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]] Strides: [[1, 1, 1], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]] /opt/conda/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Torchmetrics v0.9 introduced a new argument class property called full_state_update that has not been set for this class (Dice). The property determines if update by default needs access to the full metric state. If this is not the case, significant speedups can be achieved and we recommend setting this to False. We provide an checking function from torchmetrics.utilities import check_forward_full_state_property that can be used to check if the full_state_update=True (old and potential slower behaviour, default for now) or if full_state_update=False can be used safely.

warnings.warn(*args, **kwargs) Using 16bit native Automatic Mixed Precision (AMP) Trainer already configured with model summary callbacks: [<class 'pytorch_lightning.callbacks.model_summary.ModelSummary'>]. Skipping setting a default ModelSummary callback. GPU available: True, used: True TPU available: False, using: 0 TPU cores IPU available: False, using: 0 IPUs HPU available: False, using: 0 HPUs /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py:133: UserWarning: You defined a validation_step but have no val_dataloader. Skipping val loop. rank_zero_warn("You defined a validation_step but have no val_dataloader. Skipping val loop.") Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2

To Reproduce Steps to reproduce the behavior:

Install '...' : git clone https://github.com/NVIDIA/DeepLearningExamples cd DeepLearningExamples/PyTorch/Segmentation/nnUNet docker build -t nnunet . mkdir data results sudo singularity build nnunetMultiGPU.sif docker-daemon://nnunet:latest
Launch : singularity shell --nv -B ${PWD}/data:/data -B ${PWD}/results:/results -B ${PWD}:/workspace nnunetMultiGPU.sif

Expected behavior Training to start as provided in the example

Environment Please provide at least:

Container version (e.g. pytorch:19.05-py3): PyTorch 21.11 NGC container
GPUs in the system: (e.g. 8x Tesla V100-SXM2-16GB): 2x Tesla V100-SXM3- 32 GB
CUDA driver version (e.g. 418.67): 440.118.02

oot@6e38dc6f86a4:/workspace/nnunet_pyt# python scripts/benchmark.py --mode train --gpus 2 --dim 3 --batch_size 2 --amp 387 training, 97 validation, 484 test examples Filters: [32, 64, 128, 256, 320, 320], Kernels: [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]] Strides: [[1, 1, 1], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]] /opt/conda/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Torchmetrics v0.9 introduced a new argument class property called `full_state_update` that has not been set for this class (Dice). The property determines if `update` by default needs access to the full metric state. If this is not the case, significant speedups can be achieved and we recommend setting this to `False`. We provide an checking function `from torchmetrics.utilities import check_forward_full_state_property` that can be used to check if the `full_state_update=True` (old and potential slower behaviour, default for now) or if `full_state_update=False` can be used safely. warnings.warn(*args, **kwargs) Using 16bit native Automatic Mixed Precision (AMP) Trainer already configured with model summary callbacks: [<class 'pytorch_lightning.callbacks.model_summary.ModelSummary'>]. Skipping setting a default `ModelSummary` callback. GPU available: True, used: True TPU available: False, using: 0 TPU cores IPU available: False, using: 0 IPUs HPU available: False, using: 0 HPUs /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py:133: UserWarning: You defined a `validation_step` but have no `val_dataloader`. Skipping val loop. rank_zero_warn("You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.") Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2 Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2 ---------------------------------------------------------------------------------------------------- distributed_backend=nccl All distributed processes registered. Starting with 2 processes ---------------------------------------------------------------------------------------------------- LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1] LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1] | Name | Type | Params -------------------------------------------------------- 0 | model | DynUNet | 31.2 M 1 | model.input_block | UnetBasicBlock | 31.2 K 2 | model.downsamples | ModuleList | 8.5 M 3 | model.bottleneck | UnetBasicBlock | 5.5 M 4 | model.upsamples | ModuleList | 17.2 M 5 | model.output_block | UnetOutBlock | 132 6 | model.skip_layers | DynUNetSkipLayer | 31.2 M 7 | loss | Loss | 0 8 | loss.loss_fn | DiceCELoss | 0 9 | dice | Dice | 0 -------------------------------------------------------- 31.2 M Trainable params 0 Non-trainable params 31.2 M Total params 62.386 Total estimated model params size (MB) Epoch 0 ╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3/150 0:00:23 • 0:01:14 2.00it/s loss: 2.81

NVIDIA / DeepLearningExamples

[PyTorch/Segmentation/nnUNet] If multiple GPUs requested code will not run #1189