Open vijaypshah opened 2 years ago
Hi,
I've run the command for 2 GPUs and it works fine for me:
root@6e38dc6f86a4:/workspace/nnunet_pyt# python scripts/benchmark.py --mode train --gpus 2 --dim 3 --batch_size 2 --amp
387 training, 97 validation, 484 test examples
Filters: [32, 64, 128, 256, 320, 320],
Kernels: [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]
Strides: [[1, 1, 1], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]]
/opt/conda/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Torchmetrics v0.9 introduced a new argument class property called `full_state_update` that has
not been set for this class (Dice). The property determines if `update` by
default needs access to the full metric state. If this is not the case, significant speedups can be
achieved and we recommend setting this to `False`.
We provide an checking function
`from torchmetrics.utilities import check_forward_full_state_property`
that can be used to check if the `full_state_update=True` (old and potential slower behaviour,
default for now) or if `full_state_update=False` can be used safely.
warnings.warn(*args, **kwargs)
Using 16bit native Automatic Mixed Precision (AMP)
Trainer already configured with model summary callbacks: [<class 'pytorch_lightning.callbacks.model_summary.ModelSummary'>]. Skipping setting a default `ModelSummary` callback.
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py:133: UserWarning: You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.
rank_zero_warn("You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.")
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 2 processes
----------------------------------------------------------------------------------------------------
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1]
| Name | Type | Params
--------------------------------------------------------
0 | model | DynUNet | 31.2 M
1 | model.input_block | UnetBasicBlock | 31.2 K
2 | model.downsamples | ModuleList | 8.5 M
3 | model.bottleneck | UnetBasicBlock | 5.5 M
4 | model.upsamples | ModuleList | 17.2 M
5 | model.output_block | UnetOutBlock | 132
6 | model.skip_layers | DynUNetSkipLayer | 31.2 M
7 | loss | Loss | 0
8 | loss.loss_fn | DiceCELoss | 0
9 | dice | Dice | 0
--------------------------------------------------------
31.2 M Trainable params
0 Non-trainable params
31.2 M Total params
62.386 Total estimated model params size (MB)
Epoch 0 ╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3/150 0:00:23 • 0:01:14 2.00it/s loss: 2.81
I've found that it might be a PyTorch Lightning issue on some systems; please check https://github.com/Lightning-AI/lightning/issues/4612
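If the hang is NCCL-related, as the linked Lightning issue suggests, a common first diagnostic is to turn on NCCL's debug logging and, if the init still stalls, disable GPU peer-to-peer transfers. A minimal sketch, assuming an NCCL hang (the environment variables are standard NCCL knobs, not specific to this repo; the rerun command is the one from this issue):

```shell
# Standard NCCL debug knobs (assumption: the hang happens inside NCCL init):
export NCCL_DEBUG=INFO        # print NCCL init/transport details to stdout
export NCCL_P2P_DISABLE=1     # skip GPU peer-to-peer; a common fix for init hangs
# then rerun the failing command:
# python scripts/benchmark.py --mode train --gpus 2 --dim 3 --batch_size 2 --amp
```

With `NCCL_DEBUG=INFO` set, the log should show which transport NCCL picks before the hang, which usually narrows the problem down to the interconnect or the container's device visibility.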
Related to Model/Framework(s) PyTorch/Segmentation/nnUNet
Describe the bug
I am trying to run the example provided for nnUNet. The code works fine with a single GPU, but it gets stuck when I request 2 GPUs. The following command works:
python scripts/benchmark.py --mode train --gpus 1 --dim 3 --batch_size 2 --amp
The following command gets stuck:
python scripts/benchmark.py --mode train --gpus 2 --dim 3 --batch_size 2 --amp
The output matches the 2-GPU log above (dataset summary, the Torchmetrics `full_state_update` warning, AMP and accelerator messages, and the `val_dataloader` warning) up to the first distributed-init line, and then the run hangs:

Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2

To Reproduce
Steps to reproduce the behavior:
Install '...':
git clone https://github.com/NVIDIA/DeepLearningExamples
cd DeepLearningExamples/PyTorch/Segmentation/nnUNet
docker build -t nnunet .
mkdir data results
sudo singularity build nnunetMultiGPU.sif docker-daemon://nnunet:latest
Launch:
singularity shell --nv -B ${PWD}/data:/data -B ${PWD}/results:/results -B ${PWD}:/workspace nnunetMultiGPU.sif
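Before launching training inside the container, it can help to confirm that both GPUs are actually exposed to it. A hang at "Initializing distributed" can simply mean the second rank never sees its device. A minimal sketch, assuming two GPUs with ids 0 and 1 (the variable name `NUM_VISIBLE` is just for illustration):

```shell
# Hypothetical sanity check: count the devices CUDA_VISIBLE_DEVICES exposes.
export CUDA_VISIBLE_DEVICES=0,1   # assumption: two GPUs, ids 0 and 1
NUM_VISIBLE=$(echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | wc -l)
echo "visible GPUs: $NUM_VISIBLE"  # expect 2 for the --gpus 2 run
```

If this reports fewer devices than expected inside the Singularity shell, the problem is the container setup (e.g. the `--nv` flag or host driver), not the training script.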
Expected behavior
Training starts as in the provided example.
Environment Please provide at least: