Hello,

I tried running multi-GPU training on a g4dn.metal instance with the default training parameters. The only change I made was to multiply the batch size by 8 (the number of GPUs), i.e. to 16. I am using v2.4.1.
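In case it matters, the batch-size change was just an edit to the preprocessed plans file, roughly like the sketch below (the dataset folder name is whatever dataset 4 resolves to locally; it is shown here as `Dataset004_Hippocampus` purely for illustration, and the default batch size of 2 is inferred from 2 × 8 = 16):

```python
# Rough sketch of the batch-size change, assuming the plans file sits in the
# usual nnUNet_preprocessed location and the dataset folder is named
# Dataset004_Hippocampus (illustrative; adjust to your local naming).
import json
import os

plans_path = os.path.join(
    os.environ["nnUNet_preprocessed"], "Dataset004_Hippocampus", "nnUNetPlans.json"
)

with open(plans_path) as f:
    plans = json.load(f)

# Default 3d_fullres batch size multiplied by the number of GPUs (2 * 8 = 16).
plans["configurations"]["3d_fullres"]["batch_size"] = 16

with open(plans_path, "w") as f:
    json.dump(plans, f, indent=4)
```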
When I train the model with `nnUNetv2_train 4 3d_fullres 0 -num_gpus 8`, I see that all GPUs are at 100% utilization, but I do not see a significant speed-up: epochs 2 and later take 212 seconds with 8 GPUs vs. 250 seconds with 1 GPU.
I did not expect perfect linear scaling, but I thought I would at least get a 50% speed-up. Is there something I am doing wrong?
I also tried increasing the batch size from 16 to 32, but that was even slower: 342 seconds per epoch.