When I increased the batch size from 2 to 4, the training time per step doubled from 15 s to 30 s. Why? GPU-Util is already at 100% with bs=2, and I am training on a single GPU, so I can't figure out what causes this.
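
To put the two measurements side by side, here is a small Python sketch of the arithmetic on the numbers above. The function name `seconds_per_sample` is only illustrative and is not part of my training code. It shows that the time per sample is the same at both batch sizes, so the throughput in samples per second did not change.

```python
def seconds_per_sample(batch_size: int, step_seconds: float) -> float:
    """Return the average wall-clock time spent on one sample in a step."""
    return step_seconds / batch_size


# Step times as reported in the question: batch size -> seconds per step.
measurements = {2: 15.0, 4: 30.0}

for batch_size, step_seconds in measurements.items():
    per_sample = seconds_per_sample(batch_size, step_seconds)
    throughput = batch_size / step_seconds  # samples processed per second
    print(f"bs={batch_size}: {per_sample:.2f} s/sample, {throughput:.4f} samples/s")
```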