RVC-Project / Retrieval-based-Voice-Conversion-WebUI

Easily train a good VC model with voice data <= 10 mins!
MIT License
24.18k stars, 3.58k forks

MultiGPU training seems slower than single #1547

Closed: blakejrobinson closed this issue 6 months ago

blakejrobinson commented 11 months ago

I recently moved an extra GPU into my machine from another one (mainly to help with other ML tasks). I figured I could also use it to speed up training of RVC voices. I have a 4090 and a 3090.

However, when using multiple GPUs, two odd things happen:

1) The step counts differ. For example, by epoch 20 the single-GPU run has reached step 115 on this dataset, but the dual-GPU run has only reached step 95 (a possible explanation is sketched after this comment).

2) Looking at TensorBoard, the dual-GPU run seems to train at half the speed of the single-GPU run. For example:

[TensorBoard screenshot comparing the two training runs]

The screenshot shows the exact same point in time for both runs. Purple is the single-GPU run and grey is the dual-GPU run. The curves have a similar shape; the dual-GPU run just takes over twice as long to get there, which seems counter-intuitive.

Both runs use the same settings; the only change for the second run was setting the GPU index from 0-1 to 0. Both use the same dataset (45 wavs), 48k, v2.

Is multi-gpu just not really supported with this kind of setup?
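For what it's worth, point 1 may just be how data-parallel training counts steps. RVC's trainer (like the VITS code it is derived from) appears to shard the dataset across GPUs with a distributed sampler, so each added GPU roughly halves the number of optimizer steps per epoch, and a lower global step at the same epoch is then expected rather than a bug. The sketch below is not RVC's actual code and the segment count is made up; it only illustrates the arithmetic:

```python
# Illustrative sketch only (not RVC's trainer): shows how a DistributedSampler-style
# split reduces steps per epoch as GPUs are added. The segment count is hypothetical.
import math

def steps_per_epoch(num_samples: int, batch_size: int, world_size: int) -> int:
    # The sampler divides the samples across world_size ranks,
    # then each rank batches its own shard independently.
    samples_per_rank = math.ceil(num_samples / world_size)
    return math.ceil(samples_per_rank / batch_size)

num_segments = 300  # hypothetical number of sliced training segments
batch = 12          # per-GPU batch size, used here only for illustration

for gpus in (1, 2):
    per_epoch = steps_per_epoch(num_segments, batch, gpus)
    print(f"{gpus} GPU(s): {per_epoch} steps/epoch, {per_epoch * 20} global steps by epoch 20")
```

The wall-clock slowdown in point 2 is a separate question; fewer steps per epoch should not, by itself, make the run slower in real time.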

treya-lin commented 11 months ago

Hi, I also tried multi-GPU training once but it didn't work out. May I ask: when you train with two cards, how does the required memory (VRAM) compare to single-GPU mode? Does each GPU need more memory in that case?

blakejrobinson commented 11 months ago

I had a quick check on VRAM usage.

  • Dual-GPU mode: each card used ~14.5 GB (with a batch size per GPU of 12).
  • Single-GPU mode: the single card uses 8 GB (with a batch size per GPU of 12).

This again doesn't make sense to me - I'd expect memory use to be dictated by batch size, so it should be close or the same. Would there really be ~13 GB of overhead (6.5 GB per card)?

treya-lin commented 11 months ago

> I had a quick check on VRAM usage.
>
> • Dual-GPU mode: each card used ~14.5 GB (with a batch size per GPU of 12).
> • Single-GPU mode: the single card uses 8 GB (with a batch size per GPU of 12).
>
> This again doesn't make sense to me - I'd expect memory use to be dictated by batch size, so it should be close or the same. Would there really be ~13 GB of overhead (6.5 GB per card)?

Hi, thanks for your reply! Yes, the VRAM increase looks very strange, and it doesn't make sense that adding a GPU slows the training down. Let's see if the maintainers can figure out what might be going wrong.
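If it helps narrow down where the extra ~6.5 GB per card is going, a small logging helper along these lines (hypothetical, not part of RVC) can be dropped into the training loop to separate what PyTorch has actually allocated from what its caching allocator has merely reserved. Each training process only reports its own allocations, so it needs to run on every rank:

```python
import torch

def log_gpu_memory(tag: str = "") -> None:
    # Report memory on this process's current GPU: live tensor allocations,
    # the allocator's cached pool, and the peak allocation so far.
    dev = torch.cuda.current_device()
    gib = 2 ** 30
    print(f"{tag} cuda:{dev} "
          f"allocated={torch.cuda.memory_allocated(dev) / gib:.2f} GiB "
          f"reserved={torch.cuda.memory_reserved(dev) / gib:.2f} GiB "
          f"peak={torch.cuda.max_memory_allocated(dev) / gib:.2f} GiB")

# Example usage inside the epoch loop: log_gpu_memory(tag=f"epoch {epoch}")
```

Comparing the `allocated` figure between the single- and dual-GPU runs would show whether the extra usage is real model/optimizer/DDP state or just nvidia-smi counting the allocator's reserved pool.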

github-actions[bot] commented 6 months ago

This issue was closed because it has been inactive for 15 days since being marked as stale.