CorentinJ / Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Synthesizer training speed is not varying with batch size. #1181

Open sanal-176 opened 1 year ago

sanal-176 commented 1 year ago

I'm following #437 to fine-tune the synthesizer model. One thing I noticed: the training time does not vary with the batch size.
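For reference, the batch size is taken from the synthesizer's training schedule in synthesizer/hparams.py rather than from a command-line flag. A rough sketch of that schedule (the exact default values may differ between versions of the repo):

```python
# synthesizer/hparams.py -- progressive training schedule (sketch; defaults may differ).
# Each tuple is (r, learning_rate, step, batch_size); the entry covering the
# current training step determines the learning rate and batch size in use.
tts_schedule = [(2, 1e-3,  20_000, 12),
                (2, 5e-4,  40_000, 12),
                (2, 2e-4,  80_000, 12),
                (2, 1e-4, 160_000, 12),
                (2, 3e-5, 320_000, 12),   # my run is at step ~295k, so this entry applies
                (2, 1e-5, 640_000, 12)]   # changing the last field (12 -> 64) raises the batch size
```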

With the default parameters (batch size = 12), GPU memory used is ~3683 MiB:

```
Found 476 samples
+----------------+------------+---------------+------------------+
| Steps with r=2 | Batch Size | Learning Rate | Outputs/Step (r) |
+----------------+------------+---------------+------------------+
| 25k Steps      | 12         | 3e-05         | 2                |
+----------------+------------+---------------+------------------+
{| Epoch: 1/625 (40/40) | Loss: 0.4793 | 0.79 steps/s | Step: 295k | }
```

Fine-tuning for 1000 steps took 1352.45 seconds.

When I increased the batch size to 64, expecting the training time to decrease, it took nearly the same time as above for 1000 steps.

Batch size = 64, GPU memory used ~10435 MiB:

```
Found 476 samples
+----------------+------------+---------------+------------------+
| Steps with r=2 | Batch Size | Learning Rate | Outputs/Step (r) |
+----------------+------------+---------------+------------------+
| 25k Steps      | 64         | 3e-05         | 2                |
+----------------+------------+---------------+------------------+
{| Epoch: 1/3125 (8/8) | Loss: 0.4669 | 0.70 steps/s | Step: 295k | }
```

Even though GPU memory consumption clearly increased at batch size = 64, that is not reflected in the training time.
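To check whether the per-step time depends on the batch size at all, independent of this training script, here is a minimal standalone timing sketch (toy model and dimensions are placeholders, not the repo's synthesizer; assumes PyTorch):

```python
# Time one optimizer step of a dummy model at several batch sizes.
# If the GPU is far from saturated, the per-step time barely changes.
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)
loss_fn = nn.MSELoss()

for batch_size in (12, 64, 256):
    x = torch.randn(batch_size, 512, device=device)
    y = torch.randn(batch_size, 512, device=device)
    # Warm-up step so allocations and kernel launches don't skew the timing.
    optimizer.zero_grad(); loss_fn(model(x), y).backward(); optimizer.step()
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(50):
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    if device == "cuda":
        torch.cuda.synchronize()
    per_step = (time.perf_counter() - start) / 50
    print(f"batch {batch_size:>3}: {per_step * 1000:.2f} ms/step")
```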

raccoonML commented 1 year ago

Batch size affects per-step training time when the GPU is the bottleneck. The default batch size of 12 is too low to saturate the GPU, which is why the training speed (steps/s) stays about the same when the batch size is increased.
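In other words, the time per step stays roughly constant, but each step now consumes more samples, so more data is processed in the same wall time. A quick sketch using the steps/s values logged above:

```python
# Effective throughput (samples/s) = steps/s * batch size,
# using the values reported in the two runs above.
for label, steps_per_s, batch in [("batch 12", 0.79, 12), ("batch 64", 0.70, 64)]:
    print(f"{label}: {steps_per_s} steps/s * {batch} = {steps_per_s * batch:.1f} samples/s")
# batch 12: 0.79 steps/s * 12 = 9.5 samples/s
# batch 64: 0.70 steps/s * 64 = 44.8 samples/s
```

So for a fixed 1000-step budget the wall time is about the same, but at batch size 64 each step covers roughly five times as much data: a full pass over the 476 samples takes 8 steps instead of 40, as the epoch counters in your logs show.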