sanal-176 opened this issue 1 year ago
Batch size affects training time when the GPU is the bottleneck. The default batch size of 12 is too low to saturate the GPU, which is why you're noticing that training speed stays roughly the same when the batch size is increased.
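One way to check whether the GPU is the bottleneck is to time steps per second across several batch sizes: if per-step time barely grows with batch size, the device has spare capacity. A minimal timing harness is sketched below; `fake_train_step` is a hypothetical stand-in for the real training step, not part of this repo.

```python
import time

def benchmark(train_step, batch_sizes, n_steps=100):
    """Time n_steps of train_step per batch size; return steps/s and samples/s."""
    results = {}
    for bs in batch_sizes:
        start = time.perf_counter()
        for _ in range(n_steps):
            train_step(bs)
        elapsed = time.perf_counter() - start
        steps_per_s = n_steps / elapsed
        results[bs] = (steps_per_s, steps_per_s * bs)  # (steps/s, samples/s)
    return results

# Hypothetical stand-in for a real training step: CPU work
# proportional to batch size, just to make the harness runnable.
def fake_train_step(batch_size):
    total = 0.0
    for i in range(batch_size * 1000):
        total += i * 0.5
    return total

if __name__ == "__main__":
    for bs, (sps, samples) in benchmark(fake_train_step, [12, 64]).items():
        print(f"batch {bs}: {sps:.2f} steps/s, {samples:.1f} samples/s")
```

With a real model you would swap `fake_train_step` for one optimizer step on the GPU; on an unsaturated GPU the steps/s for the larger batch stays close to that of the smaller one, so samples/s climbs.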
I'm following #437 to fine-tune the synthesizer model. One thing I noticed: training time does not vary with batch size.
With default parameters (batch size = 12), GPU memory used is ~3683 MiB.
Found 476 samples
+----------------+------------+---------------+------------------+
| Steps with r=2 | Batch Size | Learning Rate | Outputs/Step (r) |
+----------------+------------+---------------+------------------+
| 25k Steps      | 12         | 3e-05         | 2                |
+----------------+------------+---------------+------------------+
{| Epoch: 1/625 (40/40) | Loss: 0.4793 | 0.79 steps/s | Step: 295k | }
Fine-tuning for 1000 steps took 1352.45 seconds.
When I increased the batch size to 64, anticipating that training time would decrease, it took nearly the same time as above for 1000 steps.
With batch size = 64, GPU memory used is ~10435 MiB.
Found 476 samples
+----------------+------------+---------------+------------------+
| Steps with r=2 | Batch Size | Learning Rate | Outputs/Step (r) |
+----------------+------------+---------------+------------------+
| 25k Steps      | 64         | 3e-05         | 2                |
+----------------+------------+---------------+------------------+
{| Epoch: 1/3125 (8/8) | Loss: 0.4669 | 0.70 steps/s | Step: 295k | }
Even though I can see that GPU memory consumption has increased for batch size = 64, this is not reflected in the training time.
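A quick sanity check on the logs above: wall-clock time per step is similar for both runs, but each step at batch 64 processes more samples, so the effective throughput in samples per second is what actually changes.

```python
# Effective throughput computed from the steps/s reported in the logs above.
runs = {12: 0.79, 64: 0.70}  # batch size -> reported steps/s
for bs, sps in runs.items():
    print(f"batch {bs}: {sps} steps/s -> {bs * sps:.1f} samples/s")
# batch 12: 0.79 steps/s -> 9.5 samples/s
# batch 64: 0.70 steps/s -> 44.8 samples/s
```

So although 1000 steps take roughly the same wall time, the batch-64 run is covering the 476-sample dataset far faster per unit time, which is consistent with the GPU not being saturated at batch size 12.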