constan1 opened 1 year ago
Hi, how long did it take to train this model? I am currently training my own implementation on a DGX cluster of 4 V100s with DeepSpeed integrated, using gradient accumulation of 4 over a micro-batch size of 10, for an effective batch size of 160. It is still taking very, very long.

In my case it was 2 weeks on 2 RTX 3090s; your 4 V100 cluster should be powerful enough to finish training from scratch in comparable time.

Is there a specific training loss/validation loss you used as a benchmark for convergence?
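For reference, the batch-size arithmetic above can be sketched as a minimal DeepSpeed batch configuration. This is an assumption-laden sketch, not the poster's actual config: the key names (`train_micro_batch_size_per_gpu`, `gradient_accumulation_steps`, `train_batch_size`) are standard DeepSpeed config fields, and the values are taken from the numbers quoted in the question (micro-batch 10, accumulation 4, 4 GPUs).

```python
# Sketch of the batch-size portion of a DeepSpeed config matching the
# numbers in the thread. DeepSpeed enforces:
#   train_batch_size == micro_batch_per_gpu * grad_accum_steps * world_size
micro_batch_per_gpu = 10  # micro batch size per GPU (from the question)
grad_accum_steps = 4      # gradient accumulation steps (from the question)
num_gpus = 4              # 4 V100s (from the question)

effective_batch = micro_batch_per_gpu * grad_accum_steps * num_gpus

ds_config = {
    "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
    "gradient_accumulation_steps": grad_accum_steps,
    "train_batch_size": effective_batch,  # 10 * 4 * 4 = 160
}

print(effective_batch)  # → 160
```

If these three values are ever inconsistent, DeepSpeed raises an error at initialization, so it is worth checking this product matches the intended effective batch size.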