Open giymen opened 4 years ago
Hi, I trained the autoregressive models for about 600K steps (some less) and around the same for the forward models. This should take, if I remember correctly, about 2-3 days (on RTX 2080).
I'm getting 1.7 s/it on TTS training and 5.9 s/it on aligner training on a Tesla P100 16GB on Colab.
I'm trying to figure out how batch size plays into this, for example if I had more GPU memory. The config file only has bucket_batch_sizes but no batch_size, so I'm not sure what batch size this is running on. I think bucket_batch_sizes is only for the aligner?
Also, it looks like my default config is different from yours, @giymen. https://github.com/as-ideas/TransformerTTS/blob/main/config/training_config.yaml shows a max step count of 260,000, for example, not 900,000 or 600,000, so other things may have changed as well (say, dimensions) that would affect the number of parameters and therefore the training time. @cfrancesco, could you explain the batch size and the discrepancy in the default training configuration? Thanks!
I seem to have three options for default training configs:
1) The current one linked above, in the master branch
2) The one in the Colab demo, commit c3405c53e435a06c809533aa4453923469081147
3) The one behind the "from model.factory import tts_ljspeech" import, which has 260K max steps, linked from https://public-asai-dl-models.s3.eu-central-1.amazonaws.com/TransformerTTS/api_weights/ljspeech_tts_config_v1.yaml
Hi, batch sizes are dynamic. Samples are bucketed by duration, so the batch size depends on how many samples there are in each bin. Max sizes are specified in the bucket_batch_sizes for each interval. The max steps have been reduced over time because of more efficient training (such as the addition of diagonality loss).
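To make the bucketing description above concrete, here is a minimal sketch in plain Python. It is not the repository's data pipeline: the function name, the `boundaries` argument, and the example numbers are hypothetical and only illustrate how a per-bucket maximum can lead to batches of different sizes.

```python
from bisect import bisect_right
from collections import defaultdict


def bucket_batches(durations, boundaries, max_batch_sizes):
    """Group sample indices into batches by duration bucket.

    durations:       one duration (e.g. number of mel frames) per sample
    boundaries:      sorted upper edges separating the buckets (hypothetical name)
    max_batch_sizes: one cap per bucket, so len(boundaries) + 1 entries
    """
    if len(max_batch_sizes) != len(boundaries) + 1:
        raise ValueError("need exactly one max batch size per bucket")

    # Assign every sample to the bucket its duration falls into.
    buckets = defaultdict(list)
    for index, duration in enumerate(durations):
        buckets[bisect_right(boundaries, duration)].append(index)

    # Cut each bucket into chunks no larger than that bucket's cap.
    batches = []
    for bucket_id in sorted(buckets):
        cap = max_batch_sizes[bucket_id]
        members = buckets[bucket_id]
        for start in range(0, len(members), cap):
            batches.append(members[start:start + cap])
    return batches


# Illustrative values only, not taken from any config file.
example = bucket_batches(
    durations=[120, 150, 180, 450, 460, 900],
    boundaries=[200, 500],
    max_batch_sizes=[2, 2, 1],
)
print([len(batch) for batch in example])
```

Under this sketch, short samples fill batches up to their bucket's cap while a sparse bucket yields smaller batches, which is why no single batch_size value describes a run.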
I am working on Colab, and for now I'm trying to train the model with the LJSpeech dataset (just as a trial; later I will use custom data).
I used the parameters from the config files, with "max_steps: 900_000" for melgan/autoregressive. This is my first time training a TTS model, so I wanted to ask about training time. Roughly how many hours should I expect the full training of the models to take?
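As a rough way to answer this for a given setup, total time is approximately the number of steps times the measured seconds per step. The sketch below assumes a constant step rate for the whole run, which is a simplification, and plugs in figures quoted elsewhere in this thread (1.7 s/it on a P100, and max step counts of 260,000 and 900,000). The helper name is made up for illustration, and other hardware will give different numbers.

```python
def estimate_hours(max_steps, seconds_per_step):
    """Estimate wall-clock training time in hours, assuming a constant step rate."""
    return max_steps * seconds_per_step / 3600


# Figures quoted in this thread; replace them with your own measurements.
for steps in (260_000, 900_000):
    hours = estimate_hours(steps, 1.7)
    print(f"{steps:>7} steps at 1.7 s/it: about {hours:.0f} h ({hours / 24:.1f} days)")
```

This ignores validation passes, checkpointing, and Colab session limits, so treat the result as a lower bound rather than a prediction.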