Closed by FanhuaandLuomu 1 year ago
Hi! Even when training on an A100, I found that the total training time of ms-istft-vits is long. I have confirmed that ms-istft-vits is slower to train than mb-istft-vits.
Hi @MasayaKawamura, the VITS model suffers from mispronunciation, so it usually has a higher CER or WER than other models. Did you see any pronunciation improvements with this model? Anyway, thank you for your amazing work.
Hi @MasayaKawamura, how many steps did the model need to reach relatively good quality in your experiments? I see in the paper that you trained for 800k steps.
Hi @leminhnguyen, thank you for the question. I have not done any comparisons on WER, etc., so I don't know for sure. From the few samples I have heard (you can check the audio samples on this demo page), I think there are few critical word errors. However, WER depends on the input text length, so I think a detailed analysis is needed.
This paper may be helpful regarding VITS and WER.
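For anyone who wants to run that kind of comparison, here is a minimal sketch of WER as word-level edit distance divided by the reference length. This is the common textbook definition, not something taken from the paper or this repository, and the example sentences are made up. It assumes you already have ASR transcripts of the synthesized audio; text normalization (case, punctuation) is left out.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (classic Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one wrong word in a short vs. a longer reference.
print(wer("the cat sat down", "the cat sat town"))        # 0.25
print(wer("a b c d e f g h i j", "a b c d e f g h i x"))  # 0.1
```

In this toy case the same single word error gives a per-utterance WER of 0.25 for the 4-word sentence and 0.1 for the 10-word one, which may be one reason sentence length needs to be controlled in such an analysis.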
Hi @FanhuaandLuomu, thank you for the question. In the paper, all models were trained for 800k steps to match experimental conditions. I think it is a difficult question, because the number of steps needed to obtain relatively good quality also depends on the hyperparameters and the dataset. This is just my opinion, but I think you can synthesize relatively good speech in fewer than 800k steps (I would have to evaluate specific step counts with MOS to be sure).
Hi, thanks for your great work. Could you open-source the model structure of your small version? Thanks again. @MasayaKawamura
Hi @FanhuaandLuomu, I added the config files for Mini-iSTFT-VITS and Mini-MB-iSTFT-VITS that were described in our paper. Please check the configs.
@FanhuaandLuomu @MasayaKawamura I have a question: how long did it take you to train for 800K steps? Maybe one week?
Hi @guoyingying432, I think the computation time depends on the hyperparameters, GPU, etc. Under the conditions of the paper, it takes about one or two weeks.
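As a rough back-of-the-envelope reading of that estimate, the sketch below converts "800k steps in about one or two weeks" into average time per step and per 10k steps. It is only arithmetic on the figure above, not a measured number for any particular GPU, and the function names are just for illustration.

```python
# Estimates derived from "800k steps in about one or two weeks";
# these are not measurements.

SECONDS_PER_DAY = 24 * 60 * 60
TOTAL_STEPS = 800_000  # number of training steps used in the paper


def seconds_per_step(days: float, total_steps: int = TOTAL_STEPS) -> float:
    """Average wall-clock seconds per step if training takes `days` days."""
    return days * SECONDS_PER_DAY / total_steps


def hours_per_10k_steps(days: float, total_steps: int = TOTAL_STEPS) -> float:
    """Average wall-clock hours for each block of 10k steps."""
    return seconds_per_step(days, total_steps) * 10_000 / 3600


for days in (7, 14):
    print(f"{days} days: {seconds_per_step(days):.3f} s/step, "
          f"{hours_per_10k_steps(days):.1f} h per 10k steps")
# 7 days: 0.756 s/step, 2.1 h per 10k steps
# 14 days: 1.512 s/step, 4.2 h per 10k steps
```

So if your own run takes much longer than a few hours per 10k steps, comparing batch size, GPU, and other hyperparameters against the paper's setup would be a reasonable first check.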
Hi, can you share your training speed on an A100, such as the time taken per 10k steps? I'm training ms-istft-vits and find it is slower than the original VITS.