Got strange durations - Githubissues

MasayaKawamura / MB-iSTFT-VITS

Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform

Apache License 2.0

401 stars, 64 forks

Thanks for the work. I am trying to train this project on another dataset, which is also sampled at 22.05 kHz. During training, the speech generated in evaluation has strange durations: it is much slower than its ground-truth ("gt") counterpart. The GT audio is about 5 seconds long, while the generated audio from evaluation is about 10 seconds. My config is exactly the same as ljs_mb_istft_vits.json, except for my own filelist and text_cleaner. The same kind of config shows no such problem when training VITS. Can anybody give some suggestions? I also tried setting "add_blank" to false in the config; things got better, but still not good enough. In the original VITS, the add_blank option caused no trouble on my dataset, whether true or false.
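For readers trying to reproduce this, here is a minimal sketch of what "add_blank" is generally understood to do in VITS-style text pipelines, plus a small check for the duration mismatch described above. The function names, the blank ID of 0, and the token IDs are assumptions for illustration, not code taken from this repository.

```python
def intersperse(tokens, blank_id=0):
    """Return tokens with blank_id placed before, between, and after each token.

    Assumption: this mirrors the interspersing that add_blank enables in
    VITS-style pipelines; blank_id=0 is a hypothetical choice.
    """
    result = [blank_id] * (len(tokens) * 2 + 1)
    result[1::2] = tokens
    return result


def duration_ratio(generated_samples, reference_samples):
    """Ratio of generated length to ground-truth length (1.0 means equal)."""
    if reference_samples <= 0:
        raise ValueError("reference_samples must be positive")
    return generated_samples / reference_samples


token_ids = [5, 12, 7]  # hypothetical phoneme IDs
print(intersperse(token_ids))  # [0, 5, 0, 12, 0, 7, 0]

sample_rate = 22050  # 22.05 kHz, as in the report above
# About 10 s generated against about 5 s of ground truth
print(duration_ratio(10 * sample_rate, 5 * sample_rate))  # 2.0
```

With add_blank enabled, a sequence of n tokens becomes 2n + 1 tokens, so the duration predictor has to assign durations to roughly twice as many symbols. Logging the ratio per evaluation sample may help tell whether the slowdown is uniform or limited to certain utterances.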

Does the DDP cost less than the SDP? It seems that the durations from the DDP are worse than those from the SDP.

Got strange durations #23