MasayaKawamura / MB-iSTFT-VITS

Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
Apache License 2.0
401 stars 64 forks

Different sample rate #7

Closed by wizardk 1 year ago

wizardk commented 1 year ago

Hi @MasayaKawamura, thanks for your work.

I have a question. If I want to use a 16 kHz sampling rate, how should I modify the configuration file? I assume it is not enough to change only sampling_rate in the JSON file.

MasayaKawamura commented 1 year ago

Hi @wizardk, thank you for the question. I think you can train the model at a 16 kHz sampling rate by modifying sampling_rate in the JSON file. For example, in the case of MS-iSTFT-VITS, you need to modify this line.

wizardk commented 1 year ago

Thanks for your help. But I think it is necessary to modify fft_sizes, hop_sizes, win_lengths, filter_length, hop_length, and win_length along with sampling_rate. Is that right?
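
One way to read the question above: if the goal is to keep each analysis window and hop the same length in milliseconds, every sample-count parameter would scale by the ratio of the new to the old sampling rate. Below is a minimal sketch of that idea, assuming the config is a flat dict using the key names mentioned in this thread. The function name `scale_stft_params` and all numeric values are hypothetical and not taken from the repository's config files. Whether this rescaling is actually required is not settled in this thread. Note also that in VITS-style models hop_length is usually tied to the generator's total upsampling factor, so changing it may require matching changes to the model settings.

```python
def scale_stft_params(cfg, new_sr):
    """Return a copy of cfg with STFT sizes rescaled for new_sr.

    Hypothetical helper: scales sample-count parameters so that
    their duration in milliseconds stays roughly constant.
    """
    old_sr = cfg["sampling_rate"]

    def scale(value):
        # Round to the nearest whole number of samples, at least 1.
        return max(1, round(value * new_sr / old_sr))

    out = dict(cfg)
    out["sampling_rate"] = new_sr
    # Scalar STFT settings named in the thread.
    for key in ("filter_length", "hop_length", "win_length"):
        if key in cfg:
            out[key] = scale(cfg[key])
    # List-valued settings named in the thread (multi-resolution STFT).
    for key in ("fft_sizes", "hop_sizes", "win_lengths"):
        if key in cfg:
            out[key] = [scale(v) for v in cfg[key]]
    return out


# Hypothetical values for illustration only, not the repository's defaults.
example_cfg = {
    "sampling_rate": 24000,
    "filter_length": 1200,
    "hop_length": 300,
    "win_length": 1200,
    "fft_sizes": [600, 1200],
}
scaled = scale_stft_params(example_cfg, 16000)
print(scaled)
```

With these illustrative numbers, a 1200-sample window at 24 kHz (50 ms) becomes an 800-sample window at 16 kHz (still 50 ms). The original dict is left unchanged, so both configs can be compared side by side.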

maytusp commented 1 year ago

Hi @wizardk, I have the same problem as you. Do you know which parameters need to be adjusted for 16k?

JohnHerry commented 11 months ago

I have the same problem. I am trying 16 kHz and changed only the sample_rate parameter, but the synthesized speech is bad: it is too slow, as if 24 kHz audio were being played back at 16 kHz. All phoneme durations are strange.
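
The "24 kHz audio played at 16 kHz" symptom can be quantified with simple arithmetic: the same number of samples lasts longer at a lower playback rate. A small sketch (the function name `playback_duration` is hypothetical, and this only illustrates the described symptom, not its cause in this model):

```python
def playback_duration(num_samples, playback_rate):
    """Duration in seconds of num_samples played at playback_rate (Hz)."""
    return num_samples / playback_rate


# One second of audio generated at 24 kHz...
num_samples = 24000
# ...lasts longer when played back as if it were 16 kHz audio.
stretch = playback_duration(num_samples, 16000) / playback_duration(num_samples, 24000)
print(stretch)  # 1.5
```

A mismatch of this kind would make speech 1.5 times longer and correspondingly lower in pitch.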
