I am training a Chinese TTS baseline in PyTorch on the biaobei dataset; the model mainly follows nvidia/Tacotron2, with the sampling rate changed to 48000 Hz. I use pinyin modeling (letters encoded to a sequence), and the batch size is set to 32. After 57k training steps, the loss has stopped decreasing (around 0.3), but there is no alignment at all, and the inference output is also wrong. What are the possible causes, and how should I adjust?
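For reference, the letter-level pinyin encoding mentioned above could look like the following minimal sketch. The symbol inventory (lowercase letters, tone digits 1–5, space, plus a padding token) is an assumption; the actual baseline may use a different symbol set:

```python
# Hypothetical symbol table for tone-numbered pinyin input.
# Index 0 is reserved for padding, as is common in Tacotron2-style code.
symbols = ["_pad_"] + list("abcdefghijklmnopqrstuvwxyz12345 ")
symbol_to_id = {s: i for i, s in enumerate(symbols)}

def pinyin_to_sequence(text):
    """Map a tone-numbered pinyin string, e.g. 'ni3 hao3', to integer IDs.

    Characters outside the symbol table are silently dropped; a real
    front end would likely raise or normalize instead.
    """
    return [symbol_to_id[ch] for ch in text if ch in symbol_to_id]

print(pinyin_to_sequence("ni3 hao3"))  # → [14, 9, 29, 32, 8, 1, 15, 29]
```

If the real front end differs from this (e.g. it models initials/finals as whole units rather than letters), the effective vocabulary size and sequence lengths change, which can affect how quickly attention aligns.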