PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
https://paddlespeech.readthedocs.io
Apache License 2.0

Questions about training a FastSpeech2 model on AISHELL-3 #3010

Closed: yangqinj closed this issue 1 year ago

yangqinj commented 1 year ago


While training a FastSpeech2 model on the AISHELL-3 dataset recently, I ran into a few questions that I would like to ask about.

[Setup] The vocoder is HiFiGAN, the batch size is 64, and the aligner is MFA 1.x, with an MFA model trained on my own dataset.

yt605155624 commented 1 year ago
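  1. For TTS, how the synthesized audio actually sounds is more meaningful than the loss. If you have trained HiFiGAN, you may even find that the fm (feature matching) loss trends upward.
  2. phoneme-level is more stable. I believe there is an fs2 implementation by ming that uses frame-level, and some users have reported that its results are not as good as PaddleSpeech's (though there may be other reasons). Anyway, use whichever one you think works better.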
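
For readers unfamiliar with the phoneme-level versus frame-level distinction in point 2, here is a minimal sketch. It assumes the terms refer to pitch (or energy) features, where the phoneme-level variant averages frame-level values over each phoneme's duration from the aligner. The function name `average_by_duration` and the choice to skip unvoiced (zero) frames are assumptions of this sketch, not confirmed details of PaddleSpeech or any other implementation.

```python
def average_by_duration(frame_values, durations):
    """Collapse frame-level values into one value per phoneme.

    frame_values: per-frame pitch (or energy); 0.0 marks an unvoiced frame.
    durations: number of frames assigned to each phoneme (e.g. from MFA).
    Skipping zero frames when averaging is an assumption of this sketch.
    """
    if sum(durations) != len(frame_values):
        raise ValueError("durations must sum to the number of frames")
    phone_values = []
    start = 0
    for d in durations:
        # Keep only voiced frames inside this phoneme's span.
        voiced = [v for v in frame_values[start:start + d] if v > 0.0]
        phone_values.append(sum(voiced) / len(voiced) if voiced else 0.0)
        start += d
    return phone_values


# Hypothetical data: 7 frames of F0 aligned to 3 phonemes.
frame_f0 = [0.0, 200.0, 210.0, 220.0, 0.0, 0.0, 180.0]
durations = [3, 2, 2]
phone_f0 = average_by_duration(frame_f0, durations)
print(phone_f0)
```

A frame-level model would predict all 7 values directly, while a phoneme-level model predicts the 3 averaged values, which smooths out frame-to-frame noise.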
yangqinj commented 1 year ago

OK, thanks for the reply!

yuxi-chen19 commented 1 year ago

Is there an MFA model available for data alignment? I don't have a model for this dictionary at the moment.