Inference audio quality?

juntaosun commented 6 months ago

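Great project. After training, inference works and the pronunciation is correct. Compared with the training material, though, the output does not sound as bright or crisp (I have confirmed that the quality of the training material is not the cause).

I checked that the sample rate of the training audio matches the config: 44100. How can I improve the inference audio quality? Thanks again~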
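
For anyone who wants to repeat the sample-rate check described above, here is a minimal sketch using Python's standard `wave` module. The helper name `find_mismatched_sample_rates` is hypothetical and not part of the project, and the sketch assumes the training clips are plain PCM WAV files.

```python
import wave
from pathlib import Path


def find_mismatched_sample_rates(wav_dir, expected_rate=44100):
    """Return (path, rate) pairs for WAV files whose header rate differs from expected_rate."""
    mismatched = []
    for path in sorted(Path(wav_dir).rglob("*.wav")):
        # Only the header is read here; the audio samples are not decoded.
        with wave.open(str(path), "rb") as wav_file:
            rate = wav_file.getframerate()
        if rate != expected_rate:
            mismatched.append((str(path), rate))
    return mismatched
```

Note that this only inspects the rate declared in the file header. A clip that was upsampled from a lower rate will still report 44100 while containing no high-frequency content, so a matching header does not by itself rule out band-limited audio.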

lucasjinreal commented 5 months ago

How is the inference speed?

GavinZhao19 commented 5 months ago

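The inference speed is quite good. In my own test on an ordinary T4 GPU, a 32-word sentence came out as 15 s of speech with an inference time of 250 ms, which is fast.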
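
To put those numbers in the usual terms, the figures above can be converted into a real-time factor (inference time divided by audio duration). This is a minimal sketch that assumes the 250 ms is the total inference time for the whole 15 s clip; the function name is illustrative only.

```python
def real_time_factor(inference_seconds, audio_seconds):
    """Real-time factor: values below 1.0 mean synthesis is faster than playback."""
    return inference_seconds / audio_seconds


# Figures reported in the comment above: 250 ms of inference for 15 s of audio.
rtf = real_time_factor(0.25, 15.0)
print(f"RTF = {rtf:.4f}, about {1 / rtf:.0f}x faster than real time")
```

Under that assumption the RTF is roughly 0.017, i.e. about 60 times faster than real time on the T4.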

juntaosun commented 5 months ago

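The inference speed is indeed good. The only problem is audio quality: although the output is 44100, it actually sounds a bit like telephone speech. Is it the same for you?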

GavinZhao19 commented 5 months ago

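It may also depend on the step setting. The project is built on a diffusion transformer; in my tests 15 steps sound better than 5, but more steps make inference somewhat slower. Even at 15 steps the audio still has a somewhat telephone-like quality. You could try more steps, or wait for the author to release a better base model.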
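
One way to weigh the step count against latency is to time a few settings side by side and listen to each result. The sketch below is only an illustration: `synthesize` stands for whatever inference call you use (it is not the project's API), and `fake_synthesize` is a stand-in whose cost grows with the number of steps so the snippet runs without a model.

```python
import time


def time_step_counts(synthesize, step_counts, repeats=3):
    """Return {steps: best wall-clock seconds} for a user-supplied synthesize(steps) callable."""
    timings = {}
    for steps in step_counts:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            synthesize(steps)
            best = min(best, time.perf_counter() - start)
        timings[steps] = best
    return timings


def fake_synthesize(steps):
    # Stand-in for a real model call: sleeps 2 ms per step instead of running a network.
    time.sleep(0.002 * steps)


if __name__ == "__main__":
    for steps, seconds in time_step_counts(fake_synthesize, [5, 15, 25]).items():
        print(f"steps={steps:>2}  {seconds * 1000:.1f} ms")
```

Replacing `fake_synthesize` with a real inference call gives the latency side of the trade-off; the quality side still has to be judged by ear or with a separate metric.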

GavinZhao19 commented 5 months ago

I trained one today as well, and the inference audio quality is indeed somewhat worse. We probably have to wait for a better pretrained model.

KdaiP commented 5 months ago

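A new model is already training ╮(╯▽╰)╭ I'm also trying other architectures to see whether audio quality can be improved without increasing the parameter count o(~▽~)d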

GavinZhao19 commented 5 months ago

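The inference performance is very good right now. Maybe you could add some parameters and offer small, medium, and large sizes, accepting slower inference to see how much the quality improves.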

KdaiP commented 5 months ago

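@GavinZhao19 I tried scaling up to 78M parameters. After one night of training, it was considerably better than the 10M-parameter model after 5 days. Training several versions with different parameter counts does look worthwhile going forward.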

juntaosun commented 5 months ago

Any updates recently?

KdaiP commented 5 months ago

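A couple of days ago I found that the linear spectrogram transform was written incorrectly, which caused the poor audio. After the fix, audio quality improved a lot.

Because the error was in preprocessing, I'm retraining the acoustic model and the vocoder, which should take roughly another 1-2 weeks.

After the fix, the spectrogram parameters will be identical to the vocoder's.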
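
For readers who want to verify that two sets of spectrogram parameters really match, a minimal sketch of a config comparison follows. The thread does not say what the bug was, so this is not the author's fix; the key names and values below are hypothetical and are not taken from the project's config.py or from vocos_pytorch.

```python
def diff_spectrogram_params(acoustic_cfg, vocoder_cfg):
    """Return {key: (acoustic_value, vocoder_value)} for every key whose values differ.

    A key missing from one side is reported with None for that side.
    """
    differences = {}
    for key in sorted(set(acoustic_cfg) | set(vocoder_cfg)):
        left = acoustic_cfg.get(key)
        right = vocoder_cfg.get(key)
        if left != right:
            differences[key] = (left, right)
    return differences


# Hypothetical values for illustration only.
acoustic = {"sample_rate": 44100, "n_fft": 2048, "hop_length": 512, "n_mels": 128}
vocoder = {"sample_rate": 44100, "n_fft": 2048, "hop_length": 512, "n_mels": 100}

for key, (a, v) in diff_spectrogram_params(acoustic, vocoder).items():
    print(f"{key}: acoustic model={a}, vocoder={v}")
```

A comparison like this only catches mismatched configuration values. An error inside the transform code itself (for example, a different scaling or log compression) would not show up here and has to be checked by comparing the actual spectrogram outputs.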

juntaosun commented 5 months ago

I'll test it once you've updated.

xinkez commented 4 months ago

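Hello, I compared config.py in the repository root with the corresponding parameters in the vocos_pytorch directory and could not find the error you mentioned. Could you explain? Thanks.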

juntaosun commented 1 month ago

Is this project still active?

ILG2021 commented 1 month ago

The approach is very advanced, but the quality still needs improvement. Looking forward to an open-source TTS that can match elevenlabs.

albluc24 commented 1 month ago

Hi, I saw that a bug was discovered in the linear conversion of the spectrogram. There isn't any PR, commit, or issue explaining it further. I tried comparing the implementations and parameters with the provided link, but I saw nothing amiss. @KdaiP Could you shed some light on this?

juntaosun commented 1 month ago

Will this configuration fix the sound quality issue?

albluc24 commented 1 month ago

TBH I have no idea. I am not a Chinese speaker at all, so I translated what the author said and tried to piece something together. I don't think my fix is what they intended, as it is too shallow. I was planning to use this architecture, but I have limited machine resources ATM, and with no guarantee that the fix works I will probably look at something else. If you are willing to test it, maybe you could try something, even on a small scale, if you're in a better situation than me?

KdaiP commented 3 weeks ago

After four months of experiments, the model has been updated.

KdaiP commented 3 weeks ago

Hi, the new model has been updated!

KdaiP / StableTTS

Inference audio quality? #6