Open hopingZ opened 1 year ago
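Hi, I've uploaded the CMVN file to this link; give it a try.
As for the mel-spectrogram parameters, they look correct. You just need to stay consistent with the pipeline in utils/compute-fbank-feats.py.
(I can't accept the title of "expert"; I only did a bit of work on the code, haha.)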
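Thank you so, so much!! ❤️
Even with normalization it still doesn't sound very similar, and there is still a gap compared with the paper's demo. Is that because the paper's model was trained on more data?
@segmentationFaults The model in the paper was trained on exactly the LibriTTS train clean+other data. Which sentence did you use for your test?
The sentence is 1089_134686_000002_000000, and the prompt is an utterance I picked at random.
@segmentationFaults Could you try synthesizing with the provided model parameters and see whether the result is any different?
OK, I'll give it a try.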
Quick question:
How do you generate the CMVN file for new datasets? I've tried using extract_fbank.sh which uses utils/compute-cmvn-stats.py to generate a CMVN file, but the tensor I end up with has values orders of magnitude lower than the CMVN file you uploaded.
This is mine:
And this is yours:
I just ran extract_fbank.sh and it generated the CMVN file, but this doesn't seem quite right. Did you go through a different process? Thanks!
@danablend I think that might still be correct. The CMVN file does not actually store the "mean" of each feature dimension. It stores the sum and the sum of squares for each feature dimension, accumulated over the whole dataset. So, if the number of samples is different, the stored CMVN values can differ by orders of magnitude. May I ask how large your dataset is?
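To illustrate the point about sums versus means, here is a minimal NumPy sketch. It assumes the stats follow the usual Kaldi-style layout (a `2 x (D + 1)` matrix with the frame count in the last column of the first row); whether `utils/compute-cmvn-stats.py` writes exactly this layout is an assumption, not something confirmed in this thread. The helper names `accumulate_cmvn_stats` and `mean_and_std` are hypothetical, and the data is synthetic.

```python
import numpy as np

def accumulate_cmvn_stats(utterances):
    """Accumulate sum / sum-of-squares stats over a list of (T, D) arrays.

    Assumed layout (Kaldi-style): row 0 = per-dimension sums, with the total
    frame count in the last column; row 1 = per-dimension sums of squares,
    with the last column left at 0.
    """
    dim = utterances[0].shape[1]
    stats = np.zeros((2, dim + 1), dtype=np.float64)
    for feats in utterances:
        stats[0, :dim] += feats.sum(axis=0)
        stats[1, :dim] += (feats ** 2).sum(axis=0)
        stats[0, dim] += feats.shape[0]
    return stats

def mean_and_std(stats):
    """Recover per-dimension mean and standard deviation from the raw stats."""
    dim = stats.shape[1] - 1
    count = stats[0, dim]
    mean = stats[0, :dim] / count
    var = stats[1, :dim] / count - mean ** 2
    return mean, np.sqrt(np.maximum(var, 1e-10))

rng = np.random.default_rng(0)
# Two synthetic "datasets" from the same distribution, 100x apart in size.
small = [rng.normal(-5.0, 2.0, size=(50, 8)) for _ in range(10)]
large = [rng.normal(-5.0, 2.0, size=(50, 8)) for _ in range(1000)]

stats_small = accumulate_cmvn_stats(small)
stats_large = accumulate_cmvn_stats(large)
mean_small, std_small = mean_and_std(stats_small)
mean_large, std_large = mean_and_std(stats_large)

# Raw sums scale with the number of frames; the derived means do not.
print("raw sums, dim 0:", stats_small[0, 0], stats_large[0, 0])
print("means,    dim 0:", mean_small[0], mean_large[0])
```

In this toy setup the raw entries differ by roughly the ratio of frame counts, while the derived means and standard deviations are close to each other. Comparing derived means rather than raw entries is therefore the more meaningful check between two stats files.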
@cantabile-kwok Aha, that is good to know. My dataset was very small, only about 1000 audio samples total.
If I have a dataset with a different size from the dataset that you used to generate the CMVN.ark file, would this still work okay, or could this cause big issues? Thanks!
@danablend That depends on whether you are training the model on this new dataset or only performing inference on it.
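Thanks to the expert for open-sourcing this! Could you share the cmvn.ark file? 🙏🏻🙏🏻🙏🏻 Right now I'm using the un-normalized mel-spectrogram directly as the prompt. The pronunciation is very clear, but the timbre isn't very similar, so I'd like to see the result after normalization. 🙏🏻🙏🏻🙏🏻 I'd also like to confirm the mel-spectrogram parameters:
Is it enough to load it like this and then normalize it, so that it matches what the model expects?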
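For reference, a minimal sketch of the normalization step being asked about, assuming the stats use a Kaldi-style `2 x (D + 1)` layout (sums plus frame count in row 0, sums of squares in row 1). The function `apply_cmvn` and its `norm_vars` flag are hypothetical names for illustration; whether the model expects variance normalization as well as mean normalization is not stated in this thread, and reading the actual cmvn.ark file is left out. The data below is synthetic.

```python
import numpy as np

def apply_cmvn(feats, stats, norm_vars=True):
    """Normalize a (T, D) feature matrix with global CMVN stats.

    Assumes stats has shape (2, D + 1): row 0 = sums with the frame count in
    the last column, row 1 = sums of squares. `norm_vars` is an assumption;
    set it to match whatever the training pipeline did.
    """
    dim = feats.shape[1]
    count = stats[0, dim]
    mean = stats[0, :dim] / count
    out = feats - mean
    if norm_vars:
        var = stats[1, :dim] / count - mean ** 2
        out = out / np.sqrt(np.maximum(var, 1e-10))
    return out

# Synthetic stand-in for a prompt mel-spectrogram (T frames x D mel bins).
rng = np.random.default_rng(0)
prompt_mel = rng.normal(3.0, 4.0, size=(5000, 8))

# Build stats from the same data so the effect is easy to verify.
stats = np.zeros((2, 9))
stats[0, :8] = prompt_mel.sum(axis=0)
stats[1, :8] = (prompt_mel ** 2).sum(axis=0)
stats[0, 8] = prompt_mel.shape[0]

normalized = apply_cmvn(prompt_mel, stats)
print(normalized.mean(axis=0), normalized.std(axis=0))
```

With real data, the stats would come from the training set rather than from the prompt itself, so the normalized prompt would be only approximately zero-mean and unit-variance.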