About the sampling rate - Githubissues

lifeiteng commented 1 year ago

The README says a 16000 sampling rate is used, but the audio on the demo page https://cpdu.github.io/unicats/ has a 24000 sampling rate. What is the reason for this?

cantabile-kwok commented 1 year ago

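This was really for convenience: we did everything at 16k from the start. The paper uses a 24k sampling rate for a fair comparison with other models, but the features (vq-wav2vec, pitch, mel, etc.) are still extracted from 16k waveforms.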

To generate 24k audio, just change these parameters in the config yaml to the following:

upsample_scales: [8, 5, 3, 2]        # Upsampling scales. The product of these scales must be equal to the hop size
upsample_kernel_sizes: [16, 10, 6, 4] # Kernel size for upsampling layers. Should be 2 times the upsample scales

The model can then reconstruct 24k waveforms from vq-wav2vec features with a 10ms frame shift.
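
The arithmetic behind those two settings can be checked with a short sketch. It assumes only what the config comments state: the product of the upsample scales must equal the hop size, and each kernel size should be twice its scale. With a 10ms frame shift, the hop size is 24000 × 0.01 = 240 samples at 24k, versus 160 at 16k. The helper names `hop_size` and `check_upsample_config` are hypothetical and are not part of the repository.

```python
from math import prod


def hop_size(sample_rate_hz: int, frame_shift_ms: int = 10) -> int:
    """Number of waveform samples covered by one feature frame."""
    total = sample_rate_hz * frame_shift_ms
    if total % 1000 != 0:
        raise ValueError("sample rate and frame shift give a fractional hop size")
    return total // 1000


def check_upsample_config(upsample_scales, upsample_kernel_sizes,
                          sample_rate_hz, frame_shift_ms=10):
    """Return True if the scales and kernel sizes satisfy both config comments."""
    hop = hop_size(sample_rate_hz, frame_shift_ms)
    # The product of the scales must equal the hop size.
    scales_ok = prod(upsample_scales) == hop
    # Each kernel size should be 2 times the matching scale.
    kernels_ok = (
        len(upsample_scales) == len(upsample_kernel_sizes)
        and all(k == 2 * s for s, k in zip(upsample_scales, upsample_kernel_sizes))
    )
    return scales_ok and kernels_ok


if __name__ == "__main__":
    scales = [8, 5, 3, 2]
    kernels = [16, 10, 6, 4]
    print(hop_size(24000))                                # hop size at 24k
    print(check_upsample_config(scales, kernels, 24000))  # matches the 24k target
    print(check_upsample_config(scales, kernels, 16000))  # does not match 16k
```

The same scales fail the check at 16k because 8 × 5 × 3 × 2 = 240, while a 10ms frame at 16k covers only 160 samples.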

lifeiteng commented 1 year ago

I see.

X-LANCE / UniCATS-CTX-vec2wav

About the sampling rate #1