PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
https://paddlespeech.readthedocs.io
Apache License 2.0

[TTS]Aishell3 vc2 example not using 16k sample rate while extracting spk embedding #3181

Open: keshawnhsieh opened this issue 1 year ago

keshawnhsieh commented 1 year ago

The Aishell3 vc2 example does not use a 16 kHz sample rate when extracting speaker embeddings.

Although the code checks the sample rate of the input audio, I don't see any step that actually resamples it: https://github.com/PaddlePaddle/PaddleSpeech/blob/9cf8c1985a98bb380c183116123672976bdfe5c9/paddlespeech/cli/vector/infer.py#L492-L497

If you print `sr` when execution reaches this line, you can see that it equals 44100: https://github.com/PaddlePaddle/PaddleSpeech/blob/9cf8c1985a98bb380c183116123672976bdfe5c9/paddlespeech/cli/vector/infer.py#L415

As a result, `melspectrogram` receives a waveform loaded at 44100 Hz together with a mismatched sample rate, `self.config.sr`, which equals 16000: https://github.com/PaddlePaddle/PaddleSpeech/blob/9cf8c1985a98bb380c183116123672976bdfe5c9/paddlespeech/cli/vector/infer.py#L422-L427
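To illustrate why this mismatch matters, here is a minimal NumPy sketch (not PaddleSpeech code; the helper `dominant_frequency` is hypothetical) of what happens when samples that were loaded at 44100 Hz are interpreted as if they were at 16000 Hz. Every frequency is scaled by 16000/44100 ≈ 0.363 and the clip appears about 2.76 times longer, so a mel filterbank built for 16 kHz would see shifted spectral content.

```python
import numpy as np


def dominant_frequency(wave: np.ndarray, assumed_sr: int) -> float:
    """Return the FFT peak frequency in Hz, interpreting samples at assumed_sr."""
    spectrum = np.abs(np.fft.rfft(wave))
    freqs = np.fft.rfftfreq(len(wave), d=1.0 / assumed_sr)
    return float(freqs[np.argmax(spectrum)])


true_sr = 44100    # rate the audio was actually loaded at (per the issue)
config_sr = 16000  # rate assumed by self.config.sr (per the issue)

t = np.arange(true_sr) / true_sr        # one second of audio at 44.1 kHz
tone = np.sin(2 * np.pi * 1000.0 * t)   # a 1 kHz test tone

print(dominant_frequency(tone, true_sr))    # about 1000 Hz (correct)
print(dominant_frequency(tone, config_sr))  # about 363 Hz (misinterpreted)
print(len(tone) / config_sr)                # apparent duration, about 2.76 s
```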

yt605155624 commented 1 year ago

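In that case, the speaker-verification (voiceprint) CLI probably should indeed pass self.config.sr when calling load_audio(). I didn't look closely at their code and assumed it had a forced resampling step. If you're interested, feel free to try the change; PRs are welcome.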
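Along the lines of that suggestion, a minimal sketch of resampling to the configured rate before feature extraction might look like the following. `prepare_waveform` and `resample_linear` are hypothetical names, not PaddleSpeech functions, and the linear-interpolation resampler has no anti-aliasing filter, so it only stands in for a proper band-limited resampler for illustration.

```python
import numpy as np


def resample_linear(wave: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Crude linear-interpolation resampler (no anti-aliasing filter).

    Illustration only; a band-limited resampler is preferable in practice.
    Returns the input unchanged when the rates already match.
    """
    if orig_sr == target_sr:
        return wave
    n_out = int(round(len(wave) * target_sr / orig_sr))
    t_in = np.arange(len(wave)) / orig_sr    # input sample times in seconds
    t_out = np.arange(n_out) / target_sr     # output sample times in seconds
    return np.interp(t_out, t_in, wave)      # float64 output


def prepare_waveform(wave: np.ndarray, sr: int, target_sr: int = 16000):
    """Hypothetical pre-processing step: bring the audio to the rate the
    model expects before computing the mel spectrogram, instead of only
    checking sr."""
    return resample_linear(wave, sr, target_sr), target_sr


# Usage: one second of a 1 kHz tone loaded at 44.1 kHz.
sr_in = 44100
t = np.arange(sr_in) / sr_in
tone = np.sin(2 * np.pi * 1000.0 * t)
wave16k, sr = prepare_waveform(tone, sr_in)
print(len(wave16k), sr)  # 16000 16000
```

The point of the sketch is that the waveform and the sample rate handed to the feature extractor stay consistent, which is what the issue reports as missing.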

However, after this change the VC model may need to be retrained; you are also welcome to contribute a newly trained model.