[TTS]Aishell3 vc2 example not using 16k sample rate while extracting spk embedding

PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

Apache License 2.0



The code checks the sample rate of the input audio, but I don't see any step that actually resamples it. https://github.com/PaddlePaddle/PaddleSpeech/blob/9cf8c1985a98bb380c183116123672976bdfe5c9/paddlespeech/cli/vector/infer.py#L492-L497
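
For illustration, here is a minimal sketch of the kind of "check, then resample" guard I would expect before feature extraction. It is not PaddleSpeech code: `resample_linear` and `ensure_sample_rate` are hypothetical helper names, and the linear interpolation is only a dependency-free stand-in. It applies no low-pass filter, so it aliases when downsampling; real code would more likely use a band-limited resampler such as `librosa.resample`.

```python
import math


def resample_linear(samples, orig_sr, target_sr):
    """Resample a mono waveform by linear interpolation (illustrative only)."""
    if orig_sr == target_sr or not samples:
        return list(samples)
    n_out = int(round(len(samples) * target_sr / orig_sr))
    step = orig_sr / target_sr  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        lo = int(pos)
        if lo >= len(samples) - 1:
            # Past the last interpolation interval: hold the final sample.
            out.append(float(samples[-1]))
            continue
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[lo + 1] * frac)
    return out


def ensure_sample_rate(samples, sr, target_sr=16000):
    """Return (samples, sr) with the waveform converted to target_sr if needed."""
    if sr != target_sr:
        samples = resample_linear(samples, sr, target_sr)
    return samples, target_sr


# One second of a 440 Hz tone sampled at 44.1 kHz.
wave_44k = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
wave_16k, sr = ensure_sample_rate(wave_44k, 44100)
print(len(wave_16k), sr)
```

With a guard like this, the waveform and the sample rate passed to the feature extractor always agree.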

If you print sr when execution reaches this line, you can see that it equals 44100. https://github.com/PaddlePaddle/PaddleSpeech/blob/9cf8c1985a98bb380c183116123672976bdfe5c9/paddlespeech/cli/vector/infer.py#L415

As a result, melspectrogram receives a waveform loaded at a 44100 Hz sample rate together with a mismatched sample rate, self.config.sr, which equals 16000. https://github.com/PaddlePaddle/PaddleSpeech/blob/9cf8c1985a98bb380c183116123672976bdfe5c9/paddlespeech/cli/vector/infer.py#L422-L427
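
To make the consequence concrete, here is a small arithmetic sketch of what happens in general when samples captured at one rate are interpreted at another: every frequency is scaled by assumed_sr / actual_sr, and the clip appears longer by the inverse factor. The function names are hypothetical, and I have not measured the effect on the resulting speaker embeddings.

```python
def apparent_frequency(true_hz, actual_sr, assumed_sr):
    """Frequency a tone appears to have when its samples are read at assumed_sr."""
    return true_hz * assumed_sr / actual_sr


def apparent_duration(n_samples, assumed_sr):
    """Duration in seconds that n_samples appear to span at assumed_sr."""
    return n_samples / assumed_sr


# A 1000 Hz tone recorded at 44100 Hz but treated as 16000 Hz audio.
freq = apparent_frequency(1000.0, actual_sr=44100, assumed_sr=16000)
# One real second of 44100 Hz audio, treated as 16000 Hz audio.
dur = apparent_duration(44100, assumed_sr=16000)
print(round(freq, 2), round(dur, 5))
```

So the 1000 Hz tone is treated as roughly 363 Hz, and one second of audio is treated as about 2.76 seconds.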

[TTS]Aishell3 vc2 example not using 16k sample rate while extracting spk embedding #3181