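Why does my trained voice cloning model generate the voice of the speaker in the dataset instead of the voice in the wav files in the reference folder? /example/aishell3/vc1/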

PaddlePaddle / PaddleSpeech

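Why does my trained voice cloning model generate the voice of the speaker in the dataset instead of the voice in the wav files in the reference folder? /example/aishell3/vc1/ #3017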

Closed. Vebrun closed this issue 1 year ago.

Vebrun commented 1 year ago

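Here, am_ckpt is the single-speaker model I trained via transfer learning, and voc_ckpt is the downloaded pretrained model. I also have another question: does voc_ckpt need transfer learning as well, or should I use the downloaded pretrained weights directly? The README does not explain this.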

yt605155624 commented 1 year ago

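Whether to finetune the vocoder on your own dataset depends on your needs.
One-shot voice cloning (cloning from a single utterance) has limited quality. You can use few-shot synthesis instead: https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/tts_finetune/tts3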
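
To check which voice the output actually follows, you can compare speaker embeddings. The sketch below is not part of PaddleSpeech. It assumes you have already extracted fixed-size speaker embeddings from the generated wav, a reference wav, and a training-set wav with some speaker encoder. The function names and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def closer_speaker(generated, reference, training):
    """Report whether the generated embedding is closer to the
    reference speaker or to the training-set speaker."""
    sim_ref = cosine_similarity(generated, reference)
    sim_train = cosine_similarity(generated, training)
    label = "reference" if sim_ref > sim_train else "training"
    return label, sim_ref, sim_train


# Toy 3-dimensional vectors standing in for real speaker embeddings
# (hypothetical values; real ones would come from a speaker encoder).
reference_emb = [1.0, 0.0, 0.0]
training_emb = [0.0, 1.0, 0.0]
generated_emb = [0.1, 0.9, 0.0]

label, sim_ref, sim_train = closer_speaker(
    generated_emb, reference_emb, training_emb
)
print(label, round(sim_ref, 3), round(sim_train, 3))
```

If the generated audio is consistently closer to the training speaker no matter which reference wav you use, that would suggest the acoustic model is not following the reference embedding. This is a rough diagnostic, not a confirmed explanation of the behavior reported here.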

ben-8878 commented 1 year ago

@yt605155624 I ran into the same problem. Could you help explain it? Did I configure something incorrectly?