HighCWu closed this issue 2 years ago
The voice cloning results are currently poor.
It's not a question of quality: the Tacotron2 pretrained model can clone different timbres normally, so why do all outputs of the fastspeech2 pretrained model have the same speaker's timbre?
Thanks for using PaddleSpeech's voice cloning! Your conclusion is very useful to me and other users.
vc0 (Tacotron2)'s training steps are fewer than vc1 (fastspeech2)'s (you can also see this in the released models' names and configs), because Tacotron2's training is unstable (see the training loss in https://github.com/PaddlePaddle/PaddleSpeech/discussions/1434), so I early-stopped Tacotron2's training. I haven't compared vc0 and vc1; maybe vc1 is overfitting. You can try to train your own vc1 and early-stop it ~
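The early stopping mentioned above can be sketched as a simple patience check on the loss history (a generic illustration, not PaddleSpeech's training code; `should_early_stop` and the numbers are made up for the example):

```python
def should_early_stop(losses, patience=3, min_delta=0.0):
    """Return True if the loss has not improved for `patience` steps."""
    if len(losses) <= patience:
        return False
    best_before = min(losses[:-patience])
    recent_best = min(losses[-patience:])
    return recent_best > best_before - min_delta

# Example: loss improves, then plateaus and becomes unstable
history = [2.0, 1.5, 1.2, 1.1, 1.15, 1.3, 1.25]
print(should_early_stop(history, patience=3))  # → True
```

With an unstable objective like Tacotron2's, such a check stops training at the best checkpoint instead of letting the loss diverge.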
You can also use the new voiceprint recognition model we released: https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/voxceleb/sv0
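One way such a voiceprint model helps here is to verify whether cloned audio keeps the reference speaker's timbre, by comparing utterance embeddings with cosine similarity. A minimal NumPy sketch (the random vectors below are only stand-ins for embeddings the sv0 model would extract):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for embeddings extracted by a speaker-verification model
rng = np.random.default_rng(0)
emb_ref = rng.standard_normal(256)                  # reference speaker
emb_syn = emb_ref + 0.1 * rng.standard_normal(256)  # cloned audio, similar timbre
emb_other = rng.standard_normal(256)                # unrelated speaker

print(cosine_similarity(emb_ref, emb_syn))    # close to 1.0
print(cosine_similarity(emb_ref, emb_other))  # near 0
```

If every cloned utterance is closest to the same speaker regardless of the reference audio, the embedding is likely being ignored by the acoustic model.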
@yt605155624 thanks, I will try to train it by myself. But you should really test the results before releasing pretrained models: the vc0 readme even recommends using vc1, yet the pretrained model provided for vc1 is effectively useless 😂. I'll report back once I've trained one.
Looking forward to your feedback; it would be better if you could provide a useful config file or pretrained model.
FYI here is my test result for vc: ref_audio.zip
```python
import numpy as np
import paddle
import soundfile as sf

# voc_inference, am_inference, phone_ids, output_dir and am_config
# come from the surrounding synthesis script.

# Randomly generate values in 0 ~ 0.2; 256 is the dim of spk_emb
for i in range(10):
    random_spk_emb = np.random.rand(256) * 0.2
    random_spk_emb = paddle.to_tensor(random_spk_emb, dtype="float32")
    utt_id = "random_spk_emb" + "_" + str(i)
    with paddle.no_grad():
        # pass the random embedding (the original snippet mistakenly passed `spk_emb`)
        wav = voc_inference(am_inference(phone_ids, spk_emb=random_spk_emb))
    sf.write(
        str(output_dir / (utt_id + ".wav")),
        wav.numpy(),
        samplerate=am_config.fs)
    print(f"{utt_id} done!")
```
Sorry, I just found that we weren't actually using the random emb until a developer's PR ... https://github.com/PaddlePaddle/PaddleSpeech/pull/1828/files
vc1 does indeed sound much better than vc0, but there is still some electrical buzzing, and there are no pauses within sentences. How can this be improved?
Describe the bug
Always the same speaker output in fastspeech2 aishell3 voice conversion
To Reproduce