FunAudioLLM / CosyVoice

Multi-lingual large voice generation model, providing full-stack support for inference, training, and deployment.
https://funaudiollm.github.io/
Apache License 2.0

Rewrite the spk2info.pt file to fix short mid-sentence pauses #169

Open LayBrick opened 3 months ago

LayBrick commented 3 months ago

I heard that the problem of overly short pauses in speech can be fixed by extracting voice features and rewriting spk2info.pt. What is the principle behind this? When saving spk2info.pt, is it enough to just average the embeddings of multiple audio clips? Are the speech_tokens of multiple clips concatenated? And how should speech_feat be handled?

aluminumbox commented 3 months ago

In spk2info.pt, the embedding is the average of the utterance embeddings; see tools/extract_embedding.py for how to extract the speaker embedding. The speech_token is extracted from relatively good-quality audio of the speaker, but it is not used in sft inference mode, so you can ignore it.
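
To make the averaging concrete, here is a minimal sketch of building a speaker entry from several per-utterance embeddings and saving it to spk2info.pt. The function name, dictionary keys ('embedding'), and tensor shapes are assumptions for illustration only; the actual extraction pipeline is in tools/extract_embedding.py.

```python
# Hypothetical sketch: average per-utterance embeddings into one speaker
# embedding and store it in spk2info.pt. Keys and shapes are assumptions;
# see tools/extract_embedding.py for the real pipeline.
import torch

def build_spk2info(spk_id: str, utt_embeddings: list[torch.Tensor],
                   out_path: str = "spk2info.pt") -> dict:
    # Stack the per-utterance embeddings (each assumed shape [1, dim])
    # and average them into a single speaker-level embedding of shape [1, dim].
    spk_embedding = torch.stack([e.squeeze(0) for e in utt_embeddings]).mean(dim=0, keepdim=True)

    # In sft inference mode only the averaged embedding matters; speech_token
    # and speech_feat are ignored there (per the reply above), so they are
    # omitted in this sketch.
    spk2info = {spk_id: {"embedding": spk_embedding}}
    torch.save(spk2info, out_path)
    return spk2info
```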

LayBrick commented 3 months ago

Is rewriting spk2info.pt useful for solving the short-pause problem? How does it work?

jupinter commented 2 months ago

With the pretrained sft model I tried using spk_embedding + speech_token & speech_feat, but in testing the LLM's stability dropped and it tended to skip or jumble words; without speech_token & speech_feat it was actually more stable. I still don't quite understand the reason behind this. Could you help explain? Thanks!

aluminumbox commented 2 months ago

After sft finetuning, you are inferencing with this speaker and its spk embedding, so use that speaker's spk embedding during inference. Check inference_sft; this inference mode is more compatible with sft training.
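
For reference, a minimal usage sketch of the sft inference mode with a speaker id registered in spk2info.pt, following the pattern in the repository README. The model path, text, and speaker id are placeholders, and the exact return format of inference_sft may differ between versions.

```python
# Sketch of sft-mode inference: synthesize with a speaker id from spk2info.pt.
# Model path, text, and speaker id are placeholders; adjust to your setup.
import torchaudio
from cosyvoice.cli.cosyvoice import CosyVoice

cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-SFT')
print(cosyvoice.list_avaliable_spks())  # speakers available in spk2info.pt

# inference_sft takes the text to synthesize and a registered speaker id;
# here it is assumed to return a dict containing the waveform under 'tts_speech'.
output = cosyvoice.inference_sft('Hello, this is a test sentence.', '中文女')
torchaudio.save('sft.wav', output['tts_speech'], 22050)
```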