FunAudioLLM / CosyVoice

A multilingual large voice generation model with full-stack support for inference, training, and deployment.
https://funaudiollm.github.io/
Apache License 2.0
6.11k stars 655 forks

Voice cloning with a prompt_speech_16k in another language does not produce cross-lingual output #411

Open Lawrenceeeeeeee opened 1 month ago

Lawrenceeeeeeee commented 1 month ago

Describe the bug: Voice cloning with a prompt_speech_16k recorded in a different language does not produce cross-lingual output.

To Reproduce Steps to reproduce the behavior:

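  1. Loaded a Japanese audio clip of ずんだもん (Zundamon), with the prompt text: <|jp|>日本には四季があり、それぞれの季節に美しい風景や特別な行事があります。
  2. Tried to generate Chinese speech for: “<|zh|>人间灯火倒映湖中,她的渴望让静水泛起涟漪。若代价只是孤独,那就让这份愿望肆意流淌。流入她所注视的世间,也流入她如湖水般澄澈的目光。”
  3. The generated audio is entirely in Japanese; every Chinese character is read with its Japanese pronunciation.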

Expected behavior: Cross-lingual cloning should be possible in principle; it works in the demo on the official site.

from cosyvoice.cli.cosyvoice import CosyVoice
from cosyvoice.utils.file_utils import load_wav
import torchaudio
import time

cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M')
# zero_shot usage, <|zh|><|en|><|jp|><|yue|><|ko|> for Chinese/English/Japanese/Cantonese/Korean

zundamon_audio = 'zdm.wav'
zundamon_text = '<|jp|>日本には四季があり、それぞれの季節に美しい風景や特別な行事があります。'

prompt_speech_16k = load_wav(zundamon_audio, 16000)

while True:
    text = input("Enter the text to synthesize: ")
    for i, j in enumerate(cosyvoice.inference_zero_shot(text, zundamon_text, prompt_speech_16k, stream=False)):
        torchaudio.save(f'zero_shot_zdm_{i}_{int(time.time())}.wav', j['tts_speech'], 22050)

aluminumbox commented 1 month ago

please upload your prompt wav

Lawrenceeeeeeee commented 1 month ago

GitHub doesn't accept WAV attachments, so here is a Baidu Netdisk link to the prompt_speech file.

Link: https://pan.baidu.com/s/1sjp3NWfTU_o1FgDJ8qtq4Q?pwd=1111 Extraction code: 1111

Lawrenceeeeeeee commented 1 month ago

> please upload your prompt wav

So can this issue be reproduced?

aluminumbox commented 1 month ago

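> > please upload your prompt wav
>
> So can this issue be reproduced?

Baidu Netdisk is not reachable from our company intranet. Please upload the file as a zip instead.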

Aisaka0v0 commented 3 weeks ago

Follow the cross_lingual usage instead; don't use zero_shot.
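
For anyone landing here, a minimal sketch of what that switch might look like for the script in the original post. It assumes `inference_cross_lingual(tts_text, prompt_speech_16k, stream=False)` as shown in the project README at the time, which takes no prompt transcript. The helper names `ensure_lang_tag`, `synthesize_cross_lingual` and `main`, and the output file prefix, are made up for illustration and are not part of CosyVoice.

```python
import time


def ensure_lang_tag(text, lang="zh"):
    """Prefix text with a language tag such as <|zh|> unless it already has one."""
    if text.startswith("<|"):
        return text
    return f"<|{lang}|>{text}"


def synthesize_cross_lingual(model, text, prompt_speech_16k, save,
                             lang="zh", prefix="cross_lingual_zdm"):
    """Run cross-lingual synthesis and save each returned chunk.

    `model` is expected to expose inference_cross_lingual (assumed signature,
    see above); `save` is a callable like torchaudio.save(path, tensor, rate).
    Unlike inference_zero_shot, no prompt transcript is passed, only the audio.
    """
    tagged = ensure_lang_tag(text, lang)
    paths = []
    chunks = model.inference_cross_lingual(tagged, prompt_speech_16k, stream=False)
    for i, chunk in enumerate(chunks):
        path = f"{prefix}_{i}_{int(time.time())}.wav"
        # 22050 Hz matches the sample rate used in the original post
        save(path, chunk["tts_speech"], 22050)
        paths.append(path)
    return paths


def main():
    # Imports are kept local so the helpers above can be tried without the model.
    from cosyvoice.cli.cosyvoice import CosyVoice
    from cosyvoice.utils.file_utils import load_wav
    import torchaudio

    cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M')
    prompt_speech_16k = load_wav('zdm.wav', 16000)
    text = input("Enter the text to synthesize: ")
    print(synthesize_cross_lingual(cosyvoice, text, prompt_speech_16k, torchaudio.save))
```

Call `main()` with the model downloaded and `zdm.wav` in the working directory. The point of the change is that the Japanese transcript is no longer fed to the model as a text prompt; whether that fully removes the Japanese readings for this particular clip has not been confirmed in this thread.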