FunAudioLLM / CosyVoice

Multi-lingual large voice generation model, providing full-stack capabilities for inference, training and deployment.
https://funaudiollm.github.io/
Apache License 2.0

Running the demo from the README fails #357

Open zaiji100 opened 2 months ago

zaiji100 commented 2 months ago

Symptom: following the steps in the README, the first demo fails. Details below:

(cosyvoice) english@work-dev:/opt/soft/speech/CosyVoice$ python3 demo.py
2024-09-05 20:42:09,571 - modelscope - INFO - PyTorch version 2.0.1+cu118 Found.
2024-09-05 20:42:09,571 - modelscope - INFO - Loading ast index from /home/english/.cache/modelscope/ast_indexer
2024-09-05 20:42:09,608 - modelscope - INFO - Loading done! Current index file version is 1.15.0, with md5 b230aa26a96c48d7b284209f5ac5d957 and a total number of 980 components indexed
/home/english/miniconda3/envs/cosyvoice/lib/python3.8/site-packages/diffusers/models/lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
2024-09-05 20:42:12,307 INFO input frame rate=50
Traceback (most recent call last):
  File "demo.py", line 5, in <module>
    cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-SFT')
  File "/opt/soft/speech/CosyVoice/cosyvoice/cli/cosyvoice.py", line 33, in __init__
    self.frontend = CosyVoiceFrontEnd(configs['get_tokenizer'],
  File "/opt/soft/speech/CosyVoice/cosyvoice/cli/frontend.py", line 52, in __init__
    self.campplus_session = onnxruntime.InferenceSession(campplus_model, sess_options=option, providers=["CPUExecutionProvider"])
  File "/home/english/miniconda3/envs/cosyvoice/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/english/miniconda3/envs/cosyvoice/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 480, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidProtobuf: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from pretrained_models/CosyVoice-300M-SFT/campplus.onnx failed:Protobuf parsing failed.
(cosyvoice) english@work-dev:/opt/soft/speech/CosyVoice$ cat demo.py
from cosyvoice.cli.cosyvoice import CosyVoice
from cosyvoice.utils.file_utils import load_wav
import torchaudio

cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-SFT')
# sft usage
print(cosyvoice.list_avaliable_spks())
# change stream=True for chunk stream inference
for i, j in enumerate(cosyvoice.inference_sft('你好,我是通义生成式语音大模型,请问有什么可以帮您的吗?', '中文女', stream=False)):
    torchaudio.save('sft_{}.wav'.format(i), j['tts_speech'], 22050)

cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M')
# zero_shot usage, <|zh|><|en|><|jp|><|yue|><|ko|> for Chinese/English/Japanese/Cantonese/Korean
prompt_speech_16k = load_wav('zero_shot_prompt.wav', 16000)
for i, j in enumerate(cosyvoice.inference_zero_shot('收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。', '希望你以后能够做的比我还好呦。', prompt_speech_16k, stream=False)):
    torchaudio.save('zero_shot_{}.wav'.format(i), j['tts_speech'], 22050)
# cross_lingual usage
prompt_speech_16k = load_wav('cross_lingual_prompt.wav', 16000)
for i, j in enumerate(cosyvoice.inference_cross_lingual('<|en|>And then later on, fully acquiring that company. So keeping management in line, interest in line with the asset that\'s coming into the family is a reason why sometimes we don\'t buy the whole thing.', prompt_speech_16k, stream=False)):
    torchaudio.save('cross_lingual_{}.wav'.format(i), j['tts_speech'], 22050)

cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-Instruct')
# instruct usage, support <laughter></laughter><strong></strong>[laughter][breath]
for i, j in enumerate(cosyvoice.inference_instruct('在面对挑战时,他展现了非凡的<strong>勇气</strong>与<strong>智慧</strong>。', '中文男', 'Theo \'Crimson\', is a fiery, passionate rebel leader. Fights with fervor for justice, but struggles with impulsiveness.', stream=False)):
    torchaudio.save('instruct_{}.wav'.format(i), j['tts_speech'], 22050)
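
The InvalidProtobuf failure above typically means that pretrained_models/CosyVoice-300M-SFT/campplus.onnx on disk is not the actual model binary: cloning the model repository without git-lfs leaves small text pointer stubs in place of the weights, which onnxruntime then cannot parse. A minimal header check (a hypothetical helper, not part of the repository):

# check_lfs_stub.py -- hypothetical helper, not part of the repository
from pathlib import Path

model_dir = Path('pretrained_models/CosyVoice-300M-SFT')
lfs_magic = b'version https://git-lfs.github.com/spec/v1'

for f in sorted(model_dir.glob('*.onnx')):
    # a git-lfs pointer stub begins with the LFS spec line instead of protobuf bytes
    head = f.read_bytes()[:len(lfs_magic)]
    if head.startswith(lfs_magic):
        print(f'{f}: git-lfs pointer stub ({f.stat().st_size} bytes), real weights missing')
    else:
        print(f'{f}: looks like a real model file ({f.stat().st_size} bytes)')

A git-lfs pointer stub is only a short text file, so a suspiciously small campplus.onnx is a strong sign the weights were never downloaded.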

To Reproduce
Steps to reproduce the behavior:

  1. Follow the steps in the README on Ubuntu 22.04; the error occurs when the demo above is run.


aluminumbox commented 2 months ago

check faq.md, install git-lfs
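
If installing git-lfs and re-pulling is inconvenient, the pretrained models can also be fetched through modelscope, which writes the real binaries directly. A minimal sketch, assuming a modelscope version whose snapshot_download accepts local_dir and the model IDs used in the README:

# fetch_models.py -- minimal sketch, not part of the repository
from modelscope import snapshot_download

snapshot_download('iic/CosyVoice-300M-SFT', local_dir='pretrained_models/CosyVoice-300M-SFT')
snapshot_download('iic/CosyVoice-300M', local_dir='pretrained_models/CosyVoice-300M')
snapshot_download('iic/CosyVoice-300M-Instruct', local_dir='pretrained_models/CosyVoice-300M-Instruct')

After either fix, the header check above should report real model files instead of pointer stubs, and demo.py should get past the CosyVoice constructor.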

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 30 days with no activity.