FunAudioLLM / CosyVoice

Multilingual large voice generation model with full-stack inference, training, and deployment capability.
https://funaudiollm.github.io/
Apache License 2.0
6.58k stars 707 forks

Hangs when trying zero_shot, with no error message #380

Open redpintings opened 2 months ago

redpintings commented 2 months ago

Describe the bug

from cosyvoice.cli.cosyvoice import CosyVoice
from cosyvoice.utils.file_utils import load_wav
import torchaudio

cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M')

# zero_shot usage, <|zh|><|en|><|jp|><|yue|><|ko|> for Chinese/English/Japanese/Cantonese/Korean
prompt_speech_16k = load_wav('./33.wav', 16000)
for i, j in enumerate(cosyvoice.inference_zero_shot('收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。', '希望你以后能够做的比我还好呦。', prompt_speech_16k, stream=True)):
    torchaudio.save('zeroshot{}.wav'.format(i), j['tts_speech'], 22050)

When I run the code above, it hangs without reporting any error, and I don't know why.

System: Ubuntu

Additional context

Your demo code runs fine, but when I use my own audio file it hangs. Why is that?

log:

(venv) bigdata@gpu2 CosyVoice (main) $ CUDA_VISIBLE_DEVICES=6 python main.py
2024-09-11 14:06:32,785 - modelscope - INFO - PyTorch version 2.0.1+cu118 Found.
2024-09-11 14:06:32,785 - modelscope - INFO - Loading ast index from /home/bigdata/.cache/modelscope/ast_indexer
2024-09-11 14:06:32,905 - modelscope - INFO - Loading done! Current index file version is 1.15.0, with md5 a225ead027c004c55e33ac889b659bd5 and a total number of 980 components indexed
/home/bigdata/projects/ysl/paint/cosyvoice/venv/lib/python3.8/site-packages/diffusers/models/lora.py:393: FutureWarning: LoRACompatibleLinear is deprecated and will be removed in version 1.0.0. Use of LoRACompatibleLinear is deprecated. Please switch to PEFT backend by installing PEFT: pip install peft.
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
2024-09-11 14:06:41,183 INFO input frame rate=50
2024-09-11 14:06:43.848015189 [W:onnxruntime:, session_state.cc:1162 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-09-11 14:06:43.848039209 [W:onnxruntime:, session_state.cc:1164 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
load leagacy transf breakmodel
load leagacy transf breakmodel
text.cc: festival_Text_init
open voice lang map failed
break model index not valid
tn 希望你以后能够做的比我还好呦。 to 希望你以后能够做的比我还好呦。
tn 收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。 to 收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。
0%| | 0/1 [00:00<?, ?it/s]2024-09-11 14:06:53,994 INFO synthesis text 收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。
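The log stops right after the `synthesis text` line and no traceback follows. One generic way to find out where a silent hang sits is Python's standard `faulthandler` module, which can dump every thread's stack after a timeout. The sketch below is only an illustration: `hang_watchdog` is a hypothetical helper name, not part of CosyVoice, and the timeout is an arbitrary choice.

```python
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def hang_watchdog(timeout_s, file=None):
    """Dump all thread stacks to `file` if the wrapped block runs longer than timeout_s.

    `hang_watchdog` is a hypothetical helper for illustration only.
    `file` must be a real file object (it needs a file descriptor);
    it defaults to the current sys.stderr.
    """
    target = file if file is not None else sys.stderr
    # Schedule a one-shot stack dump; the program keeps running afterwards.
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=target)
    try:
        yield
    finally:
        # Cancel the pending dump once the block finishes in time (or raises).
        faulthandler.cancel_dump_traceback_later()


# Possible usage around the call from the report (not executed here):
#
# with hang_watchdog(60):
#     for i, j in enumerate(cosyvoice.inference_zero_shot(...)):
#         ...
```

If the block is still running when the timeout expires, the dumped stack shows which call it is waiting in, which narrows down whether the stall is in the frontend, the language model, or the vocoder.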

CrisYangBW commented 2 months ago

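cosyvoice.inference_zero_shot(infer_text, prompt_text, prompt_wav): the first argument is the text to synthesize, and the second is the transcript of the prompt audio. If the text and the audio don't match, it will hang indefinitely. You need to change the second argument.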
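A crude pre-check can catch a gross mismatch between the prompt transcript and the prompt audio before calling `inference_zero_shot`. The sketch below compares the number of spoken characters with the clip duration. The function names and the characters-per-second bounds are assumptions made for illustration, not anything CosyVoice provides, and the check cannot confirm that the words actually match the audio.

```python
import unicodedata


def spoken_chars(text):
    """Count letters and digits, skipping punctuation and whitespace."""
    return sum(1 for ch in text if unicodedata.category(ch)[0] in ("L", "N"))


def transcript_looks_plausible(prompt_text, num_samples, sample_rate=16000,
                               min_cps=1.0, max_cps=25.0):
    """Heuristic: is the transcript length plausible for the clip duration?

    min_cps / max_cps (characters per second) are assumed bounds to tune
    per language; they are not values taken from CosyVoice.
    """
    if num_samples <= 0 or sample_rate <= 0:
        return False
    duration_s = num_samples / sample_rate
    cps = spoken_chars(prompt_text) / duration_s
    return min_cps <= cps <= max_cps


# Possible usage (assumes the prompt waveform is a tensor of shape (1, T)):
#
# if not transcript_looks_plausible(prompt_text, prompt_speech_16k.shape[-1]):
#     raise ValueError("prompt_text does not look like a transcript of the prompt audio")
```

For example, the 14-character prompt text from the report would pass for a 3-second clip but be rejected for a 60-second one.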

redpintings commented 2 months ago

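"cosyvoice.inference_zero_shot(infer_text, prompt_text, prompt_wav): the first argument is the text to synthesize, and the second is the transcript of the prompt audio. If the text and the audio don't match, it will hang indefinitely. You need to change the second argument."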

Wow, that was careless of me, I can't believe I missed it. Thanks for the heads-up 🐮

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 30 days with no activity.