FunAudioLLM / CosyVoice

Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
https://funaudiollm.github.io/
Apache License 2.0
3.93k stars, 373 forks

Text after `、` is not synthesized into speech #105

Open: weedge opened this issue 1 month ago

weedge commented 1 month ago

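Describe the bug

For example, for the text 这样的训练过程使得我能够回答各种问题、创作文字,以及进行多轮对话等任务。 ("This training process enables me to answer all kinds of questions, write text, and carry out multi-turn conversations."), the speech generated by CosyVoice-300M-SFT inference contains no audio for the text that follows the `、`.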

To Reproduce

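from cosyvoice.cli.cosyvoice import CosyVoice
from cosyvoice.utils.file_utils import load_wav  # only needed for prompt audio (zero-shot); unused here
import torchaudio

# Load the SFT model, which ships with preset speakers
cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-SFT')
# sft usage: list the available preset speaker IDs
print(cosyvoice.list_avaliable_spks())
# Synthesize the problematic sentence with the preset speaker '中文男' (Chinese male)
output = cosyvoice.inference_sft('这样的训练过程使得我能够回答各种问题、创作文字,以及进行多轮对话等任务。', '中文男')
# Save the waveform at 22050 Hz
torchaudio.save('sft.wav', output['tts_speech'], 22050)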
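To confirm that text is actually being dropped rather than just spoken quickly, one can compare the length of the generated audio with the length of the input. The sketch below is a rough heuristic, not part of CosyVoice: `count_cjk_chars` and `looks_truncated` are hypothetical helpers, and the ceiling of 6 characters per second is an assumed upper bound on Mandarin speaking rate that may need tuning.

```python
# Heuristic truncation check (hypothetical helpers, not part of CosyVoice).
# Assumption: speech faster than max_chars_per_sec characters per second is
# implausible, so audio shorter than chars / max_chars_per_sec suggests that
# some of the input text was never synthesized.


def count_cjk_chars(text: str) -> int:
    """Count CJK unified ideographs, ignoring punctuation, digits and spaces."""
    return sum(1 for ch in text if '\u4e00' <= ch <= '\u9fff')


def looks_truncated(text: str, num_samples: int, sample_rate: int = 22050,
                    max_chars_per_sec: float = 6.0) -> bool:
    """Return True if the audio is too short to contain all of `text`."""
    duration = num_samples / sample_rate
    min_expected = count_cjk_chars(text) / max_chars_per_sec
    return duration < min_expected


if __name__ == "__main__":
    text = '这样的训练过程使得我能够回答各种问题、创作文字,以及进行多轮对话等任务。'
    print(count_cjk_chars(text))
    # Two seconds of audio for this sentence would be suspiciously short
    print(looks_truncated(text, num_samples=2 * 22050))
```

With the reproduction script above, the sample count would be `output['tts_speech'].shape[1]`, since the tensor passed to `torchaudio.save` there is laid out as (channels, samples).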
weedge commented 1 month ago

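To add: if sentence splitting is done in an outer layer, splitting on the enumeration comma (顿号, `、`) makes the segments too short, e.g. when a sentence contains 你、我、他 ("you, me, him").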
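One way to split in an outer layer without producing tiny fragments is to break only on sentence-final punctuation, keep `、` and commas inside segments, and merge segments that are still too short. This is a hypothetical sketch: `split_for_tts`, the chosen punctuation set, and the `min_len` threshold of 10 characters are assumptions, not CosyVoice behavior.

```python
import re

# Hypothetical outer-layer splitter (not part of CosyVoice).
# Assumption: only sentence-final punctuation ends a segment; "、" and commas
# stay inside segments, and pieces shorter than min_len are merged forward.
SENTENCE_END = re.compile(r'(?<=[。！？!?；;])')


def split_for_tts(text: str, min_len: int = 10) -> list[str]:
    # Zero-width split keeps the punctuation attached to each sentence.
    pieces = [p.strip() for p in SENTENCE_END.split(text) if p.strip()]
    segments: list[str] = []
    buffer = ''
    for piece in pieces:
        buffer += piece
        if len(buffer) >= min_len:
            segments.append(buffer)
            buffer = ''
    if buffer:
        # Attach a short trailing remainder to the previous segment, if any.
        if segments:
            segments[-1] += buffer
        else:
            segments.append(buffer)
    return segments


if __name__ == "__main__":
    print(split_for_tts('你、我、他都来了。今天天气很好！'))
    print(split_for_tts('你、我、他都来了。今天天气很好！', min_len=1))
```

Each returned segment would then be passed to the TTS call separately and the resulting waveforms concatenated. Merging forward means a short clause like 你、我、他都来了。 is spoken together with the next sentence instead of as an isolated fragment.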

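Here is a personal open-source project, https://github.com/weedge/chat-bot, that you can play with in the terminal. It chains sense_voice (ASR) -> qwen (LLM) -> cosy_voice (TTS), and it can run locally or with the backend (be) running on ModelScope or Colab.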

# fe (frontend)
TTS_TAG=tts_cosy_voice \
  REDIS_PASSWORD=$redis_pwd \
  RUN_OP=fe \
  RECORDER_TAG=wakeword_rms_recorder \
  python -m src.cmd.remote-queue-chat.generate_audio2audio > ./log/fe_std_out.log

# be (backend)
# sense_voice (asr) -> qwen (llm) -> cosy_voice (tts)
RUN_OP=be \
  TQDM_DISABLE=True \
  REDIS_PASSWORD=$redis_pwd \
  ASR_TAG=sense_voice_asr \
  ASR_LANG=zn \
  N_GPU_LAYERS=33 FLASH_ATTN=1 \
  LLM_MODEL_NAME=qwen \
  LLM_MODEL_PATH=./models/qwen1_5-7b-chat-q8_0.gguf \
  ASR_MODEL_NAME_OR_PATH=./models/FunAudioLLM/SenseVoiceSmall \
  TTS_TAG=tts_cosy_voice \
  python -m src.cmd.remote-queue-chat.generate_audio2audio > ./log/be_std_out.log
aluminumbox commented 1 month ago

Check the text normalization result; do the normalization manually if possible.
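A minimal manual workaround along those lines is to rewrite the punctuation before calling `inference_sft`. The sketch assumes, without confirmation from this thread, that the truncation is triggered by `、` (and possibly by the half-width `,` in the example sentence), and that the full-width comma `，` is treated as an ordinary clause break; `pre_normalize` is a hypothetical helper.

```python
import re

# Hypothetical manual pre-normalization (not part of CosyVoice).
# Assumption: "，" is handled as a normal pause, while "、" and a half-width
# "," between Chinese characters may not be.
_HALF_COMMA_BETWEEN_CJK = re.compile(r'(?<=[\u4e00-\u9fff]),(?=[\u4e00-\u9fff])')


def pre_normalize(text: str) -> str:
    # Enumeration comma -> full-width comma
    text = text.replace('、', '，')
    # Half-width comma between Chinese characters -> full-width comma;
    # commas inside numbers such as "1,000" are left alone.
    text = _HALF_COMMA_BETWEEN_CJK.sub('，', text)
    return text


if __name__ == "__main__":
    print(pre_normalize('这样的训练过程使得我能够回答各种问题、创作文字,以及进行多轮对话等任务。'))
```

The result would be passed as `cosyvoice.inference_sft(pre_normalize(text), '中文男')`; whether this actually restores the missing audio has to be checked by listening to the output.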

SuperNodeLibs commented 1 month ago

> check text normalization result, do it manually if possible

Could you explain exactly how to do that?

JYT59421 commented 3 weeks ago

Has this issue still not been resolved?