Problems when a sentence mixes Chinese and English

FunAudioLLM / CosyVoice

Multilingual large voice generation model with full-stack support for inference, training, and deployment.

Apache License 2.0

5.29k stars, 543 forks

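For example, the two calls in the following code synthesize the same sentence, once with the English title in uppercase and once in lowercase, and the results are very different voices!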

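import torchaudio

# `cosyvoice` is assumed to be an already-loaded CosyVoice SFT model instance.
text = "大家好，这里给大家介绍一篇名为AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE的论文，"

# Variant 1: English title left in uppercase.
output = cosyvoice.inference_sft(text.replace("-", ""), "中文女")
for i, j in enumerate(output):
    # The "{}" placeholder keeps each output chunk in its own file.
    torchaudio.save("sft_uppercase_{}.wav".format(i), j["tts_speech"], 22050)

# Variant 2: same text, lowercased.
output = cosyvoice.inference_sft(text.replace("-", "").lower(), "中文女")
for i, j in enumerate(output):
    torchaudio.save("sft_lowercase_{}.wav".format(i), j["tts_speech"], 22050)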
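To make the comparison easier to reproduce, the two inputs can be built in one place before they are passed to the model. This is only a rough sketch: `case_variants` is a hypothetical helper name, not part of CosyVoice, and it relies on the fact that `str.lower()` leaves Chinese characters unchanged because they have no case.

```python
def case_variants(text):
    """Return the two inputs compared in this report, keyed by a filename-friendly label.

    Hypothetical helper for illustration; not part of the CosyVoice API.
    """
    # Same preprocessing as in the report: strip hyphens first.
    cleaned = text.replace("-", "")
    return {
        "uppercase": cleaned,          # English kept as written
        "lowercase": cleaned.lower(),  # only cased (Latin) letters change
    }


if __name__ == "__main__":
    sample = "大家好，这里给大家介绍一篇名为AN IMAGE IS WORTH 16X16 WORDS的论文，"
    for label, variant in case_variants(sample).items():
        # Each variant would then go to cosyvoice.inference_sft(variant, "中文女").
        print(label, variant)
```

Printing the two strings side by side confirms that only the English portion differs between the runs, so any difference in the generated voice comes from letter case alone.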


Problems when a sentence mixes Chinese and English #461