Instruct mode synthesizes the instruct text in the final audio

FunAudioLLM / CosyVoice

Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

https://funaudiollm.github.io/

Apache License 2.0


Instruct mode synthesizes the instruct text in the final audio #159

Open shuaijiang opened 1 month ago

shuaijiang commented 1 month ago

I tried instruct mode to synthesize audio according to the instruct text, but the generated audio also contains the instruct text itself.

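from cosyvoice.cli.cosyvoice import CosyVoice
import torchaudio

# Load the instruct model.
cosyvoice = CosyVoice('iic/CosyVoice-300M-Instruct')
# Arguments: text to synthesize, speaker ID ('中文男' = "Chinese male"),
# and instruct text ('女声，快语速' = "female voice, fast speaking rate").
output = cosyvoice.inference_instruct('在面对挑战时，他展现了非凡的勇气与智慧。', '中文男', '女声，快语速')
# Save the generated waveform with a 22050 Hz sample rate.
torchaudio.save('instruct.wav', output['tts_speech'], 22050)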

aluminumbox commented 1 month ago

I ran the same code but could not reproduce the problem. Please try again.

shuaijiang commented 1 month ago

Thanks, I get the point. Does the instruct text not support Chinese?
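
The thread does not confirm whether Chinese instruct text is supported. While that is open, a small guard can at least flag when an instruct string contains Chinese characters, so the two cases can be compared side by side. This is only a sketch: `contains_cjk` is a hypothetical helper, not part of CosyVoice, and it checks only the basic CJK Unified Ideographs range.

```python
import re

# Hypothetical helper, not part of CosyVoice.
# Matches characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF).
_CJK = re.compile(r'[\u4e00-\u9fff]')


def contains_cjk(text: str) -> bool:
    """Return True if text has at least one CJK Unified Ideograph."""
    return _CJK.search(text) is not None


# The instruct text from the report: "female voice, fast speaking rate".
instruct_text = '女声，快语速'
if contains_cjk(instruct_text):
    print('Instruct text contains Chinese characters.')
```

Running the same synthesis once with an instruct string that trips this check and once with one that does not would show whether the language of the instruct text is what causes it to be spoken aloud.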