FunAudioLLM / CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
https://funaudiollm.github.io/
Apache License 2.0

some problems about streaming inference #389

Open shanhaidexiamo opened 2 months ago

shanhaidexiamo commented 2 months ago

Hi, I'm trying your new streaming inference code based on webui.py. I ran this demo on an A10 and found that the RTF of the first chunk is very high: with all settings at their defaults, I have to wait 4-5 seconds to get the first yield. How long did the first chunk of streaming inference take in your tests?
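For reference, time-to-first-chunk can be measured directly on the generator that streaming inference returns. This is a minimal sketch, not CosyVoice code: `fake_stream` is a hypothetical stand-in for the model's output generator, whose actual name and signature vary by version.

```python
import time

def time_to_first_chunk(stream):
    """Measure latency until the first yielded chunk of any iterator.

    In CosyVoice, `stream` would be the generator returned by the
    streaming inference call; here we use a toy generator instead.
    """
    start = time.perf_counter()
    first = next(stream)          # blocks until the model yields chunk 0
    latency = time.perf_counter() - start
    return first, latency

def fake_stream():
    """Hypothetical stand-in that simulates model latency before yielding."""
    time.sleep(0.05)
    yield b"chunk0"
    yield b"chunk1"

chunk, latency = time_to_first_chunk(fake_stream())
print(f"first chunk after {latency:.3f}s")
```

Timing only the first `next()` call isolates first-chunk latency from overall RTF, which is what matters for perceived responsiveness.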


My second problem is that the audio volume fluctuates between chunks during streaming inference, while non-streaming inference does not have this problem. Do you see the same issue?
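As a rough client-side workaround for loudness fluctuating between chunks, one can normalize each chunk toward a target RMS level before playback. This is a sketch under assumptions, not part of CosyVoice: it assumes float32 audio in [-1, 1], and `target_rms` is an arbitrary illustrative value. A real fix would address the model or vocoder itself, and abrupt per-chunk gain changes can themselves be audible.

```python
import numpy as np

def normalize_chunk_rms(chunks, target_rms=0.1, eps=1e-8):
    """Yield each audio chunk scaled so its RMS matches target_rms.

    `chunks` is an iterable of float32 numpy arrays in [-1, 1].
    A simple illustrative workaround, not a principled loudness model.
    """
    for chunk in chunks:
        rms = np.sqrt(np.mean(chunk ** 2))
        gain = target_rms / (rms + eps)      # eps avoids divide-by-zero
        yield np.clip(chunk * gain, -1.0, 1.0)

# Toy example: a quiet chunk and a loud chunk both come out near target RMS.
t = np.linspace(0, 100, 16000)
quiet = (0.01 * np.sin(t)).astype(np.float32)
loud = (0.5 * np.sin(t)).astype(np.float32)
for out in normalize_chunk_rms([quiet, loud]):
    print(round(float(np.sqrt(np.mean(out ** 2))), 3))
```

Smoothing the gain across chunk boundaries (e.g. with an exponential moving average of the RMS) would reduce audible pumping compared to this per-chunk version.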

Thank you

aluminumbox commented 2 months ago

this is intended, maybe due to libtorch performance when the input chunk size changes

shanhaidexiamo commented 2 months ago

> this is intended, maybe due to libtorch performance when input chunk size changes

So you mean the volume fluctuation is intended? And how long did the first chunk take in your experiment? Thank you

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 30 days with no activity.