huskyachao opened this issue 2 months ago
Yes, first-chunk inference is slower because there is no KV cache yet, and text normalization (TN) also takes some time.
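To see how much of the first-chunk delay comes from text normalization alone, one could time the frontend step separately. The sketch below is not code from the repo: it assumes the CLI object exposes its frontend as `cosyvoice.frontend` with a `text_normalize` method (as the `cosyvoice/cli` code does internally), and the model directory and text are placeholders.

```python
import time

from cosyvoice.cli.cosyvoice import CosyVoice

cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M')   # placeholder model dir

t0 = time.time()
# text_normalize(..., split=True) is assumed to return the normalized text
# segments, as in the repo's cli frontend code.
segments = cosyvoice.frontend.text_normalize('Text to synthesize.', split=True)
print(f'text normalization: {time.time() - t0:.2f}s, {len(list(segments))} segment(s)')
```

If TN accounts for only a small fraction of the 3-4 s, the remainder is the cold-start cost of the first autoregressive chunk.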
OK, I see. Thanks for your answer.
@aluminumbox I don't know the internal details, so this may be a stupid question, but in a use case with just one voice (e.g. a local personal assistant), could this be improved by preloading the KV cache? Or is that cache tied to the sentence being spoken, so it can't be preloaded and reused every time?
Hi, I found that after updating the code, the latency of the first chunk in streaming mode (inference_zero_shot) is still very high (around 3-4 s). I noticed that issue #294 also mentioned this problem before the update. Is this normal for the current version of CosyVoice? If so, is there any way to reduce this latency? Latency like this makes real-time communication with CosyVoice very difficult.
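For reference, a minimal way to measure time-to-first-chunk in streaming mode might look like the sketch below. It assumes the zero-shot API shown in the repo README (`inference_zero_shot(..., stream=True)` yielding dicts with a `tts_speech` tensor); the model directory, prompt wav, and texts are placeholders.

```python
import time

import torchaudio
from cosyvoice.cli.cosyvoice import CosyVoice
from cosyvoice.utils.file_utils import load_wav

# Placeholder model directory and prompt audio.
cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M')
prompt_speech_16k = load_wav('prompt.wav', 16000)

start = time.time()
prev = start
for i, out in enumerate(cosyvoice.inference_zero_shot(
        'Text to synthesize.',              # placeholder tts text
        'Transcript of the prompt audio.',  # placeholder prompt text
        prompt_speech_16k,
        stream=True)):
    now = time.time()
    if i == 0:
        print(f'time to first chunk: {now - start:.2f}s')
    else:
        print(f'chunk {i} after a further {now - prev:.2f}s')
    prev = now
    # 22050 Hz output is assumed here for CosyVoice-300M.
    torchaudio.save(f'chunk_{i}.wav', out['tts_speech'], 22050)
```

Comparing the first-chunk time against the inter-chunk times makes the cold-start gap (no KV cache plus text normalization) visible directly.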