Open steven8274 opened 3 months ago
The latency is due to the Gradio audio cache. Flow matching is not a streaming model, so the overlapping audio is inevitable. We are also trying to solve it.
The Gradio audio component also introduces audio delay, but that is not the delay I mean. I printed the time before TTS and the time when the first audio chunk was generated. The difference is about 2 seconds. For streaming TTS used in real-time communication, a 2-second delay is not acceptable.
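For anyone who wants to reproduce this measurement independently of Gradio, here is a minimal sketch of timing the first chunk of a streaming generator. `measure_stream` and `fake_streaming_tts` are hypothetical names made up for illustration. The fake generator only stands in for the real streaming call and is not the CosyVoice API.

```python
import time


def measure_stream(make_stream):
    """Return (first_chunk_latency_s, total_time_s, chunk_count).

    make_stream is a zero-argument callable that returns an iterable of
    audio chunks. The clock starts before the stream is created, so any
    setup cost is included in the first-chunk latency.
    """
    start = time.perf_counter()
    first_latency = None
    count = 0
    for _chunk in make_stream():
        if first_latency is None:
            # Time from "text sent" to "first audio chunk available".
            first_latency = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return first_latency, total, count


def fake_streaming_tts(warmup_s=0.05, chunk_s=0.01, n_chunks=3):
    """Hypothetical stand-in for a streaming TTS call (assumption)."""
    time.sleep(warmup_s)          # simulated delay before the first chunk
    for _ in range(n_chunks):
        yield b"\x00" * 320       # dummy audio bytes
        time.sleep(chunk_s)       # simulated per-chunk synthesis time


if __name__ == "__main__":
    first, total, count = measure_stream(fake_streaming_tts)
    print(f"first chunk after {first:.3f}s, {count} chunks in {total:.3f}s")
```

To use it on the real model, pass a lambda that wraps the actual streaming inference call in place of `fake_streaming_tts`.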
Yes, I have the same problem. A 2 s latency does not work for my real-time communication.
Has anyone compared the streaming and non-streaming modes? I found that the RTF (real-time factor = consuming_time / audio_len) of the streaming mode (1.5) is larger than that of the non-streaming mode (1.3). I wonder whether this difference between the two modes is expected.
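To make the RTF figures easy to compare across setups, here is a minimal sketch of the calculation as defined above. The helper names and the 22050 Hz sample rate in the usage lines are assumptions for illustration, so substitute the sample rate of your own output audio.

```python
def audio_seconds(num_samples, sample_rate):
    """Duration of a mono audio buffer in seconds."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


def real_time_factor(consuming_time_s, audio_len_s):
    """RTF = consuming_time / audio_len, as defined in the comment above.

    A value above 1.0 means synthesis took longer than the audio lasts.
    """
    if audio_len_s <= 0:
        raise ValueError("audio_len_s must be positive")
    return consuming_time_s / audio_len_s


if __name__ == "__main__":
    # Assumed example: 2 seconds of audio at an assumed 22050 Hz sample rate.
    length = audio_seconds(num_samples=44100, sample_rate=22050)
    print(real_time_factor(consuming_time_s=3.0, audio_len_s=length))
```

With an RTF above 1.0 in both modes, audio is produced more slowly than it plays back, so playback gaps can occur on top of the first-chunk delay.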
This issue is stale because it has been open for 30 days with no activity.
Describe the bug Using the 'Inference_streaming' branch, the first audio chunk is returned too late (over 2 seconds after the TTS text is sent to the inference interface 'inference_sft'). A 2-second latency is too high for real-time communication. I use CosyVoice in an 'stt + llm + tts' chain. The STT and LLM latencies are now acceptable; only the TTS latency is not low enough. In addition, the voice quality drops when I use CosyVoice streaming TTS: some of the audio overlaps. To achieve lower streaming TTS latency, I changed these two configurations to half of their original values (in 'cosyvoice/cli/model.py'):
However, the latency did not decrease noticeably, while the voice quality got worse (the overlapping audio became more obvious).
To Reproduce Steps to reproduce the behavior:
Expected behavior The first audio chunk is returned quickly (under 300 ms to 500 ms is expected) and no overlapping audio can be heard.