Adds proper TTS streaming via HTTP by using coqui's inference_stream method and FastAPI's StreamingResponse. The client can consume new audio data as soon as it's ready. I found a chunk size of 100 (coqui tokens, I assume?) to provide a favorable latency/interruption rate on my MacBook running CPU inference.
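A minimal sketch of the streaming shape, using only the standard library (the real endpoint pulls chunks from coqui's inference_stream and wraps this generator in FastAPI's StreamingResponse; the silent-PCM stub and the 24 kHz/mono/16-bit parameters here are assumptions for illustration):

```python
import struct

def wav_header(sample_rate: int = 24000, channels: int = 1, bits: int = 16) -> bytes:
    """Build a RIFF/WAV header with an unknown (max) data length, so the
    stream can start before the full audio has been synthesized."""
    byte_rate = sample_rate * channels * bits // 8
    block_align = channels * bits // 8
    # 0xFFFFFFFF chunk sizes: browsers accept WAV streams of unknown length.
    return (b"RIFF" + struct.pack("<I", 0xFFFFFFFF) + b"WAVE"
            + b"fmt " + struct.pack("<IHHIIHH", 16, 1, channels,
                                    sample_rate, byte_rate, block_align, bits)
            + b"data" + struct.pack("<I", 0xFFFFFFFF))

def stream_tts(text: str):
    """Yield a WAV header first, then PCM chunks as they are produced.
    In the real endpoint the chunks come from coqui's inference_stream;
    here a silent stand-in is used so the sketch is self-contained."""
    yield wav_header()
    for _ in range(3):            # stand-in for the model's chunk loop
        yield b"\x00\x00" * 1024  # 1024 silent 16-bit samples per chunk
```

In the actual route, this generator would be returned as `StreamingResponse(stream_tts(text), media_type="audio/wav")`, so the first bytes reach the client before synthesis finishes.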
Implications
Works only with local models.
Uses HTTP GET instead of HTTP POST. Explanation below.
Initially, I wanted to stick to HTTP POST requests only and do audio playback using client-side JavaScript, but unfortunately MediaSource does not support working with WAV data. Adding intermediate compression would only increase latency and create more complexity. Using HTTP GET allows doing playback directly from HTML by setting the audio source to the API endpoint; the browser does all the buffering and decoding at no extra cost.
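To illustrate the GET-based playback, this helper builds the `<audio>` element pointing straight at the streaming endpoint. The endpoint path and the `text` query-parameter name are hypothetical placeholders, not the PR's actual route:

```python
from urllib.parse import quote

# Hypothetical base URL; the real route name comes from the server's code.
BASE = "http://localhost:8000/api/tts/stream"

def audio_tag(text: str) -> str:
    """Return an HTML <audio> element whose src is the streaming GET
    endpoint -- the browser buffers and decodes the WAV stream itself,
    with no client-side JavaScript required."""
    return f'<audio controls autoplay src="{BASE}?text={quote(text)}"></audio>'
```

Because the source is a plain URL, the same endpoint also works when opened directly in a browser tab or passed to any HTTP audio player.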
References

Related SillyTavern pull request: https://github.com/SillyTavern/SillyTavern/pull/1623