collabora / WhisperLive

A nearly-live implementation of OpenAI's Whisper.
MIT License

Live Transcription speed greatly affected when adding new Client via web browser #271

Open sonclark opened 2 months ago

sonclark commented 2 months ago

I am creating a live transcription webpage that connects directly to the WhisperLive server via WebSocket. With a single client, performance is great (latency under 1 second). When I add another client (by opening the webpage in another browser), transcription slows down dramatically (latency of up to 30 seconds).

I have tested some setups to narrow down the possible issue. Setup 1:

Setup 2:

After adding more logging to the server, I noticed that the step that slows down is the transcriber's feature extractor.

I would like to understand how adding a client in this manner can affect performance so much.
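For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of per-stage timing logs. It is not WhisperLive code: `timed`, `timings`, and `fake_feature_extractor` are hypothetical names, and the fake extractor only stands in for the real feature extraction step.

```python
import logging
import time
from collections import defaultdict
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("latency")

# Maps a stage name to the list of elapsed times (in seconds) recorded for it.
timings = defaultdict(list)


@contextmanager
def timed(stage, client_id):
    """Record and log how long the wrapped block takes for one client."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[stage].append(elapsed)
        log.info("client=%s stage=%s took %.3fs", client_id, stage, elapsed)


def fake_feature_extractor(samples):
    # Hypothetical stand-in for the real feature extraction step.
    time.sleep(0.01)
    return [s * 0.5 for s in samples]


with timed("feature_extractor", client_id="client-1"):
    features = fake_feature_extractor([0.0, 1.0, 2.0])
```

Wrapping each stage of the transcription loop this way, with the client ID in the log line, makes it easier to see whether a stage slows down for the existing client at the moment a second client connects.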

makaveli10 commented 2 months ago

Yes, some slowdown is expected because we initialize a new model for every new client, so batching would certainly help. But 30 seconds is something that I haven't seen, even with 4 clients connected simultaneously. It could be the GPU. Which GPU are you running the server on?

sonclark commented 2 months ago

@makaveli10 I am running on an RTX 4060. When I run 3-4 clients locally using the TranscriptionClient class, there does not seem to be that much latency. It is especially bad when I connect via a browser (the latency appears right after the server responds with the ready status). I am still testing different setups for this.

sonclark commented 1 month ago

@makaveli10 after checking a few combinations, the slowdown does not seem to be caused by connecting via a browser. It seems that if I initialize a new client (using the TranscriptionClient class) without running it, it affects the existing running clients. When I connect directly to the WhisperLive server (not through TranscriptionClient) and let the connection sit without sending any data, I observe the same behavior.

I have looked into the source code but still cannot understand how that could be the case. I hope you can give me some insight into this.
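The idle-connection case can be sketched roughly as below. This is an assumption-heavy sketch rather than a verified reproduction: it uses the third-party `websockets` package, assumes the server listens on `ws://localhost:9090`, and assumes the first message is a JSON options object with the fields `uid`, `language`, `task`, `model`, and `use_vad`. Those field names should be checked against the WhisperLive version in use. `build_handshake` and `idle_client` are hypothetical helper names.

```python
import asyncio
import json
import uuid


def build_handshake(model="small", language="en"):
    """Build the initial options message (field names are an assumption)."""
    return json.dumps({
        "uid": str(uuid.uuid4()),
        "language": language,
        "task": "transcribe",
        "model": model,
        "use_vad": True,
    })


async def idle_client(uri="ws://localhost:9090", idle_seconds=60):
    """Connect, send the options message, then stay idle without sending audio."""
    import websockets  # third-party: pip install websockets

    async with websockets.connect(uri) as ws:
        await ws.send(build_handshake())
        # Wait for the server's first reply (expected to be the ready status).
        print("server replied:", await ws.recv())
        # Keep the connection open but send no audio.
        await asyncio.sleep(idle_seconds)
```

Running `asyncio.run(idle_client())` while another client is actively transcribing should show whether an idle connection alone is enough to slow down the active one.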