sonclark opened this issue 2 months ago
Yes, that is expected because we initialize a new model for every new client, so batching would certainly help. But 30 seconds is something I haven't seen even with 4 clients connected simultaneously. It could be the GPU; which GPU are you running the server on?
@makaveli10 I am running on an RTX 4060. When I try to run 3-4 clients locally using the TranscriptionClient class, there does not seem to be that much latency. It is especially bad when I connect via browser (the latency is observed right after the server responds with the ready status). I am still testing different setups regarding this.
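For reference, a minimal sketch of the local multi-client test described above, assuming the TranscriptionClient constructor shown in the WhisperLive README; the audio file path and model choice are placeholders:

```python
# Sketch: open several TranscriptionClient connections at once to
# simulate concurrent load. Constructor arguments follow the README;
# "sample.wav" is a hypothetical local file.
import threading

from whisper_live.client import TranscriptionClient

def run_client(idx: int) -> None:
    client = TranscriptionClient(
        "localhost",
        9090,
        lang="en",
        translate=False,
        model="small",
    )
    client("sample.wav")  # transcribe a file instead of the microphone
    print(f"client {idx} finished")

threads = [threading.Thread(target=run_client, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```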
@makaveli10 after checking a few combinations, it does not seem to be caused by connecting via browser. It seems that if I initiate a new client (using the TranscriptionClient class) without running it, it affects the existing running clients. When I connect directly to the WhisperLive server (not through TranscriptionClient) and let it sit without sending any data, I observe the same behavior.
I have looked into the source code but still cannot understand how that could be the case. I hope you can give me some insight into this.
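A minimal sketch of that reproduction: open a raw websocket to the server, send only the initial config message, then sit idle without streaming any audio. The handshake fields below follow the on_open payload of recent WhisperLive client versions and may need adjusting for yours:

```python
# Sketch: an idle client that completes the handshake but never sends audio.
import json
import time
import uuid

from websocket import create_connection  # pip install websocket-client

ws = create_connection("ws://localhost:9090")
ws.send(json.dumps({
    "uid": str(uuid.uuid4()),
    "language": "en",
    "task": "transcribe",
    "model": "small",
    "use_vad": True,
}))
print(ws.recv())  # wait for the server's ready message
time.sleep(120)   # idle; watch whether other clients slow down
ws.close()
```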
Have you tried setting `single_model` to `True`?
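If that option is available in your version, a sketch of enabling it on the server side (assuming `single_model` is exposed on `TranscriptionServer.run`; check run_server.py / server.py in your checkout, as older releases may not have it):

```python
# Sketch: share one loaded model across all clients instead of
# initializing a new model per connection. The single_model kwarg is
# an assumption about your WhisperLive version.
from whisper_live.server import TranscriptionServer

server = TranscriptionServer()
server.run(
    "0.0.0.0",
    port=9090,
    backend="faster_whisper",
    single_model=True,
)
```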
I am creating a live transcription webpage that connects directly to the WhisperLive server via websocket. For a single client, the performance is great (less than 1 second of latency). When I add another client (open the webpage in another browser), the transcription speed decreases greatly (up to 30 seconds).
I have tested some setups to figure out the possible issue. Setup 1:
Setup 2:
Adding more logging to the server, I noticed that the process that gets slowed down is the feature extractor inside the `transcriber`. I want to understand more about how adding a client in this manner could affect the performance so much.
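One way to narrow this down further is to time each stage explicitly. Below is a hypothetical timing helper; the wrapped call sites in the comments are assumptions based on faster-whisper's internals, not WhisperLive's exact API:

```python
# Sketch: a small context manager for timing individual stages of the
# transcription loop.
import time
from contextlib import contextmanager

@contextmanager
def timed(label: str):
    start = time.perf_counter()
    yield
    print(f"[{label}] {time.perf_counter() - start:.3f}s")

# Hypothetical call sites inside the server's speech-to-text loop:
# with timed("feature_extractor"):
#     features = transcriber.feature_extractor(audio_chunk)
# with timed("transcribe"):
#     segments, info = transcriber.transcribe(audio_chunk)
```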