alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
BSD 2-Clause "Simplified" License

100k+ concurrent connections #154

Closed dpny518 closed 5 years ago

dpny518 commented 5 years ago

I am trying to use kaldi-gstreamer-server for a project where there could be 100,000 concurrent decoding requests. I did some small tests and found I can serve 150 concurrent requests on a 16 GB, 8-core machine by running 150 workers, but this won't scale economically for me.

Does anyone have suggestions on how to solve this issue?

alumae commented 5 years ago

Actually, I'm surprised that you are able to run 150 workers on a 16 GB, 8-core machine -- I would expect that you could run far fewer than that. That makes me think that your models are very small.

Anyway, if you want to use large vocabulary speech recognition, you would need approximately 1 core per decoding request, or maybe 1 core per 2 requests. That means that you need 50,000 cores in parallel. However, a bigger problem is that kaldi-gstreamer-server is not very economical with memory -- each worker loads its own instance of the HCLG fst and the acoustic model.

In summary: I don't know what you are doing exactly, but if you need to serve 100,000 concurrent requests, you need an industry-scale number of machines, regardless of the tool you use. Also, kaldi-gstreamer-server is probably not the best choice for this task. We never tested it with that many parallel requests (and probably never will).
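The back-of-the-envelope sizing above can be sketched in a few lines. The 1-core-per-2-requests ratio comes from the comment; the per-worker model size is an illustrative assumption, since the actual figure depends on the HCLG fst and acoustic model used:

```python
import math

def capacity_estimate(concurrent_requests, requests_per_core=2, model_gb_per_worker=2.0):
    """Rough sizing for servers that load one model copy per worker.

    requests_per_core follows the estimate in the comment above;
    model_gb_per_worker is an illustrative assumption, not a measured figure.
    """
    cores = math.ceil(concurrent_requests / requests_per_core)
    # Each worker loads its own copy of the HCLG fst and acoustic model,
    # so memory grows linearly with the number of workers.
    memory_gb = concurrent_requests * model_gb_per_worker
    return cores, memory_gb

print(capacity_estimate(100_000))  # (50000, 200000.0)
```

The linear memory term is the key problem: with a shared-model design, the second term would be a constant instead.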

dpny518 commented 5 years ago

Thank you. We are doing a project with schools in China that needs children to read stories aloud and have their speech transcribed. If you have any suggestions on how I can build a REST API with streaming/chunked data transfer to serve this user base, that would be very much appreciated.

YunzhaoLu commented 5 years ago

Hi yondu22, would you please share some of your ideas for achieving 150 concurrent workers? Thank you very much! Regards, Luke

dpny518 commented 5 years ago

I didn't do anything special, just started 150 workers.

hanv89 commented 5 years ago


Hi alumae, do you know of any speech-to-text server that can run at industry scale and shares the same idea as yours, or is at least backed by Kaldi nnet? Would you mind listing some? Thank you so much.

dpny518 commented 5 years ago

I think this person has built something similar to kaldi-gstreamer-server that is more scalable, but his code is not open source or publicly available. You can contact him: https://vais.vn/speech-to-text-core/

alumae commented 5 years ago

See https://github.com/alphacep/kaldi-websocket-python:

> It contains proper server implementation of multithread processing of many streams with shared model data. Neither tcp-server nor gstreamer server nor py-kaldy-simple have that.
>
> It also uses asyncio, a very straightforward parallelization.
>
> Python helps with flexibility of the server, compared to your libwebsocket, you can use logging, store results in a database, etc.

See this thread:

https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/kaldi-help/F5cA1Rfcce8/cD9DMyG5AwAJ

hanv89 commented 5 years ago

That's great. Thank you @yondu22, Thank you @alumae!

nshmyrev commented 5 years ago

Thank you @alumae

YunzhaoLu commented 4 years ago

I used master_server.py (kaldi-gstreamer-server) to start kaldi_recognizer (kaldi-websocket-python) and initiate multiple websocket connections; however, all websocket connections with kaldi_recognizer were allocated to one CPU. Any advice is appreciated! Luke

nshmyrev commented 4 years ago

@YunzhaoLu put many workers behind nginx websocket proxy.
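A minimal nginx configuration for that setup might look like the following. The ports, upstream name, and path are illustrative assumptions; each upstream entry is one independent server process with its own model:

```nginx
# Round-robin load balancing over several worker processes.
upstream asr_workers {
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
    server 127.0.0.1:8003;
}

server {
    listen 8080;

    location /ws {
        proxy_pass http://asr_workers;
        # Headers required for the WebSocket upgrade handshake.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        # Keep long-lived audio streams from hitting the default 60s timeout.
        proxy_read_timeout 3600s;
    }
}
```

Because WebSocket connections are long-lived, each client stays pinned to the worker nginx picked at connect time, which spreads streams (and CPU load) across processes.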

sirifarif commented 3 years ago

@nshmyrev How did you do it? Or can you point to example code? Many thanks.

shatealaboxiaowang commented 3 years ago

@alumae Thank you for your advice, but https://github.com/alphacep/kaldi-websocket-python does offline speech recognition, not online real-time recognition. Do you have other advice for a multithreaded implementation that processes many streams with shared model data and works online in real time? Thanks.

alumae commented 3 years ago

Use https://github.com/alphacep/vosk-api.

dpny518 commented 3 years ago

https://github.com/alphacep/vosk-server
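For reference, vosk-server can be started from its published Docker image roughly like this (image tag and port are taken from the project's README at the time of writing; treat them as assumptions and check the current docs):

```shell
# Run the English-model WebSocket server; endpoint at ws://localhost:2700
docker run -d -p 2700:2700 alphacep/kaldi-en:latest
```

Multiple such containers behind the nginx websocket proxy described earlier in this thread give a horizontally scalable setup.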