Open · towfiqi opened this issue 5 years ago
A few years ago, when I tested this on a usual EC2 instance, one core could manage one concurrent speech recognition stream in real time. So 10 cores for 10 concurrent speech recognition streams could be your target.
But this is model dependent. There could be slower or faster models.
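As a back-of-the-envelope sizing sketch of the rule of thumb above (one core per real-time stream, scaled by the model's real-time factor; the function name and default are illustrative, not from any kaldi API):

```python
import math

def cores_needed(concurrent_streams, real_time_factor=1.0):
    """Estimate CPU cores needed for concurrent real-time decoding.

    real_time_factor: seconds of CPU time per second of audio for the
    chosen model (1.0 means one core is fully occupied by one live
    stream; a faster model has a factor below 1.0).
    """
    return math.ceil(concurrent_streams * real_time_factor)

print(cores_needed(10))        # 10 streams with a real-time-factor-1.0 model
print(cores_needed(10, 0.5))   # the same load with a 2x-faster model
```

Memory is harder to estimate this way: it is dominated by the model and decoding graph, which (per the discussion below) can be loaded once and shared across streams.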
Thanks a lot. Can an NVIDIA GPU with CUDA cores handle more concurrent processes?
As far as I know, GPUs are used exclusively for training.
That was my understanding too. But I am confused about how they used a GPU for inference and achieved this performance result:
"One experiment with clean data achieved speech-to-text inferencing 3,524x faster than real-time processing using an NVIDIA Tesla V100."
You should address this question to the kaldi project developers. This server was developed a while ago and is not really in sync with the latest kaldi developments.
Thank you. According to the post mentioned below, usually one CPU (worker) can only serve one connection at a time. But if the server shares the same decoding graph across every connection, it can serve 10 concurrent connections per CPU.
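A minimal sketch of the sharing idea described above: the large, read-only decoding graph is loaded once, while each connection gets only its own lightweight decoder state. The class names here are hypothetical stand-ins, not the actual kaldi types.

```python
import threading

class DecodingGraph:
    """Stand-in for the large HCLG decoding graph (hypothetical class)."""
    def __init__(self, path):
        self.path = path  # loaded once at startup; read-only, safe to share

class Decoder:
    """Lightweight per-connection decoder holding only per-utterance state."""
    def __init__(self, graph):
        self.graph = graph  # a reference, not a copy, of the shared graph

graph = DecodingGraph("HCLG.fst")  # one copy in memory for the whole server

def handle_connection(conn_id, results):
    decoder = Decoder(graph)  # cheap: only new per-connection state is allocated
    results[conn_id] = decoder.graph is graph  # every decoder shares one graph

results = {}
threads = [threading.Thread(target=handle_connection, args=(i, results))
           for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(all(results.values()))  # ten connections, one graph object in memory
```

The design point is that memory scales with the number of models loaded, not the number of connections; CPU time is still roughly one core per real-time stream.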
Read the last comment of this thread
Do you have any idea whether this asr-server shares the decoding graph? Or whether it's possible to implement this in it?
Thanks
The ASR server keeps one model in memory and serves it. It works as an HTTP-based wrapper around the kaldi decoder. You can use it as a codebase for your solution, but you will likely want to modify it to work with more modern versions of the kaldi decoders.
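The "HTTP wrapper around a decoder, one model in memory" shape described above can be sketched as follows. This is an illustrative skeleton only: `fake_decode` stands in for the real kaldi decoding call, and the endpoint and model structure are invented for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Loaded once at startup and shared by all requests (stands in for the
# real acoustic model + decoding graph).
MODEL = {"name": "dummy-acoustic-model"}

def fake_decode(model, audio):
    """Placeholder for the actual kaldi decoder invocation."""
    return f"decoded {len(audio)} bytes with {model['name']}"

class RecognizeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the posted audio body and decode it with the shared model.
        length = int(self.headers.get("Content-Length", 0))
        audio = self.rfile.read(length)
        text = fake_decode(MODEL, audio)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(text.encode("utf-8"))

if __name__ == "__main__":
    # POST audio bytes to http://localhost:8080/ to get a transcript back.
    HTTPServer(("localhost", 8080), RecognizeHandler).serve_forever()
```

A real server would additionally manage concurrency (e.g. a threading or worker-pool server class) so that each connection gets its own decoder over the shared model.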
Thank you for all your help :)
Hi,
What type of server configuration would be needed to process/decode 10 concurrent speech recognition streams? How many cores and how much RAM? This is for decoding only, not training.
Thanks