alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

How to do VOSK performance improvements using GPU? #1415

Open nishanth-cn opened 11 months ago

nishanth-cn commented 11 months ago

Hi

I am trying to use a GPU to get better performance. Using vosk-model-en-us-0.22-lgraph, we could only manage to run 8 recognizers before reaching 100% CPU, but my requirement is to run at least 20 to 40 recognizers. My VM is CentOS (VMware) with 16 GB RAM, 8 cores, and a VMware SVGA II adapter. Hardware upgrades are currently not possible, so I am trying to find other routes to get the best performance.

I have tried a brute-force approach of adding a delay after sending every N audio packets. This reduced CPU usage and let me run 10 recognizers, but that is still not sufficient.
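
For reference, the pacing I tried looks roughly like this (a minimal sketch in Java, assuming 16 kHz 16-bit mono PCM input; the chunk size, packet count, and delay are just values I experimented with, not tuned recommendations):

```java
import org.vosk.Model;
import org.vosk.Recognizer;

import java.io.FileInputStream;
import java.io.InputStream;

public class PacedRecognizer {
    // Example values only: 4000-byte chunks, pause after every 25 packets.
    private static final int CHUNK_BYTES = 4000;
    private static final int PACKETS_PER_PAUSE = 25;
    private static final long PAUSE_MS = 100;

    public static void main(String[] args) throws Exception {
        try (Model model = new Model("vosk-model-en-us-0.22-lgraph");
             Recognizer recognizer = new Recognizer(model, 16000.0f);
             InputStream audio = new FileInputStream("audio.raw")) {

            byte[] buffer = new byte[CHUNK_BYTES];
            int read;
            int sent = 0;
            while ((read = audio.read(buffer)) > 0) {
                if (recognizer.acceptWaveForm(buffer, read)) {
                    System.out.println(recognizer.getResult());
                }
                // Brute-force throttle: sleep after every N packets so the
                // CPU load is spread out across the running recognizers.
                if (++sent % PACKETS_PER_PAUSE == 0) {
                    Thread.sleep(PAUSE_MS);
                }
            }
            System.out.println(recognizer.getFinalResult());
        }
    }
}
```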

I read about using a GPU with Vosk at https://hub.docker.com/r/alphacep/kaldi-en-gpu, but I got the error below.

docker run --gpus all -p 2700:2700 alphacep/kaldi-en-gpu
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

Also, I cannot find any proper documentation that would help me read more about it. Please help me with more details on Vosk GPU support, its requirements, and its limitations.

nshmyrev commented 11 months ago

I doubt you can reliably use Docker inside VMware. I think you'd better install bare-metal Linux, and you need an NVIDIA card.

It is possible to run a smaller model in 40 threads on your CPU.
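
Roughly like this with the Java wrapper, one shared Model and one Recognizer per thread (a sketch only; the small model name, thread count, and file names are placeholders for whatever you use):

```java
import org.vosk.Model;
import org.vosk.Recognizer;

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MultiStreamRecognition {
    public static void main(String[] args) throws Exception {
        // A single Model can be shared by many recognizers; each thread
        // creates its own Recognizer. The small model keeps per-stream
        // CPU usage low enough to run many streams in parallel.
        Model model = new Model("vosk-model-small-en-us-0.15");

        int streams = 40; // illustrative; match it to your workload
        ExecutorService pool = Executors.newFixedThreadPool(streams);

        for (int i = 0; i < streams; i++) {
            final String file = "stream-" + i + ".raw"; // 16 kHz 16-bit mono PCM
            pool.submit(() -> {
                try (Recognizer recognizer = new Recognizer(model, 16000.0f);
                     InputStream audio = new FileInputStream(file)) {
                    byte[] buffer = new byte[4000];
                    int read;
                    while ((read = audio.read(buffer)) > 0) {
                        if (recognizer.acceptWaveForm(buffer, read)) {
                            System.out.println(recognizer.getResult());
                        }
                    }
                    System.out.println(recognizer.getFinalResult());
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        model.close();
    }
}
```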

nishanth-cn commented 11 months ago

Currently I am not using Docker; I am using Vosk's Java/JNI wrapper JAR. I think the batch recognizers are not part of the JAR :(. Since my application is in Java, this is another roadblock I face.