alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0
7.57k stars 1.06k forks

What is the gain of the GPU usage? #1341

Closed mason-acronode closed 1 year ago

mason-acronode commented 1 year ago

Hello, can anyone tell me what the gain of GPU inference is over CPU inference? With CPU inference, the number of concurrent recognitions is ideally bounded by the number of CPU cores: with 8 cores, I can run 8 concurrent recognitions at 100% CPU usage in theory. If I use the GPU, can I get higher concurrency, or should I only expect a reduced real-time factor or reduced CPU usage? If it does improve concurrency, how many concurrent streams can be expected? Is there a reference table or a way to calculate this?

nshmyrev commented 1 year ago

can I get higher concurrency

Yes, up to 150 parallel streams on an RTX 3080

or should I only expect a reduced real-time factor or reduced CPU usage?

RTF is about the same
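Putting the two answers together, the win is throughput, not per-stream speed. A back-of-envelope sketch using only the numbers from this thread (8 CPU streams, up to 150 GPU streams, RTF roughly unchanged; the RTF of 1.0 below is an assumed placeholder, not a measured value):

```python
# Back-of-envelope throughput comparison using the figures in this thread:
# ~8 concurrent real-time streams on an 8-core CPU vs. up to ~150 parallel
# streams on one RTX 3080 class GPU, at roughly the same RTF per stream.

def realtime_capacity(parallel_streams: int, rtf: float) -> float:
    """Hours of audio processed per wall-clock hour."""
    return parallel_streams / rtf

cpu_capacity = realtime_capacity(parallel_streams=8, rtf=1.0)
gpu_capacity = realtime_capacity(parallel_streams=150, rtf=1.0)

print(f"CPU: {cpu_capacity:.0f}x real time, GPU: {gpu_capacity:.0f}x real time")
print(f"GPU advantage: ~{gpu_capacity / cpu_capacity:.1f}x")
```

So the GPU raises total capacity by roughly an order of magnitude, while each individual stream finishes in about the same time as on CPU.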

mason-acronode commented 1 year ago

Awesome. So in my case, I could run 8 * 150 recognition requests concurrently without degrading the RTF. Am I right?

mason-acronode commented 1 year ago

If requests arrive at the server at random (e.g. sometimes 20 requests in one second, sometimes only 1), batch recognition looks inapplicable to this scenario, because when I looked at the source code, inference appeared to wait until all pending processing was done. So is GPU inference only applicable to a fixed-concurrency scenario?

nshmyrev commented 1 year ago

I can perform 8 * 150 recognition requests concurrently without degrading the RTF

Just 150, not 8 * 150

batch recognition looks not applicable

No

inference was pending until all of the processing was done.

No, it runs periodically
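"Runs periodically" is the key point: a dynamic batcher wakes up on a tick, takes whatever audio chunks have arrived so far, and processes them as one batch, so random arrival rates are handled naturally. A toy sketch of that idea (this is an illustration of the general technique, not Vosk's actual implementation; `run_batches` and `max_batch` are made up for this example):

```python
from collections import deque

# Toy sketch of periodic dynamic batching (NOT Vosk's actual code): on each
# tick the batcher drains whatever chunks are queued, up to max_batch, and
# runs them through the model together. New requests simply join the queue
# and are picked up on the next tick, so bursty traffic (20/s or 1/s) works.

def run_batches(arrivals_per_tick, max_batch=128):
    """arrivals_per_tick: chunk counts arriving before each tick.
    Returns the batch size actually processed on each tick."""
    queue = deque()
    batch_sizes = []
    for arriving in arrivals_per_tick:
        queue.extend(range(arriving))      # enqueue newly arrived chunks
        batch = [queue.popleft() for _ in range(min(len(queue), max_batch))]
        batch_sizes.append(len(batch))     # process whatever is ready now
    return batch_sizes

# Bursty traffic: 20 chunks, then a quiet tick, then 1 chunk.
print(run_batches([20, 0, 1]))  # -> [20, 0, 1]
```

No request waits for "all processing" to finish; it only waits until the next tick.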

mason-acronode commented 1 year ago

Thanks for replying. Strangely, if I send multiple requests at the same time from different threads, I get an error; with a single thread it is fine, because only one recognition runs at a time. I will capture a screenshot of the error. Do you think I missed something? I call GpuInit() once in the main thread, but I am not calling GpuThreadInit() in each thread. Could that cause problems for multi-threaded concurrent processing?

  1. Send Request 1 (thread 1)--------------------- processing ------------------------ end processing
  2. Send Request 2 (th 2)--------------------------------------------------------------- end processing
  3. SR 3 (th 3) --------------------------end processing
  4. ............
mason-acronode commented 1 year ago

[screenshot: error output]

nshmyrev commented 1 year ago

There is BatchModel/BatchRecognizer for GPU

mason-acronode commented 1 year ago

Yes, I already compiled VoskAPI with GPU support and created the C# NuGet package as well. [screenshot]

mason-acronode commented 1 year ago

string strModelPath = $"{model}{Path.DirectorySeparatorChar}{lang}";
var sttModel = new BatchModel(strModelPath);

using (rec = new VoskBatchRecognizer(sttModel, SampleRate)) { }

mason-acronode commented 1 year ago

Figured it out, but I'm still working on the performance issue. This issue can be closed.