Closed mason-acronode closed 1 year ago
> can I get higher concurrent capability
Yes, up to 150 parallel streams on GTX 3080
> or can I just expect a reduced real-time factor, or reduced CPU usage only?
RTF is about the same
Awesome. So in my case, I can perform 8 * 150 recognition requests concurrently without degrading the RTF. Am I right?
If requests arrive at the server randomly (e.g., sometimes 20 requests in one second, sometimes only 1 request in a second), batch recognition looks inapplicable to this scenario, because when I looked at the source code, inference appeared to be pending until all of the processing was done. So is GPU inference only applicable to a fixed-concurrency scenario?
> I can perform 8 * 150 recognition requests concurrently without degrading the RTF
Just 150, not 8 * 150
> batch recognition looks not applicable
No
> inference was pending until all of the processing was done.
No, it runs periodically
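The "runs periodically" behavior can be pictured as a dynamic batching loop. The following is an illustrative Python sketch, not Vosk's actual implementation (the function name, queue shape, and 100 ms interval are all assumptions): requests are queued as they arrive, and a worker wakes on a fixed interval and processes whatever has accumulated as one batch, so an irregular arrival rate (20 requests one second, 1 the next) is still served.

```python
import queue
import threading
import time

def batch_worker(requests, results, interval=0.1, stop=None):
    """Illustrative periodic batcher: every `interval` seconds, drain the
    request queue and process everything waiting as one batch. Names and
    the interval are assumptions, not Vosk internals."""
    while not (stop and stop.is_set()):
        batch = []
        while True:  # drain everything queued so far
            try:
                batch.append(requests.get_nowait())
            except queue.Empty:
                break
        # Stand-in for one batched GPU inference call over the whole batch
        for req_id, audio in batch:
            results[req_id] = f"decoded:{audio}"
        time.sleep(interval)  # runs periodically, not once per request
```

A batch of 1 and a batch of 20 cost roughly the same kernel-launch overhead on the GPU, which is why batching still pays off under irregular arrival rates.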
Thanks for replying. Strangely, if I send multiple requests at the same time from different threads, I get an error. With a single thread it's okay, because it only runs one recognition at a time. I will capture a screenshot of the error. Do you think I missed something? I call GpuInit() once in the main thread, but I'm not calling GpuThreadInit() in each thread. Could that cause problems for multi-threaded concurrent processing?
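The initialization pattern asked about above, one-time global init plus per-thread init in every decoding thread, can be sketched as follows. This is a minimal Python illustration of the pattern only: `gpu_init`/`gpu_thread_init` are made-up stand-ins for the real GpuInit/GpuThreadInit calls, and the set merely records which threads ran their init.

```python
import threading

initialized = set()
lock = threading.Lock()

def gpu_init():
    # Stand-in for GpuInit(): called exactly once, in the main thread,
    # before any worker threads start
    with lock:
        initialized.add("main")

def gpu_thread_init():
    # Stand-in for GpuThreadInit(): called once per worker thread,
    # before that thread decodes anything
    with lock:
        initialized.add(threading.current_thread().name)

def worker():
    gpu_thread_init()  # skipping this per-thread call is a common crash cause
    # ... per-thread recognition would happen here ...

gpu_init()  # once, in the main thread
threads = [threading.Thread(target=worker, name=f"w{i}") for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```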
There is BatchModel/BatchRecognizer for GPU
Yes, I already compiled VoskAPI with GPU and created the C# NuGet package as well.
```csharp
string strModelPath = $"{model}{Path.DirectorySeparatorChar}{lang}";
var sttEnModel = new BatchModel(strModelPath);

using (rec = new VoskBatchRecognizer(sttModel, SampleRate)) {}
```
Figured it out, but still working on the performance issue. This issue can be closed.
Hello, can anyone please tell me what the gain is from using GPU inference rather than CPU inference? Currently, when I use CPU inference, the number of concurrent recognitions ideally depends on the number of CPU cores. So if I have 8 cores, in theory I can run 8 concurrent recognitions at 100% CPU usage. If I use the GPU, can I get higher concurrent capability, or can I just expect a reduced real-time factor, or reduced CPU usage only? If it can help enhance concurrent recognition capability, how much concurrency can be expected? Is there any calculated table or something like that as a reference?
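The CPU sizing described above can be put into a back-of-envelope formula (assuming one recognition saturates one core and RTF is the per-stream real-time factor; these are my assumptions, not measured numbers):

```python
def max_realtime_streams(cores, rtf):
    """Rough upper bound on concurrent real-time CPU streams: each stream
    consumes about `rtf` core-seconds per second of audio, so `cores`
    cores sustain roughly cores / rtf streams. Illustrative only."""
    return int(cores / rtf)

print(max_realtime_streams(8, 1.0))  # 8 cores at RTF 1.0 -> about 8 streams
print(max_realtime_streams(8, 0.5))  # a faster model (RTF 0.5) -> about 16
```

This matches the "8 cores, 8 concurrent recognitions" estimate in the question; a GPU changes the picture by batching many streams through one device rather than dedicating a core per stream.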