alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Partial results on the gpu batch recognizer #1539

Open starfurylab opened 3 months ago

starfurylab commented 3 months ago

Hi! Great project, I'm especially excited about the GPU support. I have a question: is it possible to use something like PartialResult() when running on the GPU (RTX 2080 Ti, CUDA 12.3), the way websocket/asr_server.py does on the CPU? For example, in a real-time audio stream analysis scenario, which the ASR server handles perfectly well on the CPU, but where I would like more performance than a CPU can provide. Best regards.
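For context, the CPU-side pattern that asr_server.py builds on looks roughly like this (a minimal sketch using the standard vosk Python bindings; the model path and chunk size are placeholder assumptions, not values from this thread):

```python
# Sketch of the CPU streaming pattern behind websocket/asr_server.py:
# feed audio in small chunks and poll PartialResult() between finalized
# results. "model" and CHUNK_BYTES are placeholders.

CHUNK_BYTES = 8000  # roughly 0.25 s of 16 kHz 16-bit mono audio


def iter_chunks(data: bytes, chunk_bytes: int):
    """Split raw PCM bytes into fixed-size chunks (pure helper)."""
    for start in range(0, len(data), chunk_bytes):
        yield data[start:start + chunk_bytes]


def run_stream(wav_path: str, model_path: str = "model") -> None:
    """Decode a WAV file chunk by chunk, printing partial hypotheses."""
    import wave
    from vosk import KaldiRecognizer, Model  # CPU recognizer, imported lazily

    wf = wave.open(wav_path, "rb")
    rec = KaldiRecognizer(Model(model_path), wf.getframerate())
    audio = wf.readframes(wf.getnframes())
    for chunk in iter_chunks(audio, CHUNK_BYTES):
        if rec.AcceptWaveform(chunk):
            print(rec.Result())         # a finalized utterance
        else:
            print(rec.PartialResult())  # low-latency partial hypothesis
    print(rec.FinalResult())
```

The question in this issue is essentially whether the GPU batch recognizer can expose the same PartialResult()-style polling.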

nshmyrev commented 3 months ago

Hello. It is possible, but it is not implemented.

starfurylab commented 3 months ago

Thanks! It's good to know that it's possible in principle. Can you give me a hint? Would such an implementation affect only vosk-api, or Kaldi too? And could you point me in the right direction? I'd like to try to implement this feature.

nshmyrev commented 3 months ago

Sure, see here:

https://github.com/kaldi-asr/kaldi/blob/master/src/cudadecoder/batched-threaded-nnet3-cuda-online-pipeline.h#L172

https://github.com/alphacep/vosk-api/blob/master/src/batch_recognizer.cc#L120

starfurylab commented 3 months ago

Thanks, I'll let you know when I get something.

starfurylab commented 2 months ago

Hi, sorry for the delay. I have created a pull request: https://github.com/alphacep/vosk-api/pull/1554

I added partial results, but I don't know how to wire them up to the other language bindings, so they are only available in the C API, and I added an example.

In my tests I hit a limit of about 510-530 realtime streams from several test files on the RTX 2080 Ti, at about 15-20% load on the i7-8700.
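To put those numbers in perspective, here is the arithmetic (illustrative only; the stream counts are the ones reported above, and "realtime stream" means a stream whose audio is decoded at least as fast as it arrives):

```python
# What "510-530 realtime streams" implies: N concurrent realtime streams
# means roughly N seconds of audio decoded per wall-clock second, i.e. a
# per-stream real-time factor near 1/N. Figures are from the report above.

def audio_seconds_per_second(num_streams: int) -> float:
    """Aggregate decoding throughput, in seconds of audio per wall-clock second."""
    return float(num_streams)

def per_stream_rtf(num_streams: int) -> float:
    """Approximate real-time factor of each stream (lower is faster)."""
    return 1.0 / num_streams

low, high = 510, 530  # stream limits reported for the RTX 2080 Ti
print(audio_seconds_per_second(low))   # 510.0
print(round(per_stream_rtf(high), 6))  # 0.001887
```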

One problem I noticed: it crashes when the model is freed, while the CUDA pipeline instance is being destroyed, but I didn't look deeply into Kaldi: https://github.com/alphacep/vosk-api/blob/40937b6bcbe318eeb01879093c59cf5a1219a29d/src/batch_model.cc#L128

ASSERTION_FAILED ([5.5.1094~1-2b69ae]:~BatchedThreadedNnet3CudaOnlinePipeline():batched-threaded-nnet3-cuda-online-pipeline.cc:60) Assertion failed: (available_channels_.empty() || available_channels_.size() == num_channels_)

[ Stack-Trace: ]
../src/libvosk.so(kaldi::MessageLogger::LogMessage() const+0x7f6) [0x7bf5e0db5076]
../src/libvosk.so(kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)+0x75) [0x7bf5e0db5ae5]
../src/libvosk.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaOnlinePipeline::~BatchedThreadedNnet3CudaOnlinePipeline()+0xb1e) [0x7bf5e0951f3e]
../src/libvosk.so(BatchModel::~BatchModel()+0x1d3) [0x7bf5e094af63]
../src/libvosk.so(vosk_batch_model_free+0x12) [0x7bf5e0917842]
./test_vosk_gpu_batch(+0x150b) [0x65475b2dd50b]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7bf5e0229d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7bf5e0229e40]
./test_vosk_gpu_batch(+0x12a5) [0x65475b2dd2a5]

Aborted (core dumped)
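The failed assertion says that at destruction time the channel pool was neither empty nor full, i.e. some decoding channels were never returned before the pipeline was torn down. A toy model of that invariant (class and method names here are purely illustrative, not the real Kaldi or vosk API) shows why freeing the model while a stream still holds a channel would trip it:

```python
# Toy model of the channel-pool invariant behind the Kaldi assertion:
# the pipeline hands channels out to active streams and expects the pool
# to be empty or full at teardown. Names are illustrative, not real API.

class TeardownError(AssertionError):
    pass


class ToyPipeline:
    def __init__(self, num_channels: int):
        self.num_channels = num_channels
        self.available_channels = list(range(num_channels))

    def open_stream(self) -> int:
        """Hand a free channel to a new stream."""
        return self.available_channels.pop()

    def close_stream(self, channel: int) -> None:
        """Return a stream's channel to the pool."""
        self.available_channels.append(channel)

    def destroy(self) -> None:
        # Mirrors: assert(available_channels_.empty()
        #                 || available_channels_.size() == num_channels_)
        if self.available_channels and len(self.available_channels) != self.num_channels:
            raise TeardownError("a stream still holds a channel at teardown")


pipeline = ToyPipeline(num_channels=4)
ch = pipeline.open_stream()   # a recognizer is still attached
try:
    pipeline.destroy()        # freeing the model tears the pipeline down
except TeardownError as err:
    print("assertion would fire:", err)
pipeline.close_stream(ch)
pipeline.destroy()            # all channels returned: teardown is clean
```

If this toy model matches what happens in the real pipeline, the crash above would suggest some channels are not released before BatchModel's destructor runs, but that is a guess pending a closer look at Kaldi.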