Dividing ASR processing pipeline

I'm wondering whether it would be possible to divide the ASR processing pipeline. I see that Vosk uses kaldi::SingleUtteranceNnet3Decoder, which performs both the neural network (acoustic model) computation and the FST (decoding graph) search. Some CPU effort is also needed for feature and i-vector extraction. All of this is processed on a single core. I suppose the AM and LM computations are the most laborious. For small systems such as the Raspberry Pi, it would be nice to be able to run these computations on separate cores. Is there a Kaldi decoder that allows running these stages in separate threads? Or would it be relatively easy to modify an existing one? I don't have much knowledge of Kaldi decoders, so I'm asking those with more experience.
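To make the question concrete, here is a minimal sketch of the kind of split I have in mind. It uses three stages on three threads, connected by blocking queues. While the search works on chunk n-1, the acoustic model can process chunk n and feature extraction can already handle chunk n+1. The stage functions `ExtractFeatures`, `ComputeAcousticModel` and `DecodeStep` are hypothetical placeholders doing toy arithmetic. They are not Kaldi or Vosk API. In a real decoder the search stage would also have to carry its state (active tokens, lattice) from chunk to chunk.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking FIFO queue used to hand data from one stage to the next.
template <typename T>
class BlockingQueue {
 public:
  void Push(T item) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      assert(!closed_ && "Push after Close");
      q_.push(std::move(item));
    }
    cv_.notify_one();
  }
  // Signals that no more items will be pushed.
  void Close() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      closed_ = true;
    }
    cv_.notify_all();
  }
  // Blocks until an item is available; returns nullopt once closed and drained.
  std::optional<T> Pop() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
    if (q_.empty()) return std::nullopt;
    T item = std::move(q_.front());
    q_.pop();
    return item;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<T> q_;
  bool closed_ = false;
};

using Chunk = std::vector<float>;     // audio samples of one chunk
using Features = std::vector<float>;  // e.g. MFCC + i-vector (hypothetical)
using Scores = std::vector<float>;    // acoustic model outputs (hypothetical)

// HYPOTHETICAL stage bodies: toy arithmetic standing in for the real work.
Features ExtractFeatures(const Chunk& c) {
  Features f;
  for (float x : c) f.push_back(2.0f * x);
  return f;
}
Scores ComputeAcousticModel(const Features& f) {
  Scores s;
  for (float x : f) s.push_back(x + 1.0f);
  return s;
}
float DecodeStep(const Scores& s) {
  float total = 0.0f;
  for (float x : s) total += x;
  return total;
}

// Runs each stage on its own thread; FIFO queues keep the chunk order.
std::vector<float> RunPipeline(const std::vector<Chunk>& chunks) {
  BlockingQueue<Features> feat_q;
  BlockingQueue<Scores> score_q;
  std::vector<float> decoded;  // only touched by decode_thread until join

  std::thread feat_thread([&] {  // stage 1: feature extraction
    for (const Chunk& c : chunks) feat_q.Push(ExtractFeatures(c));
    feat_q.Close();
  });
  std::thread am_thread([&] {  // stage 2: acoustic model
    while (auto f = feat_q.Pop()) score_q.Push(ComputeAcousticModel(*f));
    score_q.Close();
  });
  std::thread decode_thread([&] {  // stage 3: search (placeholder)
    while (auto s = score_q.Pop()) decoded.push_back(DecodeStep(*s));
  });

  feat_thread.join();
  am_thread.join();
  decode_thread.join();
  return decoded;
}
```

For example, `RunPipeline({{1.0f, 2.0f}, {3.0f}})` yields one value per chunk, in input order. With one thread per stage, throughput is bounded by the slowest stage. So if the AM really dominates, as I suspect, splitting the AM computation itself would matter more than moving the other stages off its core.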

BR, Rafał

alphacep / vosk-api #783