Reduce latency when using Rnnlm rescoring

alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

Apache License 2.0


Reduce latency when using Rnnlm rescoring #1272

Open tienanh28122000 opened 1 year ago

tienanh28122000 commented 1 year ago

Dear all, I have some questions about RNNLM rescoring. I compared the latency of three types of decoding (1-pass, carpa rescoring, and RNNLM rescoring) and found that the latency of RNNLM rescoring is much higher than the others. I tried modifying the Vosk source code in recognizer.cc to speed up RNNLM rescoring, but got no better results.

Is there any way to reduce the latency when using RNNLM rescoring? If so, could you suggest some directions for me to follow? I think the Vosk source code could be modified to reduce the latency. Accuracy is not very important to me; I only need to reduce the latency when using RNNLM rescoring.

Best regards

nshmyrev commented 1 year ago

Train a smaller RNNLM model; it will run faster. If accuracy is not important to you, just remove the RNNLM from the model.

tienanh28122000 commented 1 year ago

Thank you very much for replying. By the way, I've found that reducing the lattice-compose-beam (default: 6) makes rescoring faster (reference: https://groups.google.com/g/kaldi-help/c/-G0-fmsJFtw/m/fNLIvUNuAAAJ). But when I reduce the beam, the RTF does not seem to drop very much. Is there a bug with lattice-compose-beam in Vosk, or is something else going on?

nshmyrev commented 1 year ago

> RTF seems not reduce very much

I think this is because lattice rescoring is relatively fast; most of the RTF comes from first-pass decoding. I don't think it is a bug.

If you time pure rescoring (the rnnlm/lmrescore_pruned.sh script), the results look like this:

beam 4: time for 1000 utts 132 seconds, WER 4.91%
beam 1: time for 1000 utts 61 seconds, WER 5.54%
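To make the point about first-pass decoding concrete, here is a minimal sketch of the arithmetic using Amdahl's law. The function names `StageSpeedup` and `OverallSpeedup` are made up for illustration, and the share of total time spent in rescoring is an assumption; it is not reported anywhere in this thread.

```cpp
#include <cassert>

// Speedup of a single stage, from its run time before and after a change.
double StageSpeedup(double seconds_before, double seconds_after) {
  assert(seconds_before > 0.0 && seconds_after > 0.0);
  return seconds_before / seconds_after;
}

// Amdahl's law: end-to-end speedup when only one stage gets faster.
// stage_fraction is the share of total run time spent in that stage
// (a hypothetical input here, it must be measured on the real system).
double OverallSpeedup(double stage_fraction, double stage_speedup) {
  assert(stage_fraction >= 0.0 && stage_fraction <= 1.0);
  assert(stage_speedup > 0.0);
  return 1.0 / ((1.0 - stage_fraction) + stage_fraction / stage_speedup);
}
```

With the numbers above, `StageSpeedup(132.0, 61.0)` is about 2.16 for pure rescoring. If rescoring took, say, 20% of the total decoding time (an assumed figure), `OverallSpeedup(0.2, 2.16)` would be only about 1.12, so the RTF would barely move even though rescoring itself became more than twice as fast.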

tienanh28122000 commented 1 year ago

I measured pure RNNLM rescoring in Vosk by using std::chrono to time how long the call ComposeCompactLatticePruned(compose_opts, tmp_clat, &combined_rnnlm, &bestclat) takes with different beam sizes. These are the results I get (average over 4000 utts):

beam = 1 time = 0.45 WER = 4.8%
beam = 2 time = 0.49 WER = 4.7%
beam = 3 time = 0.51 WER = 4.5%
beam = 4 time = 0.50 WER = 4.5%
beam = 6 time = 0.53 WER = 4.4%

The lattice beam has little effect on the time, and the latency changes much less than I expected. Is this expected?
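For anyone who wants to repeat this kind of measurement, the timing described above can be sketched roughly as follows. This is not code from Vosk: `StageTimer` is a hypothetical helper, and the use of `std::chrono::steady_clock` is an assumption (the comment above only says std::chrono was used). It also assumes the measured call runs on a single thread.

```cpp
#include <chrono>
#include <cstddef>
#include <utility>

// Hypothetical helper: accumulates wall-clock time over repeated calls
// to one stage, so an average per utterance can be reported.
class StageTimer {
 public:
  // Runs fn(), adds its elapsed time to the total, and counts the call.
  template <typename Fn>
  void Measure(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    total_ += stop - start;
    ++calls_;
  }

  std::size_t calls() const { return calls_; }

  // Total measured time in seconds.
  double TotalSeconds() const {
    return std::chrono::duration<double>(total_).count();
  }

  // Average time per call in seconds, or 0 if nothing was measured.
  double AverageSeconds() const {
    return calls_ == 0 ? 0.0 : TotalSeconds() / static_cast<double>(calls_);
  }

 private:
  std::chrono::steady_clock::duration total_ =
      std::chrono::steady_clock::duration::zero();
  std::size_t calls_ = 0;
};
```

In use, the rescoring call quoted above would be wrapped in a lambda passed to `Measure`, once per utterance, and `AverageSeconds()` read at the end of the run. Timing only that one call leaves out any other per-utterance work in the rescoring path, so it is worth checking what else runs around it before comparing against script-level timings.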

nshmyrev commented 1 year ago

The beam is working, since the WER changes. As for the time, let me investigate.