alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

[clarification] Confidence scores from offline Kaldi MBR decoding and from VOSK output are different #650

Open arunbaby0 opened 3 years ago

arunbaby0 commented 3 years ago

We tried the following commands to get MBR decoding output using offline Kaldi (upstream Kaldi, not the Kaldi fork from alphacep).

Method 1:

lattice-to-ctm-conf --inv-acoustic-scale=12 --decode-mbr=true "ark:gunzip -c exp/MF/model/chain/tdnn_1g_aug/MF/lat.1.gz|" - | utils/int2sym.pl -f 5 data_hindi_rnnlm_combined_text/lang/words.txt > 1.ctm

Method 2:

lattice-align-words data_hindi_rnnlm_MF/lang/phones/word_boundary.int exp/MF/model/chain/tdnn_1g_aug/final.mdl "ark:gunzip -c exp/MF/model/chain/tdnn_1g_aug/MF/lat.1.gz|" ark:- | lattice-to-ctm-conf ark:- - | utils/int2sym.pl -f 5 data_hindi_rnnlm_combined_text/lang/words.txt > 1.ctm

In both cases, the confidence scores obtained from Kaldi differ from those reported by VOSK. Is VOSK doing any extra operations on top of the normal Kaldi output?

nshmyrev commented 3 years ago

It might be about the acoustic scale. It is hard to guess; you need to compare the lattices step by step. We do lattice-align-words first.
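
Before comparing the lattices themselves, it can help to line up the per-word confidences from the Kaldi CTM and from the Vosk JSON result to see where they diverge. The following is a minimal Python sketch, assuming the CTM has the usual six fields (utterance, channel, start, duration, word, confidence) and that Vosk was run with word output enabled, so the JSON result carries a `result` list with `word` and `conf` keys. The helper names `read_ctm`, `read_vosk` and `compare` are hypothetical and not part of Kaldi or Vosk. It only compares final outputs and does not inspect the lattices.

```python
import json


def read_ctm(text):
    """Parse CTM lines of the form: <utt> <channel> <start> <duration> <word> <conf>."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 6:
            continue  # skip blank or malformed lines
        _utt, _chan, start, dur, word, conf = parts[:6]
        rows.append({
            "word": word,
            "start": float(start),
            "end": float(start) + float(dur),
            "conf": float(conf),
        })
    return rows


def read_vosk(result_json):
    """Return the per-word entries of a Vosk JSON result (empty if words are absent)."""
    return json.loads(result_json).get("result", [])


def compare(ctm_rows, vosk_rows):
    """Pair words by position and return (word, kaldi_conf, vosk_conf, difference)."""
    diffs = []
    for k, v in zip(ctm_rows, vosk_rows):
        if k["word"] != v["word"]:
            break  # the hypotheses diverge, so later confidences are not comparable
        diffs.append((k["word"], k["conf"], v["conf"], round(k["conf"] - v["conf"], 4)))
    return diffs
```

Calling `compare(read_ctm(open("1.ctm").read()), read_vosk(vosk_result))`, where `vosk_result` is the JSON string returned by the recognizer for the same utterance, gives one tuple per matching word. If the word sequences already differ, the mismatch is in decoding rather than in the confidence computation.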