alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

vosk decode vs kaldi decode #1194

Open v-yunbin opened 1 year ago

v-yunbin commented 1 year ago
  1. vosk decode: model config set:
    --min-active=200
    --max-active=7000
    --beam=12.0
    --lattice-beam=6.0
    --acoustic-scale=1.0
    --frame-subsampling-factor=3
    --endpoint.silence-phones=1:2:3:4:5:6:7:8:9:10
    --endpoint.rule2.min-trailing-silence=0.5
    --endpoint.rule3.min-trailing-silence=1.0
    --endpoint.rule4.min-trailing-silence=2.0
  2. kaldi decode: online2-wav-nnet3-latgen-faster --do-endpointing=false --online=false --feature-type=mfcc --mfcc-config=mfcc.conf --ivector-extraction-config=ivector.conf --beam=12.0 --lattice-beam=6.0 --acoustic-scale=1.0 --word-symbol-table=/asr-model/graph/words.txt /asr-model/am/final.mdl /asr-model/graph/HCLG.fst 'ark:echo utter1 utter1|' 'scp:echo utter1 test.wav|' ark:/dev/null
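One place the two runs can silently diverge: vosk does not take decoder flags on the command line — it typically reads them from a `conf/model.conf` file inside the model directory, while the `online2-wav-nnet3-latgen-faster` run above passes everything explicitly (including `mfcc.conf` and `ivector.conf`). A sketch of what that file would look like for the settings quoted above (the path and layout are assumptions about this particular model; check your model directory):

```
# /asr-model/conf/model.conf — where vosk usually picks up its
# decoding options (illustrative; verify against your model layout)
--min-active=200
--max-active=7000
--beam=12.0
--lattice-beam=6.0
--acoustic-scale=1.0
--frame-subsampling-factor=3
```

If the feature or i-vector configs bundled with the vosk model differ from the `mfcc.conf`/`ivector.conf` given to the kaldi binary, identical beam settings will still produce different hypotheses.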

The above test used the same ASR model, the same test wav, and the same decoding settings (beam, lattice-beam, acoustic-scale), but produced different results. Wav ground truth: "再等两秒" ("wait two more seconds"); vosk result: "那 不要不要" ("that, no no"); kaldi result: "在 莆田 两秒" ("in Putian, two seconds").
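To put a number on how far each hypothesis is from the ground truth, one can compute the character error rate (CER = character-level edit distance divided by reference length). This is a standalone helper I'm adding for illustration, not part of vosk or kaldi:

```python
# CER = Levenshtein distance / reference length, computed on characters
# (spaces stripped, since both decoders insert word breaks differently).

def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance via the standard dynamic-programming table."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    ref, hyp = ref.replace(" ", ""), hyp.replace(" ", "")
    return edit_distance(ref, hyp) / len(ref)

print(cer("再等两秒", "那 不要不要"))   # vosk hypothesis  -> 1.25
print(cer("再等两秒", "在 莆田 两秒"))  # kaldi hypothesis -> 0.75
```

By this measure the kaldi hypothesis at least recovers "两秒", while the vosk output shares no characters with the reference — which suggests the divergence is more than beam-width jitter.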

nshmyrev commented 1 year ago

Dear, there could be many causes for this ;) I need to think.