Open v-yunbin opened 1 year ago
--min-active=200 --max-active=7000 --beam=12.0 --lattice-beam=6.0 --acoustic-scale=1.0 --frame-subsampling-factor=3 --endpoint.silence-phones=1:2:3:4:5:6:7:8:9:10 --endpoint.rule2.min-trailing-silence=0.5 --endpoint.rule3.min-trailing-silence=1.0 --endpoint.rule4.min-trailing-silence=2.0
online2-wav-nnet3-latgen-faster --do-endpointing=false --online=false --feature-type=mfcc --mfcc-config=mfcc.conf --ivector-extraction-config=ivector.conf --beam=12.0 --lattice-beam=6.0 --acoustic-scale=1.0 --word-symbol-table=/asr-model/graph/words.txt /asr-model/am/final.mdl /asr-model/graph/HCLG.fst 'ark:echo utter1 utter1|' 'scp:echo utter1 test.wav|' ark:/dev/null
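Both tests above use the same ASR model, the same test wav, and the same decoding settings (beam, lattice-beam, acoustic-scale), but they give different results. Ground truth for the wav: "再等两秒" ("wait two more seconds"). vosk result: "那 不要不要" (roughly "then no, no"). kaldi result: "在 莆田 两秒" (roughly "in Putian, two seconds").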
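To check how far "same settings" actually holds, here is a minimal sketch that parses the `--name=value` flags from the two configurations above and lists which ones match, differ, or appear on only one side. The helper `parse_flags` and the variable names are hypothetical, and the vosk flag string uses the repaired `--endpoint.rule2`/`--endpoint.rule3` reading of the garbled line. It only compares the text of the flags; it does not run either decoder and does not by itself explain the different transcripts.

```python
import shlex


def parse_flags(command_line):
    """Return {name: value} for every --name=value token (hypothetical helper)."""
    flags = {}
    for token in shlex.split(command_line):
        if token.startswith("--") and "=" in token:
            name, value = token[2:].split("=", 1)
            flags[name] = value
    return flags


# Flags copied from the post (vosk side; rule2/rule3 split is an assumed repair).
VOSK_CONF = (
    "--min-active=200 --max-active=7000 --beam=12.0 --lattice-beam=6.0 "
    "--acoustic-scale=1.0 --frame-subsampling-factor=3 "
    "--endpoint.silence-phones=1:2:3:4:5:6:7:8:9:10 "
    "--endpoint.rule2.min-trailing-silence=0.5 "
    "--endpoint.rule3.min-trailing-silence=1.0 "
    "--endpoint.rule4.min-trailing-silence=2.0"
)

# Command copied from the post (kaldi side).
KALDI_CMD = (
    "online2-wav-nnet3-latgen-faster --do-endpointing=false --online=false "
    "--feature-type=mfcc --mfcc-config=mfcc.conf "
    "--ivector-extraction-config=ivector.conf --beam=12.0 --lattice-beam=6.0 "
    "--acoustic-scale=1.0 --word-symbol-table=/asr-model/graph/words.txt "
    "/asr-model/am/final.mdl /asr-model/graph/HCLG.fst "
    "'ark:echo utter1 utter1|' 'scp:echo utter1 test.wav|' ark:/dev/null"
)

vosk = parse_flags(VOSK_CONF)
kaldi = parse_flags(KALDI_CMD)

# Flags present on both sides, split into equal and differing values.
shared = sorted(set(vosk) & set(kaldi))
differing = {k: (vosk[k], kaldi[k]) for k in shared if vosk[k] != kaldi[k]}

# Flags set on one side only; the other side falls back to its own defaults.
only_in_vosk = sorted(set(vosk) - set(kaldi))
only_in_kaldi = sorted(set(kaldi) - set(vosk))

print("shared:", shared)
print("differing:", differing)
print("only in vosk:", only_in_vosk)
print("only in kaldi:", only_in_kaldi)
```

Only `beam`, `lattice-beam`, and `acoustic-scale` are set on both sides. Flags such as `min-active`, `max-active`, and `frame-subsampling-factor` are given only in the vosk configuration, and `online=false` only in the Kaldi command, so those may be worth aligning before comparing the transcripts.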
There could be many issues here ;) I need to think about it.