alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Lookahead script for vosk-model-en-us-0.21-compile #764

Closed: Lightning101 closed this issue 2 years ago

Lightning101 commented 2 years ago

Hi, I was looking to do adaptation with new words and a new grammar applied through the use of Gr.fst, as mentioned in #55.

As previously suggested in other issues, I looked through the LibriSpeech scripts in Kaldi and ended up with the following script for creating a lookahead graph for vosk-model-en-us-0.21-compile.

I was wondering how correct this is, or whether there is a better way?

#!/bin/bash

. path.sh

set -x

# Example script for lookahead composition

lm=en-mix-small
am=exp/chain/tdnn
testset=test_tedlium

if [ ! -f "${KALDI_ROOT}/tools/openfst/lib/libfstlookahead.so" ]; then
    echo "Missing ${KALDI_ROOT}/tools/openfst/lib/libfstlookahead.so"
    echo "Make sure you compiled openfst with lookahead support. Run make in ${KALDI_ROOT}/tools after git pull."
    exit 1
fi
if [ ! -f "${KALDI_ROOT}/tools/openfst/bin/ngramread" ]; then
    echo "You appear to not have OpenGRM tools installed. Missing ${KALDI_ROOT}/tools/openfst/bin/ngramread"
    echo "cd to $KALDI_ROOT/tools and run extras/install_opengrm.sh."
    exit 1
fi

rm -rf data/*.lm.gz data/lang_local data/dict data/lang data/lang_test data/lang_test_rescore
rm -rf exp/lgraph
rm -rf exp/graph

mkdir -p data/dict
cp db/phone/* data/dict
./dict.py > data/dict/lexicon.txt    # generate the pronunciation lexicon

# Build a 4-gram LM from the adaptation text, interpolate it with the big
# generic LM, prune the mixture, and write out the small runtime LM
ngram-count -wbdiscount -order 4 -text db/extra.txt -lm data/extra.lm.gz
ngram -order 4 -lm db/en-230k-0.5.lm.gz -mix-lm data/extra.lm.gz -lambda 0.95 -write-lm data/en-mix.lm.gz
ngram -order 4 -lm data/en-mix.lm.gz -prune 3e-8 -write-lm data/en-mixp.lm.gz
ngram -lm data/en-mixp.lm.gz -write-lm data/en-mix-small.lm.gz

utils/prepare_lang.sh data/dict "[unk]" data/lang_local data/lang

# Baseline
utils/format_lm.sh data/lang data/${lm}.lm.gz \
    data/dict/lexicon.txt data/lang_test

utils/mkgraph.sh --self-loop-scale 1.0 --remove-oov \
    data/lang_test ${am} ${am}/graph

steps/nnet3/decode.sh --nj 20 \
    --acwt 1.0 --post-decode-acwt 10.0 \
    --online-ivector-dir exp/chain/ivectors_${testset} \
    ${am}/graph data/${testset} ${am}/decode_${testset}_lookahead_base

utils/mkgraph_lookahead.sh --self-loop-scale 1.0 --remove-oov --compose-graph \
    data/lang_test ${am} ${am}/graph_${lm}_lookahead

# Decode with statically composed lookahead graph
steps/nnet3/decode.sh --nj 20 \
    --acwt 1.0 --post-decode-acwt 10.0 \
    --online-ivector-dir exp/chain/ivectors_${testset} \
    ${am}/graph_${lm}_lookahead data/${testset} ${am}/decode_${testset}_lookahead_static

# Decode with runtime composition
steps/nnet3/decode_lookahead.sh --nj 20 \
    --acwt 1.0 --post-decode-acwt 10.0 \
    --online-ivector-dir exp/chain/ivectors_${testset} \
    ${am}/graph_${lm}_lookahead data/${testset} ${am}/decode_${testset}_lookahead

# Compile arpa graph
utils/mkgraph_lookahead.sh --self-loop-scale 1.0 --compose-graph \
    data/lang_test ${am} data/${lm}.lm.gz ${am}/graph_${lm}_lookahead_arpa

# Decode with runtime composition
steps/nnet3/decode_lookahead.sh --nj 20 \
    --acwt 1.0 --post-decode-acwt 10.0 \
    --online-ivector-dir exp/chain/ivectors_${testset} \
    ${am}/graph_${lm}_lookahead_arpa data/${testset} ${am}/decode_${testset}_lookahead_arpa

# Decode with runtime composition and tuned beams
steps/nnet3/decode_lookahead.sh --nj 20 \
    --beam 12.0 --max-active 3000 \
    --acwt 1.0 --post-decode-acwt 10.0 \
    --online-ivector-dir exp/chain/ivectors_${testset} \
    ${am}/graph_${lm}_lookahead_arpa data/${testset} ${am}/decode_${testset}_lookahead_arpa_fast

I have also included the decoding results: decode_test_tedlium_lookahead.zip
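
If you want to compare the runs, the WER summaries end up in each decode directory (a quick sketch, assuming Kaldi's default score_kaldi.sh scoring ran; adjust the paths to your setup):

# Hypothetical quick check: print the best WER for each decoding run
for d in exp/chain/tdnn/decode_test_tedlium_lookahead*; do
    [ -f "$d/scoring_kaldi/best_wer" ] && cat "$d/scoring_kaldi/best_wer"
done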

nshmyrev commented 2 years ago

It's OK.

Lightning101 commented 2 years ago

Hi @nshmyrev, awesome.

Also, thank you guys for your work on VOSK. I would never have dreamed of using ASR if not for your work. You guys have made a field that was previously inaccessible to most people much easier to use. I look forward to working more with you.

nshmyrev commented 2 years ago

You are welcome Sean @Lightning101, let us know if we can help somehow.

venusfire commented 2 years ago

Hi @nshmyrev,

I am doing the same thing: adaptation of vosk-model-en-us-0.21-compile. I generated the lookahead graphs Gr.fst and HCLr.fst using the script above, and the test results are good.

But my task is to recognize only the digits "one two three four".

When I do recognition with HCLG or the lookahead LM with the full vocabulary, the results are like: 112.wav -> "what went to", 123.wav -> "one two three", 124.wav -> "one to for". There are mistakes, but they make sense.

So I tried test_words.py to limit the vocabulary, using rec = KaldiRecognizer(model, wf.getframerate(), '["one two three four", "[unk]"]'), but the results are totally wrong: 112.wav -> "two", 123.wav -> "two four", 124.wav -> "two three".

I can do this successfully with vosk-model-en-us-0.15, where Gr.fst and HCLr.fst are included, but never with my version of the lookahead LM: mine works fine for full-vocabulary recognition, but not for limited-word recognition using test_words.py.

Could you give hints on how to make a lookahead LM that works with test_words.py, i.e. KaldiRecognizer(model, wf.getframerate(), '["one two three four", "[unk]"]')?
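
For reference, this is roughly how I call it (a minimal sketch of my test_words.py usage; the model directory and wav name are placeholders for my local setup):

import json
import wave

from vosk import Model, KaldiRecognizer

model = Model("vosk-model-en-us-0.21")  # placeholder model directory
wf = wave.open("124.wav", "rb")         # 16 kHz mono PCM test file

# The grammar is a JSON list of allowed phrases; "[unk]" absorbs everything else
rec = KaldiRecognizer(model, wf.getframerate(), '["one two three four", "[unk]"]')

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    rec.AcceptWaveform(data)

print(json.loads(rec.FinalResult())["text"])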

venusfire commented 2 years ago

I guess my question is: is the lookahead LM in vosk-model-en-us-0.15 generated the same way as in the script above, using mkgraph_lookahead.sh? Or is there something special needed to make it work with test_words.py? Thanks!

nshmyrev commented 2 years ago

is the lookahead LM in vosk-model-en-us-0.15 generated the same way as in the script above, using mkgraph_lookahead.sh?

Yes

or is there something special needed to make it work with test_words.py?

You probably didn't replace/delete some files. For example, you don't need the rescore folder.

venusfire commented 2 years ago

We don't need to rescore. I will do this next: copy the Gr.fst and HCLr.fst to the working vosk-0.15 folder and replace the two files. There should not be any interfering files there.
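
Concretely, something like this (a sketch of what I plan to run; the paths are assumptions about my local layout and about the model keeping its graphs in a graph/ directory):

# Replace the old lookahead graph in the unpacked 0.15 model with ours
cp exp/chain/tdnn/graph_en-mix-small_lookahead/HCLr.fst vosk-model-en-us-0.15/graph/HCLr.fst
cp exp/chain/tdnn/graph_en-mix-small_lookahead/Gr.fst   vosk-model-en-us-0.15/graph/Gr.fst
rm -rf vosk-model-en-us-0.15/rescore   # no rescoring, as discussed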

What other possibilities can you suggest? Even when we generate Gr.fst/HCLr.fst with a model we just trained, we get similar problems: recognition with HCLG or the lookahead LM with the full vocabulary is good, but simple digit recognition is bad.

nshmyrev commented 2 years ago

I have created http://alphacephei.com/vosk/models/vosk-model-en-us-0.22-lgraph.zip for you; you are welcome to try it.

venusfire commented 2 years ago

Thanks, man! I will try it out.

On the other hand, we still don't know what's wrong with our way of making a lookahead LM for limited-grammar ASR. We don't have a rescore dir at all, and all the files in the model seem simple, and all of them are needed.

The lookahead LM in your zip above is made with just utils/mkgraph_lookahead.sh --self-loop-scale 1.0 --remove-oov --compose-graph data/lang_test ${am} ${am}/graph_${lm}_lookahead, right?

Thanks!

venusfire commented 2 years ago

Sorry --

  1. The model in your zip file above works fine with the "four digits recognition" task, same as vosk-model-en-us-0.15 and vosk-model-en-us-daanzu-20200905-lgraph.
  2. Similarly, these models don't ship lexicon.txt, lm.arpa, or the lang dir (G.fst, L.fst, words.txt, ...) needed to 1) build the lookahead LM and 2) do AM adaptation.
  3. We still don't know how to make a good lookahead LM for the digit recognition task ourselves. It looks so simple, just a one-line command, but it just doesn't give good results. It's driving me crazy.

Or, to make it simpler: could you start with vosk-model-en-us-0.21-compile for this thread, and make and provide the lookahead LM for it, together with the script, so that we can figure out what's wrong in our lookahead LM generation? (This model has everything needed to build the lookahead LM ourselves.)

It looks so simple with the script posted above, but it just doesn't work for me for limited-vocabulary recognition, while every lookahead LM you provided works with limited-vocabulary ASR as in test_words.py.

Again: the lookahead LM we made works fine for full-vocabulary ASR, but doesn't work with limited-grammar ASR as in test_words.py.

Thanks!

venusfire commented 2 years ago

This is the script we use and the graph dir with the lookahead LM we generated from the vosk-model-en-us-0.21-compile model. Again: good for full-vocabulary ASR, bad for the "four digit ASR" task with a limited grammar and test_words.py (rec = KaldiRecognizer(model, wf.getframerate(), '["one two three four", "[unk]"]')).

At the same time, the model you provided above (0.22) works fine.

The graph (graph_en-mix-small_lookahead dir): https://drive.google.com/file/d/1SRPH0e9EJj3Qxz8nTW_LzyLk2_D5EN3s/view?usp=sharing

The script:

#!/bin/bash

. path.sh

set -x

export KALDI_ROOT=/opt/kaldi
export PATH=$KALDI_ROOT/tools/srilm-1.7.2/bin/i686-m64:$PATH
export LD_LIBRARY_PATH=$KALDI_ROOT/tools/openfst/lib/fst

# Example script for lookahead composition

lm=en-mix-small
am=exp/chain/tdnn
testset=test_tedlium

if [ ! -f "${KALDI_ROOT}/tools/openfst/lib/libfstlookahead.so" ]; then
    echo "Missing ${KALDI_ROOT}/tools/openfst/lib/libfstlookahead.so"
    echo "Make sure you compiled openfst with lookahead support. Run make in ${KALDI_ROOT}/tools after git pull."
    exit 1
fi
if [ ! -f "${KALDI_ROOT}/tools/openfst/bin/ngramread" ]; then
    echo "You appear to not have OpenGRM tools installed. Missing ${KALDI_ROOT}/tools/openfst/bin/ngramread"
    echo "cd to $KALDI_ROOT/tools and run extras/install_opengrm.sh."
    exit 1
fi

rm -rf data/*.lm.gz data/lang_local data/dict data/lang data/lang_test data/lang_test_rescore
rm -rf exp/lgraph
rm -rf exp/graph

mkdir -p data/dict
cp db/phone/* data/dict
python3.7 dict.py > data/dict/lexicon.txt

ngram-count -wbdiscount -order 4 -text db/extra.txt -lm data/extra.lm.gz
ngram -order 4 -lm db/en-230k-0.5.lm.gz -mix-lm data/extra.lm.gz -lambda 0.95 -write-lm data/en-mix.lm.gz
ngram -order 4 -lm data/en-mix.lm.gz -prune 3e-8 -write-lm data/en-mixp.lm.gz
ngram -lm data/en-mixp.lm.gz -write-lm data/en-mix-small.lm.gz

utils/prepare_lang.sh data/dict "[unk]" data/lang_local data/lang

# Baseline
utils/format_lm.sh data/lang data/${lm}.lm.gz \
    data/dict/lexicon.txt data/lang_test

utils/mkgraph.sh --self-loop-scale 1.0 --remove-oov \
    data/lang_test ${am} ${am}/graph

utils/mkgraph_lookahead.sh --self-loop-scale 1.0 --remove-oov --compose-graph \
    data/lang_test ${am} ${am}/graph_${lm}_lookahead

exit

So what could be wrong?

venusfire commented 2 years ago

Now working! I'm posting the solution in case anyone needs it (it took us nearly a week; we tried compiling several Kaldi and OpenFst versions). The answer is: use the arpa lookahead graph, not the plain lookahead LM, for limited-vocabulary ASR!
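
That is, the graph built directly from the ARPA LM, i.e. the graph_${lm}_lookahead_arpa step from the script at the top of this thread:

utils/mkgraph_lookahead.sh --self-loop-scale 1.0 --compose-graph \
    data/lang_test ${am} data/${lm}.lm.gz ${am}/graph_${lm}_lookahead_arpa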

Lightning101 commented 2 years ago

@venusfire I'm sorry that when I posted the original script I did not include some of the finer details. @nshmyrev had mentioned in a previous issue that VOSK is meant to be used with the arpa version (if I find it I will link it). Next time I will try to include all related issues for reference.

These are the decoder options I ended up using:

--frame-subsampling-factor=3 
--minimize=false 
--max-active=7000 
--min-active=200 
--beam=15.0 
--lattice-beam=8.0 
--acoustic-scale=1.0
--endpoint.silence-phones=1:2:3:4:5:6:7:8:9:10
--endpoint.rule2.min-trailing-silence=0.5
--endpoint.rule3.min-trailing-silence=1.0
--endpoint.rule4.min-trailing-silence=2.0

Here are the docs for them: REF. They can also help improve accuracy depending on your use case.

These are some of the things I have found. They might not be correct; I'm still new to ASR :).

Also, @venusfire, could you shed some light on test_words.py? I still have no idea how it is supposed to work, and the C++ and Kaldi sources don't have much of a description of it. Thanks.

venusfire commented 2 years ago

Thanks for the details! test_words.py is very straightforward and there is not much to change (same as making the lookahead LM, as I first thought). So what is your question?

I only change rec = KaldiRecognizer(model, wf.getframerate(), '["one two three four", "[unk]"]') and the test files. 124.wav is just what the name suggests: with the grammar it gives results like "one two four", whereas full-vocabulary ASR gives "when to four".

Oh, and I run it with Python 3.7.

Lightning101 commented 2 years ago

Thanks @venusfire. I kind of had the idea that it allows for updating the recognizer vocabulary at runtime, since I saw the code linked in the VOSK adaptation docs.
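
For anyone who lands here later: newer vosk-api builds also expose this directly, so the grammar can be swapped without recreating the recognizer (a sketch, assuming a vosk version that includes SetGrammar; the model path is a placeholder):

import json
from vosk import Model, KaldiRecognizer

model = Model("vosk-model-en-us-0.22-lgraph")  # any lgraph/lookahead model
rec = KaldiRecognizer(model, 16000)

# Restrict the vocabulary on the fly; "[unk]" catches out-of-grammar words
rec.SetGrammar(json.dumps(["one two three four", "[unk]"]))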