alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Graph compilation - Error missing file #1148

Open makdatascientist opened 2 years ago

makdatascientist commented 2 years ago

Hi, I have downloaded the model compile file from link https://alphacephei.com/vosk/models/vosk-model-en-us-0.22-compile.zip and while running the compile-graph.sh to compile graph in kaldi, I got a following error like below

root@MAK:~/kaldi/tools/model/vosk-model-en-us-0.22-compile# compile-graph.sh
+ rm -rf 'data/*.lm.gz' data/lang_local data/dict data/lang data/lang_test data/lang_test_rescore
+ rm -rf exp/lgraph
+ rm -rf exp/graph
+ mkdir -p data/dict
+ cp db/phone/extra_questions.txt db/phone/extra_questions.txt:Zone.Identifier db/phone/nonsilence_phones.txt db/phone/nonsilence_phones.txt:Zone.Identifier db/phone/optional_silence.txt db/phone/optional_silence.txt:Zone.Identifier db/phone/silence_phones.txt db/phone/silence_phones.txt:Zone.Identifier data/dict
+ python3 ./dict.py
+ ngram-count -wbdiscount -order 4 -text db/extra.txt -lm data/extra.lm.gz
/root/kaldi/tools/model/vosk-model-en-us-0.22-compile/compile-graph.sh: line 15: ngram-count: command not found
+ ngram -order 4 -lm db/en-230k-0.5.lm.gz -mix-lm data/extra.lm.gz -lambda 0.95 -write-lm data/en-mix.lm.gz
/root/kaldi/tools/model/vosk-model-en-us-0.22-compile/compile-graph.sh: line 16: ngram: command not found
+ ngram -order 4 -lm data/en-mix.lm.gz -prune 3e-8 -write-lm data/en-mixp.lm.gz
/root/kaldi/tools/model/vosk-model-en-us-0.22-compile/compile-graph.sh: line 17: ngram: command not found
+ ngram -lm data/en-mixp.lm.gz -write-lm data/en-mix-small.lm.gz
/root/kaldi/tools/model/vosk-model-en-us-0.22-compile/compile-graph.sh: line 18: ngram: command not found
+ utils/prepare_lang.sh data/dict '[unk]' data/lang_local data/lang
utils/prepare_lang.sh data/dict [unk] data/lang_local data/lang
Checking data/dict/silence_phones.txt ...
--> reading data/dict/silence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/dict/silence_phones.txt is OK

Checking data/dict/optional_silence.txt ...
--> reading data/dict/optional_silence.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/dict/optional_silence.txt is OK

Checking data/dict/nonsilence_phones.txt ...
--> reading data/dict/nonsilence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/dict/nonsilence_phones.txt is OK

Checking disjoint: silence_phones.txt, nonsilence_phones.txt
--> disjoint property is OK.

Checking data/dict/lexicon.txt
--> reading data/dict/lexicon.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/dict/lexicon.txt is OK

Checking data/dict/extra_questions.txt ...
--> reading data/dict/extra_questions.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/dict/extra_questions.txt is OK
--> SUCCESS [validating dictionary directory data/dict]

**Creating data/dict/lexiconp.txt from data/dict/lexicon.txt
utils/prepare_lang.sh: line 547: fstaddselfloops: command not found
ERROR: FstHeader::Read: Bad FST header: standard input
+ utils/format_lm.sh data/lang data/en-mix-small.lm.gz data/dict/lexicon.txt data/lang_test
Converting 'data/en-mix-small.lm.gz' to FST
gzip: data/en-mix-small.lm.gz: No such file or directory
utils/format_lm.sh: line 55: arpa2fst: command not found
+ utils/mkgraph.sh --self-loop-scale 1.0 data/lang_test exp/chain/tdnn exp/chain/tdnn/graph
mkgraph.sh: expected data/lang_test/G.fst to exist
+ utils/build_const_arpa_lm.sh data/en-mix.lm.gz data/lang_test data/lang_test_rescore
utils/build_const_arpa_lm.sh: line 45: arpa-to-const-arpa: command not found
+ rnnlm/change_vocab.sh data/lang/words.txt exp/rnnlm exp/rnnlm_out
rnnlm/change_vocab.sh: Copying config directory.
rnnlm/change_vocab.sh: Re-generating words.txt, unigram_probs.txt, word_feats.txt and word_embedding.final.mat.
rnnlm/get_word_features.py: made features for 312336 words.
rnnlm/change_vocab.sh: line 75: rnnlm-get-word-embedding: command not found
+ utils/mkgraph_lookahead.sh --self-loop-scale 1.0 data/lang exp/chain/tdnn data/en-mix-small.lm.gz exp/chain/tdnn/lgraph
utils/mkgraph_lookahead.sh : compiling grammar data/en-mix-small.lm.gz
utils/mkgraph_lookahead.sh : expected data/en-mix-small.lm.gz to exist

Kindly suggest where I can find the en-mix-small.lm.gz file.

nshmyrev commented 2 years ago

It says you are missing the SRILM ngram tools in your PATH.
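In case it helps, a minimal sketch of getting the SRILM tools onto the PATH, assuming a standard Kaldi checkout at ~/kaldi as in the log above (paths and installer prompts may differ on your system):

cd ~/kaldi/tools
extras/install_srilm.sh      # may prompt for registration info or ask for srilm.tgz; records SRILM paths in env.sh
. ./env.sh                   # export the SRILM paths into the current shell
which ngram-count ngram      # both should resolve before re-running compile-graph.sh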

nshmyrev commented 2 years ago

And other Kaldi binaries too; Kaldi is probably not compiled.
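A rough sketch of building Kaldi and then checking the binaries the log complained about, assuming the /root/kaldi tree from the log (build flags may vary on your machine):

cd /root/kaldi/tools && make -j "$(nproc)"
cd ../src && ./configure --shared
make depend -j "$(nproc)" && make -j "$(nproc)"
# after a successful build, these should all be found once the model's path.sh is sourced:
which fstaddselfloops arpa2fst arpa-to-const-arpa rnnlm-get-word-embedding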

makdatascientist commented 2 years ago

Thanks a lot for your reply, really appreciated. Are the steps below (from the model compilation documentation) enough to compile the model? Kindly reply.

Graph compilation
For performance, all the models are compiled into more compact structures: FST graphs. If you want to modify them, for example to add new words or adapt them to a domain, you need to run several steps of graph compilation.

Not every Vosk model allows vocabulary modification of the graph. Some, like US English, big Russian, or German, include all the necessary files (the "tree" file from the model, which contains information about phoneme context dependency). Others don't have the required files; you need to contact Alphacephei to get access to them.

Hardware
Compilation is not very slow, but it still requires significant hardware: a Linux server with at least 32 GB of RAM and 100 GB of disk space. It is unlikely you can compile a big model in a virtual machine. Small models require fewer resources.

Software
The following software must be pre-installed on a server:

Kaldi
SRILM
Phonetisaurus (with pip3 install phonetisaurus)
In the future we might provide a Docker image for model compilation; for now you have to install these yourself.
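A quick sanity check (just a sketch) that the three prerequisites are visible from the shell that will run the scripts; the Python module name here is assumed to match the pip package above:

source path.sh                      # brings Kaldi/OpenFst (and SRILM, if configured) onto PATH
which ngram-count ngram             # SRILM
which fstaddselfloops arpa2fst      # Kaldi / OpenFst binaries
python3 -c "import phonetisaurus"   # Phonetisaurus (module name assumed from the pip package)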

Update process
Download the update package, for example:

Russian - https://alphacephei.com/vosk/models/vosk-model-ru-0.22-compile.zip

US English - https://alphacephei.com/vosk/models/vosk-model-en-us-0.22-compile.zip

German - https://alphacephei.com/vosk/models/vosk-model-de-0.21-compile.zip

French - https://alphacephei.com/vosk/models/vosk-model-fr-0.6-linto-2.2.0-compile.zip

Other language packs are available on request. Please contact us at contact@alphacephei.com

Unpack the archive and point KALDI_ROOT in the path.sh script to your Kaldi checkout
Add your extra texts to db/extra.txt
Optionally add manual word pronunciations (phones) to db/extra.dic
Run compile-graph.sh. The update takes about 15 minutes; watch for errors during the process.
Run decode.sh to check that decoding works. Watch the WER in the decoding folder.
Optionally, check that the g2p properly predicted the phonemes at the end of data/dict/lexicon.txt. If needed, update the g2p model with new words. (These steps are sketched in the example after this list.)
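Put together, the update steps above might look roughly like this. It is only a sketch: the package URL and /root/kaldi come from earlier in this thread, my-domain-texts.txt is a hypothetical file of your own, and the sed line assumes path.sh uses the usual "export KALDI_ROOT=..." convention.

wget https://alphacephei.com/vosk/models/vosk-model-en-us-0.22-compile.zip
unzip vosk-model-en-us-0.22-compile.zip && cd vosk-model-en-us-0.22-compile
sed -i 's|^export KALDI_ROOT=.*|export KALDI_ROOT=/root/kaldi|' path.sh   # point at your Kaldi checkout
cat my-domain-texts.txt >> db/extra.txt          # add extra texts
./compile-graph.sh 2>&1 | tee compile.log        # compile; watch compile.log for errors
./decode.sh                                      # test decoding, then check WER in the decoding folder
tail -n 20 data/dict/lexicon.txt                 # eyeball the g2p-predicted phones for new words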

Outputs
Depending on your needs, you might pick some result files from the compilation folder. Remember that if you changed the graph, you also need to update the rescoring/RNNLM parts; otherwise they will go out of sync and accuracy will be low.

For a large model, pick the following parts:

exp/chain/tdnn/graph
data/lang_test_rescore/G.fst and data/lang_test_rescore/G.carpa into the rescore folder
exp/rnnlm_out into the rnnlm folder (you can delete some unnecessary files from rnnlm too)

If you don't want to use the RNNLM, delete the rnnlm folder from the model.

If you don't want to use rescoring, delete the rescore folder from the model; that will save some runtime memory, but accuracy will be lower.

For a small model, just pick the required files from exp/chain/tdnn/lgraph.
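As a concrete illustration of collecting those outputs (a sketch only; MODEL_DIR is a hypothetical destination laid out like your existing model):

MODEL_DIR=~/my-vosk-model                        # hypothetical destination directory
cp -r exp/chain/tdnn/graph "$MODEL_DIR"/graph    # large model: main graph
mkdir -p "$MODEL_DIR"/rescore
cp data/lang_test_rescore/G.fst data/lang_test_rescore/G.carpa "$MODEL_DIR"/rescore/
cp -r exp/rnnlm_out "$MODEL_DIR"/rnnlm           # optional RNNLM rescoring
# small model instead: take the required files from exp/chain/tdnn/lgraph
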
erdoganensar commented 1 year ago

Hello, when I run ./compile-graph.sh it ends as below. Is this normal?

./compile-graph.sh: line 15: ngram-count: command not found
./compile-graph.sh: line 16: ngram: command not found
utils/prepare_lang.sh data/dict [unk] data/lang_local data/lang
Checking data/dict/silence_phones.txt ...
--> reading data/dict/silence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/dict/silence_phones.txt is OK

Checking data/dict/optional_silence.txt ...
--> reading data/dict/optional_silence.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/dict/optional_silence.txt is OK

Checking data/dict/nonsilence_phones.txt ...
--> reading data/dict/nonsilence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/dict/nonsilence_phones.txt is OK

Checking disjoint: silence_phones.txt, nonsilence_phones.txt
--> disjoint property is OK.

Checking data/dict/lexicon.txt
--> reading data/dict/lexicon.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/dict/lexicon.txt is OK

Checking data/dict/extra_questions.txt ...
--> data/dict/extra_questions.txt is empty (this is OK)
--> SUCCESS [validating dictionary directory data/dict]

**Creating data/dict/lexiconp.txt from data/dict/lexicon.txt
fstaddselfloops data/lang/phones/wdisambig_phones.int data/lang/phones/wdisambig_words.int
prepare_lang.sh: validating output directory
utils/validate_lang.pl data/lang
Checking existence of separator file
separator file data/lang/subword_separator.txt is empty or does not exist, deal in word case.
Checking data/lang/phones.txt ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/lang/phones.txt is OK

Checking words.txt: #0 ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/lang/words.txt is OK

Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
--> silence.txt and nonsilence.txt are disjoint
--> silence.txt and disambig.txt are disjoint
--> disambig.txt and nonsilence.txt are disjoint
--> disjoint property is OK

Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
--> found no unexplainable phones in phones.txt

Checking data/lang/phones/context_indep.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 10 entry/entries in data/lang/phones/context_indep.txt
--> data/lang/phones/context_indep.int corresponds to data/lang/phones/context_indep.txt
--> data/lang/phones/context_indep.csl corresponds to data/lang/phones/context_indep.txt
--> data/lang/phones/context_indep.{txt, int, csl} are OK

Checking data/lang/phones/nonsilence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 116 entry/entries in data/lang/phones/nonsilence.txt
--> data/lang/phones/nonsilence.int corresponds to data/lang/phones/nonsilence.txt
--> data/lang/phones/nonsilence.csl corresponds to data/lang/phones/nonsilence.txt
--> data/lang/phones/nonsilence.{txt, int, csl} are OK

Checking data/lang/phones/silence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 10 entry/entries in data/lang/phones/silence.txt
--> data/lang/phones/silence.int corresponds to data/lang/phones/silence.txt
--> data/lang/phones/silence.csl corresponds to data/lang/phones/silence.txt
--> data/lang/phones/silence.{txt, int, csl} are OK

Checking data/lang/phones/optional_silence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 1 entry/entries in data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.int corresponds to data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.csl corresponds to data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.{txt, int, csl} are OK

Checking data/lang/phones/disambig.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 7 entry/entries in data/lang/phones/disambig.txt
--> data/lang/phones/disambig.int corresponds to data/lang/phones/disambig.txt
--> data/lang/phones/disambig.csl corresponds to data/lang/phones/disambig.txt
--> data/lang/phones/disambig.{txt, int, csl} are OK

Checking data/lang/phones/roots.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 31 entry/entries in data/lang/phones/roots.txt
--> data/lang/phones/roots.int corresponds to data/lang/phones/roots.txt
--> data/lang/phones/roots.{txt, int} are OK

Checking data/lang/phones/sets.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 31 entry/entries in data/lang/phones/sets.txt
--> data/lang/phones/sets.int corresponds to data/lang/phones/sets.txt
--> data/lang/phones/sets.{txt, int} are OK

Checking data/lang/phones/extra_questions.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 9 entry/entries in data/lang/phones/extra_questions.txt
--> data/lang/phones/extra_questions.int corresponds to data/lang/phones/extra_questions.txt
--> data/lang/phones/extra_questions.{txt, int} are OK

Checking data/lang/phones/word_boundary.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 126 entry/entries in data/lang/phones/word_boundary.txt
--> data/lang/phones/word_boundary.int corresponds to data/lang/phones/word_boundary.txt
--> data/lang/phones/word_boundary.{txt, int} are OK

Checking optional_silence.txt ...
--> reading data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.txt is OK

Checking disambiguation symbols: #0 and #1
--> data/lang/phones/disambig.txt has "#0" and "#1"
--> data/lang/phones/disambig.txt is OK

Checking topo ...

Checking word_boundary.txt: silence.txt, nonsilence.txt, disambig.txt ...
--> data/lang/phones/word_boundary.txt doesn't include disambiguation symbols
--> data/lang/phones/word_boundary.txt is the union of nonsilence.txt and silence.txt
--> data/lang/phones/word_boundary.txt is OK

Checking word-level disambiguation symbols...
--> data/lang/phones/wdisambig.txt exists (newer prepare_lang.sh)
Checking word_boundary.int and disambig.int
--> generating a 85 word/subword sequence
--> resulting phone sequence from L.fst corresponds to the word sequence
--> L.fst is OK
--> generating a 10 word/subword sequence
--> resulting phone sequence from L_disambig.fst corresponds to the word sequence
--> L_disambig.fst is OK

Checking data/lang/oov.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 1 entry/entries in data/lang/oov.txt
--> data/lang/oov.int corresponds to data/lang/oov.txt
--> data/lang/oov.{txt, int} are OK

--> data/lang/L.fst is olabel sorted
--> data/lang/L_disambig.fst is olabel sorted
--> SUCCESS [validating lang directory data/lang]
utils/mkgraph_lookahead.sh : compiling grammar data/tr-mix.lm.gz
utils/mkgraph_lookahead.sh : expected data/tr-mix.lm.gz to exist