alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Update package EN 0.22 fails: ERROR: SymbolTable::Read: Read failed: standard input #1209

Open svenha opened 2 years ago

svenha commented 2 years ago

I tried the update package for EN (vosk-model-en-us-0.22-compile.zip). Compared to version 0.21, which worked perfectly, this fails even if I do not add any sentences or words.

 ./compile-graph.sh
...
tils/mkgraph_lookahead.sh --self-loop-scale 1.0 data/lang exp/chain/tdnn data/en-mix-small.lm.gz exp/chain/tdnn/lgraph                                                                                                                                                                                                      
utils/mkgraph_lookahead.sh : compiling grammar data/en-mix-small.lm.gz                                                                                                                                                                                                                                                         
tree-info exp/chain/tdnn/tree                                                                                                                                                                                                                                                                                                  
tree-info exp/chain/tdnn/tree                                                                                                                                                                                                                                                                                                  
fstdeterminizestar data/lang/L_disambig.fst                                                                                                                                                                                                                                                                                    
 fstcomposecontext --context-size=2 --central-position=1 --read-disambig-syms=data/lang/phones/disambig.int --writedisambig-syms=exp/chain/tdnn/lgraph/disambig_ilabels_2_1.int exp/chain/tdnn/lgraph/ilabels_2_1.993828 exp/chain/tdnn/lgraph/L_disambig_det.fst                                                              
make-h-transducer --disambig-syms-out=exp/chain/tdnn/lgraph/disambig_tid.int --transition-scale=1.0 exp/chain/tdnn/lgraph/ilabels_2_1 exp/chain/tdnn/tree exp/chain/tdnn/final.mdl                                                                                                                                             
fstdeterminizestar                                                                                                                                                                                                                                                                                                             
add-self-loops --disambig-syms=exp/chain/tdnn/lgraph/disambig_tid.int --self-loop-scale=1.0 --reorder=true exp/chain/tdnn/final.mdl                                                                                                                                                                                            
apply_map.pl: warning! missing key 0 in exp/chain/tdnn/lgraph/relabel                                                                                                                                                                                                                                                          
apply_map.pl: warning! missing key 312340 in exp/chain/tdnn/lgraph/relabel                                                                                                                                                                                                                                                     
ERROR: SymbolTable::Read: Read failed                                                                                                                                                                                                                                                                                          
ERROR: SymbolTable::Read: Read failed                                                                                                                                                                                                                                                                                          
ERROR: VectorFst::Read: Unexpected end of file: standard input                                                                                                                                                                                                                                                                 
ERROR: FstHeader::Read: Bad FST header: standard input. Magic number not matched. Got: 0

nshmyrev commented 2 years ago

Probably there were earlier errors, you need to read the full log.
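
A minimal sketch of one way to scan the full log for the earliest problems. The log file name `compile-graph.log` and the helper `first_problems` are hypothetical, and the match patterns are assumptions taken from the messages quoted in this thread:

```shell
#!/bin/sh
# Print the first N lines of a log that look like errors or warnings.
# The patterns are assumptions based on the messages quoted in this thread.
first_problems() {
    log_file=$1
    max_lines=${2:-5}
    grep -n -E 'ERROR|WARNING|warning!' "$log_file" | head -n "$max_lines"
}

# Usage (run inside the unpacked update package; the log name is hypothetical):
#   ./compile-graph.sh > compile-graph.log 2>&1
#   first_problems compile-graph.log 10
```

Each hit is prefixed with its line number in the log, so the surrounding commands can be inspected by hand.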

svenha commented 2 years ago

There are no visible errors, only some warnings:

--> SUCCESS [validating lang directory data/lang]                                                                                                                                                                                                                                                                              
+ utils/format_lm.sh data/lang data/en-mix-small.lm.gz data/dict/lexicon.txt data/lang_test                                                                                                                                                                                                                                    
Converting 'data/en-mix-small.lm.gz' to FST                                                                                                                                                                                                                                                                                    
arpa2fst --disambig-symbol=#0 --read-symbol-table=data/lang_test/words.txt - data/lang_test/G.fst                                                                                                                                                                                                                              
LOG (arpa2fst[5.5.1046~1-76cd5]:Read():arpa-file-parser.cc:94) Reading \data\ section.                                                                                                                                                                                                                                         
LOG (arpa2fst[5.5.1046~1-76cd5]:Read():arpa-file-parser.cc:149) Reading \1-grams: section.                                                                                                                                                                                                                                     
WARNING (arpa2fst[5.5.1046~1-76cd5]:Read():arpa-file-parser.cc:219) line 8 [-5.564116   'a      -0.003993742] skipped: word ''a' not in symbol table

There are many more, all about words with apostrophes. Can someone reproduce this? Just unpack the package and run ./compile-graph.sh inside it.

nshmyrev commented 2 years ago

Then it might be an opengrm/LD_LIBRARY_PATH issue. You need to run the commands manually and check what is going on.
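
A rough sketch of how such a check might start, assuming a Linux system with `ldd`. The helper `report_tool` is hypothetical, and the tool names in the usage comment (`ngramread`, `fstconvert`) are only guesses at what the graph scripts call:

```shell
#!/bin/sh
# Report where a tool resolves on PATH and which FST/NGram shared libraries
# it would load. Run it after sourcing the package's path.sh.
report_tool() {
    tool=$1
    tool_path=$(command -v "$tool" 2>/dev/null) || tool_path=""
    if [ -z "$tool_path" ]; then
        echo "MISSING: $tool"
        return 1
    fi
    echo "FOUND: $tool -> $tool_path"
    # ldd is Linux-specific; skip the library listing where it is unavailable.
    if command -v ldd >/dev/null 2>&1; then
        ldd "$tool_path" 2>/dev/null | grep -E 'fst|ngram' || true
    fi
    return 0
}

# Usage (tool names are assumptions, adjust to what your scripts call):
#   . ./path.sh
#   report_tool ngramread
#   report_tool fstconvert
```

If a tool resolves to an unexpected directory, or its libraries come from a different OpenFst or opengrm installation than intended, that points to a PATH or LD_LIBRARY_PATH mix-up.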

svenha commented 2 years ago

OK. There is some confusion about opengrm versions. path.sh of the update package expects version 1.3.10. Kaldi's installation script installs 1.3.12 and newer versions are around, too. Which version is known to work?

nshmyrev commented 2 years ago

Which version is known to work?

Both should work fine

kwiechen commented 1 year ago

I have the same problem using kaldi with opengrm 1.3.12. Kaldi from https://github.com/kaldi-asr/kaldi with opengrm 1.3.7 runs without problems.

svenha commented 1 year ago

@kwiechen Thanks for the feedback. Do I understand correctly that you can use the update package if you switch both the kaldi AND the opengrm version? What happens if you reduce the problem by changing only one of the two (opengrm, kaldi)?

kwiechen commented 1 year ago

I have reinstalled kaldi from kaldi-master and opengrm 1.3.7 to solve this

svenha commented 1 year ago

Thanks @kwiechen . I followed your advice and it solved my problem. Unfortunately, I have no idea why the Vosk-way did not work :-( Just a guess: as I cannot compile opengrm 1.3.10 or newer (probably incompatible with openfst 1.7.2 that normal Kaldi installs), there might be a subtle version incompatibility.

nshmyrev commented 1 week ago

We have this fix for the issue:

https://github.com/alphacep/kaldi/commit/11b67d387b547d1afb616ab8f95fd74c459d20c6

please check that your scripts are up-to-date

nshmyrev commented 1 week ago

I have rebuilt the zip package with the required changes; please redownload it if you have an old version. You need to have utils/relabel_words.py
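
A small sketch for checking whether an unpacked package already contains the script named above. The helper name `has_relabel_script` is hypothetical, and the check only tests for the file's presence, not its contents:

```shell
#!/bin/sh
# Check that an unpacked update package contains utils/relabel_words.py.
# The argument is the package directory; it defaults to the current directory.
has_relabel_script() {
    package_dir=${1:-.}
    [ -f "$package_dir/utils/relabel_words.py" ]
}

if has_relabel_script .; then
    echo "utils/relabel_words.py found"
else
    echo "utils/relabel_words.py missing: redownload the package"
fi
```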