alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Segmentation fault (core dumped) with adapted LM #791

Open 123srikanth opened 2 years ago

123srikanth commented 2 years ago

Hi there. I have completed training for Vosk language model adaptation of the US English model. I took the graph, G.fst, G.carpa, and rnnlm_out folders from the trained model and replaced the corresponding folders in the pretrained model that I downloaded from the Vosk English models page. However, the pretrained model has no rnnlm folder where I could place rnnlm_out, so I skipped that step and ran the model; the output below shows it segfaulting as soon as it starts taking input.
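For reference, the folder swap being described works out to roughly the following (a sketch only: <trained> stands for the directory where the adaptation recipe was run, and the exact source paths depend on that recipe):

# Illustrative sketch of the replacement described above; paths are assumptions, not exact.
cp -r <trained>/exp/chain/tdnn/graph/. latest_model_en/graph/          # new HCLG.fst, words.txt, phones/
cp <trained>/data/lang_test_rescore/G.carpa latest_model_en/rescore/   # new rescoring LM (plus G.fst, if the recipe produced one)
# The pretrained model ships no rnnlm/ directory, so rnnlm_out has nowhere to go and is skipped.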

ubuntu@123$ python3 kaldi.py
LOG (VoskAPI:ReadDataFiles():model.cc:206) Decoding params beam=13 max-active=7000 lattice-beam=6
LOG (VoskAPI:ReadDataFiles():model.cc:209) Silence phones 1:2:3:4:5:6:7:8:9:10:11:12:13:14:15
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 1 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 2 orphan components.
LOG (VoskAPI:Collapse():nnet-utils.cc:1488) Added 1 components, removed 2
LOG (VoskAPI:CompileLooped():nnet-compile-looped.cc:345) Spent 0.118831 seconds in looped compilation.
LOG (VoskAPI:ReadDataFiles():model.cc:233) Loading i-vector extractor from latest_model_en/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():model.cc:263) Loading HCLG from latest_model_en/graph/HCLG.fst
LOG (VoskAPI:ReadDataFiles():model.cc:278) Loading words from latest_model_en/graph/words.txt
LOG (VoskAPI:ReadDataFiles():model.cc:287) Loading winfo latest_model_en/graph/phones/word_boundary.int
LOG (VoskAPI:ReadDataFiles():model.cc:317) Loading CARPA model from latest_model_en/rescore/G.carpa
################################################################################
Press Ctrl+C to stop the recording
################################################################################
Segmentation fault (core dumped)
ubuntu@123$

nshmyrev commented 2 years ago

Which vosk version are you using? I would update the version first.

123srikanth commented 2 years ago

I have updated the vosk package from vosk-0.3.30 to vosk-0.3.32 and tried again, but it still produces the same output.

nshmyrev commented 2 years ago

The output would be different if you had updated properly.

123srikanth commented 2 years ago

This is the command I used to update:

$ pip3 install vosk --upgrade
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: vosk in /home/srikanth/.local/lib/python3.6/site-packages (0.3.32)
Requirement already satisfied: cffi>=1.0 in /home/srikanth/.local/lib/python3.6/site-packages (from vosk) (1.14.6)
Requirement already satisfied: pycparser in /home/srikanth/.local/lib/python3.6/site-packages (from cffi>=1.0->vosk) (2.20)
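As a side note, a generic way to double-check which vosk build the script actually picks up (standard pip and Python commands, not specific to this issue):

pip3 show vosk                                    # version and install location that pip reports
python3 -c "import vosk; print(vosk.__file__)"    # the package the interpreter really imports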

123srikanth commented 2 years ago

Actually, the pretrained model runs fine; the crash only happens after I replace its graph with the one from the language model I trained.

nshmyrev commented 2 years ago

Probably some of the files are mismatched.

123srikanth commented 2 years ago

Yes. The US English compile package that I downloaded from https://alphacephei.com/vosk/lm produced an rnnlm_out folder after training, but the pretrained vosk-model-en-us-daanzu-20200905 model that I downloaded from https://alphacephei.com/vosk/models has no rnnlm folder, so there is nowhere to put the rnnlm_out folder generated during training. Could this be the exact reason?

Asma-droid commented 2 years ago

@123srikanth have you found a solution, please? I am facing the same problem as you.

Asma-droid commented 2 years ago

@nshmyrev https://we.tl/t-x14J8A8CTN here is the model. Thanks for the help.

nshmyrev commented 2 years ago

@Asma-droid that model runs fine here. Try to collect a backtrace. Run

gdb --args python3 ./test-simple.py test.wav

then type run. When it crashes with the segfault, type

bt

and share the output.

Asma-droid commented 2 years ago

@nshmyrev I got this as the problem:

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x00007ffff5db5576 in kaldi::nnet3::DecodableAmNnetLoopedOnline::LogLikelihood(int, int) () from ...../.conda/envs/GPU/lib/python3.9/site-packages/vosk/libvosk.so

nshmyrev commented 2 years ago

Is this with a default file? Please share the audio file that caused the crash.

Can you try on several machines? Does it crash on any of them?

nshmyrev commented 2 years ago

So https://we.tl/t-x14J8A8CTN works fine, while https://we.tl/t-mLgdF78rWn crashes. Both models have a tree with 3760 leaves, and the first model's graph has 3760 pdfs, but the second graph has only 3720 pdfs. It seems you built the graph with a different tree. That is the reason for the crash.

You just need to make sure your graph, final.mdl and tree files are in sync.
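One way to check that, assuming the Kaldi tools from the compile recipe are on PATH (the paths below follow the recipe layout shown later in this thread and are only illustrative):

# The tree, final.mdl and the compiled graph must agree on the number of pdfs.
tree-info exp/chain/tdnn/tree                          # prints num-pdfs, context-width, central-position
nnet3-am-info exp/chain/tdnn/final.mdl | grep -i pdf   # pdf count the acoustic model expects
# If these disagree with the tree the graph was compiled against, rebuild the graph
# (utils/mkgraph.sh) using this same tree/final.mdl pair.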

Asma-droid commented 2 years ago

@nshmyrev thank you very much. It is helpful!

japita-se commented 2 years ago

Same problem here, for a simple adaptation to add new words. Where is the final.mdl?

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fffe7fe45e6 in kaldi::nnet3::DecodableAmNnetLoopedOnline::LogLikelihood(int, int) () from /home/giuseppe/.local/lib/python3.8/site-packages/vosk/libvosk.so
(gdb) bt

#0  0x00007fffe7fe45e6 in kaldi::nnet3::DecodableAmNnetLoopedOnline::LogLikelihood(int, int) () from /home/giuseppe/.local/lib/python3.8/site-packages/vosk/libvosk.so
#1  0x00007fffe7ee0ca6 in kaldi::LatticeFasterDecoderTpl<fst::ConstFst<fst::ArcTpl<fst::TropicalWeightTpl<float> >, unsigned int>, kaldi::decoder::BackpointerToken>::ProcessEmitting(kaldi::DecodableInterface*) () from /home/giuseppe/.local/lib/python3.8/site-packages/vosk/libvosk.so
#2  0x00007fffe7ee2263 in kaldi::LatticeFasterDecoderTpl<fst::ConstFst<fst::ArcTpl<fst::TropicalWeightTpl<float> >, unsigned int>, kaldi::decoder::BackpointerToken>::AdvanceDecoding(kaldi::DecodableInterface*, int) () from /home/giuseppe/.local/lib/python3.8/site-packages/vosk/libvosk.so
#3  0x00007fffe7e1321d in KaldiRecognizer::AcceptWaveform(kaldi::Vector<float>&) () from /home/giuseppe/.local/lib/python3.8/site-packages/vosk/libvosk.so
#4  0x00007fffe7e1340e in KaldiRecognizer::AcceptWaveform(char const*, int) () from /home/giuseppe/.local/lib/python3.8/site-packages/vosk/libvosk.so
#5  0x00007fffe7eb1ff9 in vosk_recognizer_accept_waveform () from /home/giuseppe/.local/lib/python3.8/site-packages/vosk/libvosk.so
#6  0x00007ffff7fb8af0 in ffi_call_unix64 () from /lib64/libffi.so.6
#7  0x00007ffff7fb82ab in ffi_call () from /lib64/libffi.so.6
#8  0x00007fffea436728 in cdata_call () from /usr/lib64/python3.8/site-packages/_cffi_backend.cpython-38-x86_64-linux-gnu.so
#9  0x00007ffff7ba22d1 in _PyObject_MakeTpCall () from /lib64/libpython3.8.so.1.0
#10 0x00007ffff7b9f0ad in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#11 0x00007ffff7ba7df7 in function_code_fastcall () from /lib64/libpython3.8.so.1.0
#12 0x00007ffff7b99fff in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#13 0x00007ffff7b986e4 in _PyEval_EvalCodeWithName () from /lib64/libpython3.8.so.1.0
#14 0x00007ffff7c14ddd in PyEval_EvalCodeEx () from /lib64/libpython3.8.so.1.0
#15 0x00007ffff7c14d8f in PyEval_EvalCode () from /lib64/libpython3.8.so.1.0
#16 0x00007ffff7c362a8 in run_eval_code_obj () from /lib64/libpython3.8.so.1.0
#17 0x00007ffff7c35533 in run_mod () from /lib64/libpython3.8.so.1.0
#18 0x00007ffff7b1ed18 in pyrun_file () from /lib64/libpython3.8.so.1.0
#19 0x00007ffff7b1df91 in PyRun_SimpleFileExFlags () from /lib64/libpython3.8.so.1.0
#20 0x00007ffff7b15238 in Py_RunMain.cold () from /lib64/libpython3.8.so.1.0
#21 0x00007ffff7c0866d in Py_BytesMain () from /lib64/libpython3.8.so.1.0
#22 0x00007ffff7df0082 in __libc_start_main () from /lib64/libc.so.6
#23 0x000055555555509e in _start ()

I just followed the procedure and copied the right files. It seems the ivector is not in sync.

japita-se commented 2 years ago

More details here:

./compile-graph.sh                                                                                                                                                                                                                       
+ rm -rf data/en-mix-small.lm.gz data/en-mix.lm.gz data/en-mixp.lm.gz data/extra.lm.gz data/lang_local data/dict data/lang data/lang_test data/lang_test_rescore                                                                              
+ rm -rf exp/lgraph                                                                                                    
+ rm -rf exp/graph                                                                                                     
+ mkdir -p data/dict                                                                                                                                                                                                                          
+ cp db/phone/extra_questions.txt db/phone/nonsilence_phones.txt db/phone/optional_silence.txt db/phone/silence_phones.txt data/dict                                                                                                          
+ python3 ./dict.py                                                                                                    
+ ngram-count -wbdiscount -order 4 -text db/extra.txt -lm data/extra.lm.gz                                             
+ ngram -order 4 -lm db/en-230k-0.5.lm.gz -mix-lm data/extra.lm.gz -lambda 0.95 -write-lm data/en-mix.lm.gz            
+ ngram -order 4 -lm data/en-mix.lm.gz -prune 3e-8 -write-lm data/en-mixp.lm.gz                                        
+ ngram -lm data/en-mixp.lm.gz -write-lm data/en-mix-small.lm.gz                                                       
+ utils/prepare_lang.sh data/dict '[unk]' data/lang_local data/lang                                                    
utils/prepare_lang.sh data/dict [unk] data/lang_local data/lang                                                        
Checking data/dict/silence_phones.txt ...                                                                              
--> reading data/dict/silence_phones.txt                                                                               
--> text seems to be UTF-8 or ASCII, checking whitespaces                                                              
--> text contains only allowed whitespaces                                                                             
--> data/dict/silence_phones.txt is OK                                                                                 

Checking data/dict/optional_silence.txt ...                                                                                                                                                                                                   
--> reading data/dict/optional_silence.txt                                                                                                                                                                                                    
--> text seems to be UTF-8 or ASCII, checking whitespaces                                                                                                                                                                                     
--> text contains only allowed whitespaces                                                                             
--> data/dict/optional_silence.txt is OK                                                                               

Checking data/dict/nonsilence_phones.txt ...                                                                                                                                                                                                  
--> reading data/dict/nonsilence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces                                                                                                                                                                                                    
--> data/dict/nonsilence_phones.txt is OK                                                                                                                                                                                                     

Checking disjoint: silence_phones.txt, nonsilence_phones.txt                                                                                                                                                                                  
--> disjoint property is OK.                                                                                                                                                                                                                  

Checking data/dict/lexicon.txt                                                                                                                                                                                                                
--> reading data/dict/lexicon.txt                                                                                                                                                                                                             
--> text seems to be UTF-8 or ASCII, checking whitespaces                                                                                                                                                                                     
--> text contains only allowed whitespaces                                                                                                                                                                                                    
--> data/dict/lexicon.txt is OK                                                                   

Checking data/dict/extra_questions.txt ...                 
--> reading data/dict/extra_questions.txt                  
--> text seems to be UTF-8 or ASCII, checking whitespaces                                                              
--> text contains only allowed whitespaces                 
--> data/dict/extra_questions.txt is OK                    
--> SUCCESS [validating dictionary directory data/dict]                                                                

**Creating data/dict/lexiconp.txt from data/dict/lexicon.txt                                                           
fstaddselfloops data/lang/phones/wdisambig_phones.int data/lang/phones/wdisambig_words.int                             
prepare_lang.sh: validating output directory               
utils/validate_lang.pl data/lang                           
Checking existence of separator file                       
separator file data/lang/subword_separator.txt is empty or does not exist, deal in word case.                          
Checking data/lang/phones.txt ...                          
--> text seems to be UTF-8 or ASCII, checking whitespaces                                                              
--> text contains only allowed whitespaces                 
--> data/lang/phones.txt is OK                             

Checking words.txt: #0 ...                                 
--> text seems to be UTF-8 or ASCII, checking whitespaces                                                              
--> text contains only allowed whitespaces                 
--> data/lang/words.txt is OK                              

Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...                                                       
--> silence.txt and nonsilence.txt are disjoint            
--> silence.txt and disambig.txt are disjoint              
--> disambig.txt and nonsilence.txt are disjoint           
--> disjoint property is OK                                

Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...                                                       
--> found no unexplainable phones in phones.txt            

Checking data/lang/phones/context_indep.{txt, int, csl} ...                                                            
--> text seems to be UTF-8 or ASCII, checking whitespaces                                                              
--> text contains only allowed whitespaces                 
--> 55 entry/entries in data/lang/phones/context_indep.txt                                                             
--> data/lang/phones/context_indep.int corresponds to data/lang/phones/context_indep.txt                               
--> data/lang/phones/context_indep.csl corresponds to data/lang/phones/context_indep.txt                               
--> data/lang/phones/context_indep.{txt, int, csl} are OK                                                              

Checking data/lang/phones/nonsilence.{txt, int, csl} ...                                                               
--> text seems to be UTF-8 or ASCII, checking whitespaces                                                              
--> text contains only allowed whitespaces                 
--> 172 entry/entries in data/lang/phones/nonsilence.txt                                                               
--> data/lang/phones/nonsilence.int corresponds to data/lang/phones/nonsilence.txt                                     
--> data/lang/phones/nonsilence.csl corresponds to data/lang/phones/nonsilence.txt                                     
--> data/lang/phones/nonsilence.{txt, int, csl} are OK                                                                 

Checking data/lang/phones/silence.{txt, int, csl} ...      
--> text seems to be UTF-8 or ASCII, checking whitespaces                                                              
--> text contains only allowed whitespaces                 
--> 55 entry/entries in data/lang/phones/silence.txt       
--> data/lang/phones/silence.int corresponds to data/lang/phones/silence.txt
--> data/lang/phones/silence.csl corresponds to data/lang/phones/silence.txt
--> data/lang/phones/silence.{txt, int, csl} are OK        

Checking data/lang/phones/optional_silence.{txt, int, csl} ...                                                         
--> text seems to be UTF-8 or ASCII, checking whitespaces                                                              
--> text contains only allowed whitespaces                 
--> 1 entry/entries in data/lang/phones/optional_silence.txt                                                           
--> data/lang/phones/optional_silence.int corresponds to data/lang/phones/optional_silence.txt                         
--> data/lang/phones/optional_silence.csl corresponds to data/lang/phones/optional_silence.txt                         
--> data/lang/phones/optional_silence.{txt, int, csl} are OK                                                           

Checking data/lang/phones/disambig.{txt, int, csl} ...                                                                 
--> text seems to be UTF-8 or ASCII, checking whitespaces                                                              
--> text contains only allowed whitespaces                 
--> 34 entry/entries in data/lang/phones/disambig.txt      
--> data/lang/phones/disambig.int corresponds to data/lang/phones/disambig.txt                                         
--> data/lang/phones/disambig.csl corresponds to data/lang/phones/disambig.txt                                         
--> data/lang/phones/disambig.{txt, int, csl} are OK       

Checking data/lang/phones/roots.{txt, int} ...             
--> text seems to be UTF-8 or ASCII, checking whitespaces                                                              
--> text contains only allowed whitespaces                 
--> 54 entry/entries in data/lang/phones/roots.txt         
--> data/lang/phones/roots.int corresponds to data/lang/phones/roots.txt                                               
--> data/lang/phones/roots.{txt, int} are OK               

Checking data/lang/phones/sets.{txt, int} ...              
--> text seems to be UTF-8 or ASCII, checking whitespaces                                                              
--> text contains only allowed whitespaces                 
--> 54 entry/entries in data/lang/phones/sets.txt          
--> data/lang/phones/sets.int corresponds to data/lang/phones/sets.txt                                                 
--> data/lang/phones/sets.{txt, int} are OK                

Checking data/lang/phones/extra_questions.{txt, int} ...                                                               
--> text seems to be UTF-8 or ASCII, checking whitespaces                                                              
--> text contains only allowed whitespaces                 
--> 11 entry/entries in data/lang/phones/extra_questions.txt                                                           
--> data/lang/phones/extra_questions.int corresponds to data/lang/phones/extra_questions.txt                           
--> data/lang/phones/extra_questions.{txt, int} are OK                                                                 

Checking data/lang/phones/word_boundary.{txt, int} ...                                                                 
--> text seems to be UTF-8 or ASCII, checking whitespaces                                                              
--> text contains only allowed whitespaces                 
--> 227 entry/entries in data/lang/phones/word_boundary.txt                                                            
--> data/lang/phones/word_boundary.int corresponds to data/lang/phones/word_boundary.txt                               
--> data/lang/phones/word_boundary.{txt, int} are OK       

Checking optional_silence.txt ...                          
--> reading data/lang/phones/optional_silence.txt          
--> data/lang/phones/optional_silence.txt is OK            

Checking disambiguation symbols: #0 and #1                 
--> data/lang/phones/disambig.txt has "#0" and "#1"        
--> data/lang/phones/disambig.txt is OK                 
Checking topo ...                                          

Checking word_boundary.txt: silence.txt, nonsilence.txt, disambig.txt ...                                              
--> data/lang/phones/word_boundary.txt doesn't include disambiguation symbols                                          
--> data/lang/phones/word_boundary.txt is the union of nonsilence.txt and silence.txt                                  
--> data/lang/phones/word_boundary.txt is OK               

Checking word-level disambiguation symbols...              
--> data/lang/phones/wdisambig.txt exists (newer prepare_lang.sh)                                                      
Checking word_boundary.int and disambig.int                
--> generating a 90 word/subword sequence                  
--> resulting phone sequence from L.fst corresponds to the word sequence                                               
--> L.fst is OK                                            
--> generating a 62 word/subword sequence                  
--> resulting phone sequence from L_disambig.fst corresponds to the word sequence                                      
--> L_disambig.fst is OK                                   

Checking data/lang/oov.{txt, int} ...                      
--> text seems to be UTF-8 or ASCII, checking whitespaces                                                              
--> text contains only allowed whitespaces                 
--> 1 entry/entries in data/lang/oov.txt                   
--> data/lang/oov.int corresponds to data/lang/oov.txt                                                                 
--> data/lang/oov.{txt, int} are OK                        

--> data/lang/L.fst is olabel sorted                       
--> data/lang/L_disambig.fst is olabel sorted              
--> SUCCESS [validating lang directory data/lang]          
+ utils/format_lm.sh data/lang data/en-mix-small.lm.gz data/dict/lexicon.txt data/lang_test                            
Converting 'data/en-mix-small.lm.gz' to FST                
arpa2fst --disambig-symbol=#0 --read-symbol-table=data/lang_test/words.txt - data/lang_test/G.fst                      
LOG (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:94) Reading \data\ section.                                 
LOG (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:149) Reading \1-grams: section.                             
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 8 [-5.564116        'a      -0.003993742] skipped: word ''a' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 9 [-6.14543 'all    -0.01290929] skipped: word ''all' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 10 [-7.045512       'am     -0.2015137] skipped: word ''am' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 11 [-8.18103        'amour] skipped: word ''amour' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 12 [-8.207732       'angelo] skipped: word ''angelo' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 13 [-8.016088       'apercois] skipped: word ''apercois' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 14 [-8.249146       'aquila] skipped: word ''aquila' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 15 [-8.25028        'arche] skipped: word ''arche' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 16 [-7.926421       'brian] skipped: word ''brian' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 18 [-7.930167       'cuse] skipped: word ''cuse' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 20 [-8.30919        'dour] skipped: word ''dour' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 21 [-3.860634       'em     -0.223105] skipped: word ''em' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 22 [-8.264342       'espace] skipped: word ''espace' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 23 [-8.071111       'est] skipped: word ''est' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 24 [-8.303288       'grady] skipped: word ''grady' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 25 [-5.932466       'in     -0.08969782] skipped: word ''in' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 26 [-8.024973       'ites] skipped: word ''ites' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 27 [-8.258896       'ivoire] skipped: word ''ivoire' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 28 [-8.236024       'lin] skipped: word ''lin' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 29 [-6.492578       'll     -0.03924241] skipped: word ''ll' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 31 [-8.266165       'mma] skipped: word ''mma' not in symbol table

WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 33 [-5.76597        'n      -0.1020972] skipped: word ''n' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 34 [-8.172307       'neill] skipped: word ''neill' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 35 [-8.233274       'oeuvres] skipped: word ''oeuvres' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 37 [-8.175941       'reilly] skipped: word ''reilly' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 39 [-8.227567       'shea] skipped: word ''shea' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 41 [-8.300121       'toole] skipped: word ''toole' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 43 [-8.168487       'wa] skipped: word ''wa' not in symbol table
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:219) line 44 [-7.253666       'walking] skipped: word ''walking' not in symbol table
LOG (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:149) Reading \2-grams: section.                             
LOG (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:149) Reading \3-grams: section.                             
WARNING (arpa2fst[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:259) Of 112505 parse warnings, 30 were reported. Run program with --max-arpa-warnings=-1 to see all warnings
LOG (arpa2fst[5.5.1012~1-dd107]:RemoveRedundantStates():arpa-lm-compiler.cc:359) Reduced num-states from 2098786 to 300818
fstisstochastic data/lang_test/G.fst                       
0.515945 -2.26027                                          
Succeeded in formatting LM: 'data/en-mix-small.lm.gz'      
+ utils/mkgraph.sh --self-loop-scale 1.0 data/lang_test exp/chain/tdnn exp/chain/tdnn/graph                            
tree-info exp/chain/tdnn/tree                              
tree-info exp/chain/tdnn/tree                              
fstminimizeencoded                                         
fsttablecompose data/lang_test/L_disambig.fst data/lang_test/G.fst                                                     
fstpushspecial                                             
fstdeterminizestar --use-log=true                          
fstisstochastic data/lang_test/tmp/LG.fst                  
-0.23562 -0.236088                                         
[info]: LG not stochastic.                                 
fstcomposecontext --context-size=2 --central-position=1 --read-disambig-syms=data/lang_test/phones/disambig.int --write-disambig-syms=data/lang_test/tmp/disambig_ilabels_2_1.int data/lang_test/tmp/ilabels_2_1.53093 data/lang_test/tmp/LG.fst
fstisstochastic data/lang_test/tmp/CLG_2_1.fst             
-0.23562 -0.236088                                         
[info]: CLG not stochastic.                                
make-h-transducer --disambig-syms-out=exp/chain/tdnn/graph/disambig_tid.int --transition-scale=1.0 data/lang_test/tmp/ilabels_2_1 exp/chain/tdnn/tree exp/chain/tdnn/final.mdl 
fsttablecompose exp/chain/tdnn/graph/Ha.fst data/lang_test/tmp/CLG_2_1.fst                                             
fstminimizeencoded                                         
fstdeterminizestar --use-log=true                          
fstrmsymbols exp/chain/tdnn/graph/disambig_tid.int         
fstrmepslocal                                              
fstisstochastic exp/chain/tdnn/graph/HCLGa.fst             
-0.224546 -0.909919                                        
HCLGa is not stochastic                                    
add-self-loops --self-loop-scale=1.0 --reorder=true exp/chain/tdnn/final.mdl exp/chain/tdnn/graph/HCLGa.fst            
fstisstochastic exp/chain/tdnn/graph/HCLG.fst              
1.90465e-09 -0.664465                                      
[info]: final HCLG is not stochastic.                      
+ utils/build_const_arpa_lm.sh data/en-mix.lm.gz data/lang_test data/lang_test_rescore                                 
arpa-to-const-arpa --bos-symbol=312360 --eos-symbol=312361 --unk-symbol=38 'gunzip -c data/en-mix.lm.gz | utils/map_arpa_lm.pl data/lang_test_rescore/words.txt|' data/lang_test_rescore/G.carpa 
LOG (arpa-to-const-arpa[5.5.1012~1-dd107]:BuildConstArpaLm():const-arpa-lm.cc:1078) Reading gunzip -c data/en-mix.lm.gz | utils/map_arpa_lm.pl data/lang_test_rescore/words.txt|
utils/map_arpa_lm.pl: Processing "\data\"                  
utils/map_arpa_lm.pl: Processing "\1-grams:\"              
utils/map_arpa_lm.pl: Warning: OOV line -5.564116   'a       -0.05645481                                               
utils/map_arpa_lm.pl: Warning: OOV line -6.14543    'all     -0.1704088                                                
utils/map_arpa_lm.pl: Warning: OOV line -7.045512   'am      -0.2584558                                                
utils/map_arpa_lm.pl: Warning: OOV line -8.18103    'amour   -0.06738149                                               
utils/map_arpa_lm.pl: Warning: OOV line -8.207732   'angelo  -0.06931987                                               
utils/map_arpa_lm.pl: Warning: OOV line -8.016088   'apercois                                     
utils/map_arpa_lm.pl: Warning: OOV line -8.249146   'aquila  -0.01997986                                               
utils/map_arpa_lm.pl: Warning: OOV line -8.25028    'arche   -0.08045298                                               
utils/map_arpa_lm.pl: Warning: OOV line -7.926421   'brian   -0.05977602                                               
utils/map_arpa_lm.pl: Warning: OOV line -7.930167   'cuse    -0.05086565                                               
utils/map_arpa_lm.pl: Warning: OOV line -8.30919    'dour    -0.1287646                                                
utils/map_arpa_lm.pl: Warning: OOV line -3.860634   'em      -0.2781028                                                
utils/map_arpa_lm.pl: Warning: OOV line -8.264342   'espace  -0.1046457                                                
utils/map_arpa_lm.pl: Warning: OOV line -8.071111   'est     -0.1136602                                                
utils/map_arpa_lm.pl: Warning: OOV line -8.303288   'grady   -0.1579387                                                
utils/map_arpa_lm.pl: Warning: OOV line -5.932466   'in      -0.2266943                                                
utils/map_arpa_lm.pl: Warning: OOV line -8.024973   'ites    -0.01785311                                               
utils/map_arpa_lm.pl: Warning: OOV line -8.258896   'ivoire  -0.0884648                                                
utils/map_arpa_lm.pl: Warning: OOV line -8.236024   'lin     -0.09440096                                               
LOG (arpa-to-const-arpa[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:94) Reading \data\ section.                       
LOG (arpa-to-const-arpa[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:149) Reading \1-grams: section.                   
utils/map_arpa_lm.pl: Processing "\2-grams:\"              
LOG (arpa-to-const-arpa[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:149) Reading \2-grams: section.                   
utils/map_arpa_lm.pl: Processing "\3-grams:\"              
LOG (arpa-to-const-arpa[5.5.1012~1-dd107]:Read():arpa-file-parser.cc:149) Reading \3-grams: section.                   
Terminated                                                 
+ rnnlm/change_vocab.sh data/lang/words.txt exp/rnnlm exp/rnnlm_out                                                    
rnnlm/change_vocab.sh: Copying config directory.           
rnnlm/change_vocab.sh: Re-generating words.txt, unigram_probs.txt, word_feats.txt and word_embedding.final.mat.        
rnnlm/get_word_features.py: made features for 312363 words.                                                            
rnnlm-get-word-embedding exp/rnnlm_out/word_feats.txt exp/rnnlm_out/feat_embedding.final.mat exp/rnnlm_out/word_embedding.final.mat 
+ utils/mkgraph_lookahead.sh --self-loop-scale 1.0 data/lang exp/chain/tdnn data/en-mix-small.lm.gz exp/chain/tdnn/lgraph
utils/mkgraph_lookahead.sh : compiling grammar data/en-mix-small.lm.gz                                                 
tree-info exp/chain/tdnn/tree                              
tree-info exp/chain/tdnn/tree                              
fstdeterminizestar data/lang/L_disambig.fst                
fstcomposecontext --context-size=2 --central-position=1 --read-disambig-syms=data/lang/phones/disambig.int --write-disambig-syms=exp/chain/tdnn/lgraph/disambig_ilabels_2_1.int exp/chain/tdnn/lgraph/ilabels_2_1.53904 exp/chain/tdnn/lgraph/L_disambig_det.fst
make-h-transducer --disambig-syms-out=exp/chain/tdnn/lgraph/disambig_tid.int --transition-scale=1.0 exp/chain/tdnn/lgraph/ilabels_2_1 exp/chain/tdnn/tree exp/chain/tdnn/final.mdl 
fstdeterminizestar                                         
add-self-loops --disambig-syms=exp/chain/tdnn/lgraph/disambig_tid.int --self-loop-scale=1.0 --reorder=true exp/chain/tdnn/final.mdl 
apply_map.pl: warning! missing key 0 in exp/chain/tdnn/lgraph/relabel                                                  
apply_map.pl: warning! missing key 312361 in exp/chain/tdnn/lgraph/relabel                                             

cp model_en model_en_recompiled
cp -r model_en model_en_recompiled
vim data/dict/lexicon.txt
tree model_en_recompiled
cp vosk-model-en-us-0.22-compile/exp/chain/tdnn/lgraph/HCLr.fst model_en_recompiled/graph/.
cp vosk-model-en-us-0.22-compile/exp/chain/tdnn/lgraph/Gr.fst model_en_recompiled/graph/.
cp vosk-model-en-us-0.22-compile/exp/chain/tdnn/lgraph/disambig_tid.int model_en_recompiled/graph/.
cp vosk-model-en-us-0.22-compile/exp/chain/tdnn/lgraph/phones/word_boundary.int model_en_recompiled/graph/phones/.
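Before re-running, a consistency check along the lines nshmyrev suggested above could look like this (a sketch that assumes the pretrained model keeps its acoustic model at am/final.mdl, which is an assumption about its layout):

# Both should report the same pdf count if the lookahead graph, tree and final.mdl are in sync.
tree-info vosk-model-en-us-0.22-compile/exp/chain/tdnn/tree
nnet3-am-info model_en_recompiled/am/final.mdl | grep -i pdf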

>gdb --args python ./test_microphone_words.py --model=model_en_recompiled

(gdb) run
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fffe7fe45e6 in kaldi::nnet3::DecodableAmNnetLoopedOnline::LogLikelihood(int, int) () from /home/giuseppe/.local/lib/python3.8/site-packages/vosk/libvosk.so

(gdb) bt
#0  0x00007fffe7fe45e6 in kaldi::nnet3::DecodableAmNnetLoopedOnline::LogLikelihood(int, int) () from /home/giuseppe/.local/lib/python3.8/site-packages/vosk/libvosk.so
#1  0x00007fffe7ee1698 in kaldi::LatticeFasterDecoderTpl<fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl<float> > >, kaldi::decoder::BackpointerToken>::ProcessEmitting(kaldi::DecodableInterface*) ()
   from /home/giuseppe/.local/lib/python3.8/site-packages/vosk/libvosk.so
#2  0x00007fffe7ee3457 in kaldi::LatticeFasterDecoderTpl<fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl<float> > >, kaldi::decoder::BackpointerToken>::AdvanceDecoding(kaldi::DecodableInterface*, int) ()
   from /home/giuseppe/.local/lib/python3.8/site-packages/vosk/libvosk.so
#3  0x00007fffe7e1321d in KaldiRecognizer::AcceptWaveform(kaldi::Vector<float>&) () from /home/giuseppe/.local/lib/python3.8/site-packages/vosk/libvosk.so
#4  0x00007fffe7e1340e in KaldiRecognizer::AcceptWaveform(char const*, int) () from /home/giuseppe/.local/lib/python3.8/site-packages/vosk/libvosk.so
#5  0x00007fffe7eb1ff9 in vosk_recognizer_accept_waveform () from /home/giuseppe/.local/lib/python3.8/site-packages/vosk/libvosk.so
#6  0x00007ffff7fb8af0 in ffi_call_unix64 () from /lib64/libffi.so.6
#7  0x00007ffff7fb82ab in ffi_call () from /lib64/libffi.so.6
#8  0x00007fffea436728 in cdata_call () from /usr/lib64/python3.8/site-packages/_cffi_backend.cpython-38-x86_64-linux-gnu.so
#9  0x00007ffff7ba22d1 in _PyObject_MakeTpCall () from /lib64/libpython3.8.so.1.0
#10 0x00007ffff7b9f0ad in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#11 0x00007ffff7ba7df7 in function_code_fastcall () from /lib64/libpython3.8.so.1.0
#12 0x00007ffff7b99fff in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#13 0x00007ffff7b986e4 in _PyEval_EvalCodeWithName () from /lib64/libpython3.8.so.1.0
#14 0x00007ffff7c14ddd in PyEval_EvalCodeEx () from /lib64/libpython3.8.so.1.0
#15 0x00007ffff7c14d8f in PyEval_EvalCode () from /lib64/libpython3.8.so.1.0
#16 0x00007ffff7c362a8 in run_eval_code_obj () from /lib64/libpython3.8.so.1.0
#17 0x00007ffff7c35533 in run_mod () from /lib64/libpython3.8.so.1.0
#18 0x00007ffff7b1ed18 in pyrun_file () from /lib64/libpython3.8.so.1.0
#19 0x00007ffff7b1df91 in PyRun_SimpleFileExFlags () from /lib64/libpython3.8.so.1.0
#20 0x00007ffff7b15238 in Py_RunMain.cold () from /lib64/libpython3.8.so.1.0
#21 0x00007ffff7c0866d in Py_BytesMain () from /lib64/libpython3.8.so.1.0
#22 0x00007ffff7df0082 in __libc_start_main () from /lib64/libc.so.6
#23 0x000055555555509e in _start ()
(gdb) 
wwx007121 commented 2 years ago

pykaldi and python-openfst are only Python wrappers, which cause many core dumps.