daanzu / kaldi-active-grammar

Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time
GNU Affero General Public License v3.0

trouble running sample code #8

Closed mrever closed 4 years ago

mrever commented 4 years ago

python crashes when running this sample code:

##########
import sys, wave
from kaldi_active_grammar import PlainDictationRecognizer

recognizer = PlainDictationRecognizer()
wfi = 'test.wav'
# load a 16 kHz 16-bit example wav file from kaldi repository
wave_file = wave.open(wfi, 'rb')
data = wave_file.readframes(wave_file.getnframes())
print(type(data), len(data))  # bytes object with int16 audio data
output_str, likelihood = recognizer.decode_utterance(data)
print('won\'t get here, decode_utterance crashes python')
print(repr(output_str), likelihood)
###########

type(data), len(data) = (<class 'bytes'>, 46002)

test.wav is from here: https://github.com/kaldi-asr/kaldi/tree/master/src/feat/test_data/test.wav

kaldi_model is from here: https://github.com/daanzu/kaldi-active-grammar/releases/download/v1.2.0/kaldi_model_zamia.zip

kaldi_active_grammar.__version__ = '1.2.0'

sys.version = '3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]'

Any idea what might be going wrong? Is there a sample wav file that I should try?

Thanks

daanzu commented 4 years ago

Strange, it seems to work fine for me (although that wav doesn't sound like English). Output below:

No handlers could be found for logger "kaldi.model"
(<type 'str'>, 46002)
("u'like eighty percent'", 1.0372425317764282)

Try running it with full debugging output by putting this at the top:

import logging
logging.basicConfig(level=1)
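
For reference, `level=1` is below `logging.DEBUG` (which is 10), so it lets every message through. A slightly more explicit sketch of the same idea follows. The helper name `enable_debug_logging` is hypothetical, and it assumes the library logs through the standard `logging` module under names like `kaldi.model`, as the output above suggests. It attaches a handler directly, because `basicConfig` does nothing if the root logger already has one.

```python
import logging
import sys


def enable_debug_logging(stream=sys.stderr):
    """Attach a handler to the root logger and let DEBUG records through.

    Hypothetical helper, not part of kaldi_active_grammar.
    """
    root = logging.getLogger()
    handler = logging.StreamHandler(stream)
    # Same layout as logging.basicConfig's default: LEVEL:logger.name:message
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    root.addHandler(handler)
    root.setLevel(logging.DEBUG)
    return handler
```

Call `enable_debug_logging()` before creating the recognizer so that messages emitted during model loading are captured too.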

mrever commented 4 years ago

Here's the logging output:

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\words.txt'

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\phones.txt'

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\align_lexicon.int'

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\disambig.int'

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\L_disambig.fst'

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\tree'

DEBUG:kaldi:kaldi_active_grammar: find_file cannot find required file '1.mdl' in 'kaldi_model\' (or subdirectories)

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\final.mdl'

DEBUG:kaldi:kaldi_active_grammar: find_file cannot find required file 'g.irelabel' in 'kaldi_model\' (or subdirectories)

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\user_lexicon.txt'

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\left_context_phones.txt'

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\nonterminals.txt'

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\wdisambig_phones.int'

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\wdisambig_words.int'

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\lexiconp_disambig.txt'

DEBUG:kaldi.model:loading words from 'kaldi_model\words.txt'

DEBUG:kaldi.compiler:KaldiRule(-1, top): Skipped full compilation thanks to FileCache

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\words.txt'

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\align_lexicon.int'

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\conf\mfcc_hires.conf'

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\ivectors_test_hires\conf\ivector_extractor.conf'

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\conf\online_cmvn.conf'

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\ivectors_test_hires\conf\splice.conf'

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\extractor\final.mat'

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\extractor\global_cmvn.stats'

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\extractor\final.dubm'

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\extractor\final.ie'

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\final.mdl'

DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\phones.txt'

LOG ([5.5-win]:dragonfly::AgfNNet3OnlineModelWrapper::AgfNNet3OnlineModelWrapper():dragonfly\agf-nnet3.cpp:128) nonterm_phones_offset: 993

LOG ([5.5-win]:dragonfly::AgfNNet3OnlineModelWrapper::AgfNNet3OnlineModelWrapper():dragonfly\agf-nnet3.cpp:129) rules_nonterm_offset: 7

LOG ([5.5-win]:dragonfly::AgfNNet3OnlineModelWrapper::AgfNNet3OnlineModelWrapper():dragonfly\agf-nnet3.cpp:130) dictation_nonterm_offset: 5

LOG ([5.5-win]:dragonfly::AgfNNet3OnlineModelWrapper::AgfNNet3OnlineModelWrapper():dragonfly\agf-nnet3.cpp:131) word_syms_filename: kaldi_model\words.txt

LOG ([5.5-win]:dragonfly::AgfNNet3OnlineModelWrapper::AgfNNet3OnlineModelWrapper():dragonfly\agf-nnet3.cpp:132) word_align_lexicon_filename: kaldi_model\align_lexicon.int

LOG ([5.5-win]:dragonfly::AgfNNet3OnlineModelWrapper::AgfNNet3OnlineModelWrapper():dragonfly\agf-nnet3.cpp:133) mfcc_config_filename: kaldi_model\conf\mfcc_hires.conf

LOG ([5.5-win]:dragonfly::AgfNNet3OnlineModelWrapper::AgfNNet3OnlineModelWrapper():dragonfly\agf-nnet3.cpp:134) ie_config_filename: kaldi_model.tmp\ivector_extractor.conf

LOG ([5.5-win]:dragonfly::AgfNNet3OnlineModelWrapper::AgfNNet3OnlineModelWrapper():dragonfly\agf-nnet3.cpp:135) model_filename: kaldi_model\final.mdl

LOG ([5.5-win]:dragonfly::AgfNNet3OnlineModelWrapper::AgfNNet3OnlineModelWrapper():dragonfly\agf-nnet3.cpp:136) top_fst_filename: kaldi_model.tmp\6c29c7b4d8970b63ad320d7cac5296a119b6ab3f.fst

LOG ([5.5-win]:dragonfly::AgfNNet3OnlineModelWrapper::AgfNNet3OnlineModelWrapper():dragonfly\agf-nnet3.cpp:137) dictation_fst_filename: kaldi_model\Dictation.fst

LOG ([5.5-win]:dragonfly::AgfNNet3OnlineModelWrapper::AgfNNet3OnlineModelWrapper():dragonfly\agf-nnet3.cpp:138) kNontermBigNumber, GetEncodingMultiple: 10000000, 1000

LOG ([5.5-win]:kaldi::nnet3::Nnet::RemoveOrphanNodes():nnet3\nnet-nnet.cc:948) Removed 5 orphan nodes.

LOG ([5.5-win]:kaldi::nnet3::Nnet::RemoveOrphanComponents():nnet3\nnet-nnet.cc:847) Removing 11 orphan components.

LOG ([5.5-win]:kaldi::nnet3::ModelCollapser::Collapse():nnet3\nnet-utils.cc:1463) Added 6 components, removed 11

LOG ([5.5-win]:kaldi::IvectorExtractor::ComputeDerivedVars():ivector\ivector-extractor.cc:183) Computing derived variables for iVector extractor

LOG ([5.5-win]:kaldi::IvectorExtractor::ComputeDerivedVars():ivector\ivector-extractor.cc:204) Done.

LOG ([5.5-win]:kaldi::nnet3::CompileLooped():nnet3\nnet-compile-looped.cc:345) Spent 0.0510093 seconds in looped compilation.


Looks like I'm missing some files that aren't in the kaldi_model zip files?

Yeah, I just tried an online file that I was fairly certain had the right format (sampling rate etc.). I've also made my own .wav file, but it crashed in the same way.
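
One way to rule out the audio format is to read the WAV header with the standard `wave` module before decoding. The sketch below is only an illustration: the helper names are hypothetical, and the expected values (16 kHz, 16-bit, mono) are assumptions based on the "16 kHz 16-bit" comment in the sample code, with mono being a guess rather than something stated in this thread.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_width_bytes, sample_rate_hz, n_frames)."""
    with wave.open(path, "rb") as wf:
        return (wf.getnchannels(), wf.getsampwidth(),
                wf.getframerate(), wf.getnframes())


def wav_format_problems(path, rate=16000, width=2, channels=1):
    """List how a WAV file differs from the assumed format (empty if it matches)."""
    got_channels, got_width, got_rate, _ = describe_wav(path)
    problems = []
    if got_rate != rate:
        problems.append("sample rate is %d Hz, expected %d" % (got_rate, rate))
    if got_width != width:
        problems.append("sample width is %d bytes, expected %d" % (got_width, width))
    if got_channels != channels:
        problems.append("%d channels, expected %d" % (got_channels, channels))
    return problems
```

If `wav_format_problems('test.wav')` returns an empty list, the header matches the assumed format and the cause is probably elsewhere.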

daanzu commented 4 years ago

Hmm, those missing files are unimportant. There were no further messages, like an ERROR or stack trace? Maybe try deleting the *.tmp directory. What OS is this on?

mrever commented 4 years ago

Huh, deleting *.tmp seems to have done the trick. Much appreciated!
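
For anyone hitting the same crash, the cleanup can be scripted. This is a minimal sketch with a hypothetical helper name. It assumes the cache directory sits next to the model directory with a `.tmp` suffix (as in the `kaldi_model.tmp\...` paths in the log above) and that the library regenerates it on the next run.

```python
import shutil
from pathlib import Path


def remove_kaldi_cache(model_dir="kaldi_model"):
    """Delete '<model_dir>.tmp' if present; return True if it was removed.

    Hypothetical helper, not part of kaldi_active_grammar.
    """
    # Strip any trailing slash so 'kaldi_model/' also maps to 'kaldi_model.tmp'.
    tmp_dir = Path(str(model_dir).rstrip("/\\") + ".tmp")
    if tmp_dir.is_dir():
        shutil.rmtree(tmp_dir)
        return True
    return False
```

Run it before constructing the recognizer, for example `remove_kaldi_cache('kaldi_model')`, so that stale cached files are not picked up.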

(moot now, but Windows 10 64-bit to answer your question). Thanks again