mrever closed this issue 4 years ago
Strange, it seems to work fine for me (although that wav doesn't sound like English). Output below:
No handlers could be found for logger "kaldi.model"
(<type 'str'>, 46002)
("u'like eighty percent'", 1.0372425317764282)
Try running it with full debugging output by putting this at the top of your script:
import logging
logging.basicConfig(level=1)
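A slightly more explicit variant of the two lines above, as a sketch rather than anything the library requires. The helper name `enable_kaldi_debug_logging` is made up for illustration, and the logger name "kaldi" is taken from the prefixes in this thread's output (`kaldi`, `kaldi.model`, `kaldi.compiler`). Note that `level=1` is below `logging.DEBUG` (10), so it lets every record through; `DEBUG` should be enough for the messages shown here.

```python
import logging


def enable_kaldi_debug_logging(level=logging.DEBUG):
    """Route records at `level` and above to stderr; return the 'kaldi' logger.

    Hypothetical helper: the name and the 'kaldi' logger prefix are
    assumptions based on the log output in this thread.
    """
    root = logging.getLogger()
    if not root.handlers:
        # basicConfig() only installs a handler when the root logger has none.
        logging.basicConfig()
    # Set the level explicitly so this works even if logging was configured earlier.
    root.setLevel(level)
    return logging.getLogger("kaldi")


kaldi_logger = enable_kaldi_debug_logging()
kaldi_logger.debug("debug logging is on")
```

Call this before constructing the recognizer so that messages emitted during model loading are captured too.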
Here's the logging output:
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\words.txt'
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\phones.txt'
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\align_lexicon.int'
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\disambig.int'
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\L_disambig.fst'
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\tree'
DEBUG:kaldi:kaldi_active_grammar: find_file cannot find required file '1.mdl' in 'kaldi_model\' (or subdirectories)
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\final.mdl'
DEBUG:kaldi:kaldi_active_grammar: find_file cannot find required file 'g.irelabel' in 'kaldi_model\' (or subdirectories)
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\user_lexicon.txt'
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\left_context_phones.txt'
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\nonterminals.txt'
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\wdisambig_phones.int'
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\wdisambig_words.int'
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\lexiconp_disambig.txt'
DEBUG:kaldi.model:loading words from 'kaldi_model\words.txt'
DEBUG:kaldi.compiler:KaldiRule(-1, top): Skipped full compilation thanks to FileCache
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\words.txt'
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\align_lexicon.int'
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\conf\mfcc_hires.conf'
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\ivectors_test_hires\conf\ivector_extractor.conf'
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\conf\online_cmvn.conf'
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\ivectors_test_hires\conf\splice.conf'
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\extractor\final.mat'
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\extractor\global_cmvn.stats'
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\extractor\final.dubm'
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\extractor\final.ie'
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\final.mdl'
DEBUG:kaldi:kaldi_active_grammar: find_file found file 'kaldi_model\phones.txt'
LOG ([5.5-win]:dragonfly::AgfNNet3OnlineModelWrapper::AgfNNet3OnlineModelWrapper():dragonfly\agf-nnet3.cpp:128) nonterm_phones_offset: 993
LOG ([5.5-win]:dragonfly::AgfNNet3OnlineModelWrapper::AgfNNet3OnlineModelWrapper():dragonfly\agf-nnet3.cpp:129) rules_nonterm_offset: 7
LOG ([5.5-win]:dragonfly::AgfNNet3OnlineModelWrapper::AgfNNet3OnlineModelWrapper():dragonfly\agf-nnet3.cpp:130) dictation_nonterm_offset: 5
LOG ([5.5-win]:dragonfly::AgfNNet3OnlineModelWrapper::AgfNNet3OnlineModelWrapper():dragonfly\agf-nnet3.cpp:131) word_syms_filename: kaldi_model\words.txt
LOG ([5.5-win]:dragonfly::AgfNNet3OnlineModelWrapper::AgfNNet3OnlineModelWrapper():dragonfly\agf-nnet3.cpp:132) word_align_lexicon_filename: kaldi_model\align_lexicon.int
LOG ([5.5-win]:dragonfly::AgfNNet3OnlineModelWrapper::AgfNNet3OnlineModelWrapper():dragonfly\agf-nnet3.cpp:133) mfcc_config_filename: kaldi_model\conf\mfcc_hires.conf
LOG ([5.5-win]:dragonfly::AgfNNet3OnlineModelWrapper::AgfNNet3OnlineModelWrapper():dragonfly\agf-nnet3.cpp:134) ie_config_filename: kaldi_model.tmp\ivector_extractor.conf
LOG ([5.5-win]:dragonfly::AgfNNet3OnlineModelWrapper::AgfNNet3OnlineModelWrapper():dragonfly\agf-nnet3.cpp:135) model_filename: kaldi_model\final.mdl
LOG ([5.5-win]:dragonfly::AgfNNet3OnlineModelWrapper::AgfNNet3OnlineModelWrapper():dragonfly\agf-nnet3.cpp:136) top_fst_filename: kaldi_model.tmp\6c29c7b4d8970b63ad320d7cac5296a119b6ab3f.fst
LOG ([5.5-win]:dragonfly::AgfNNet3OnlineModelWrapper::AgfNNet3OnlineModelWrapper():dragonfly\agf-nnet3.cpp:137) dictation_fst_filename: kaldi_model\Dictation.fst
LOG ([5.5-win]:dragonfly::AgfNNet3OnlineModelWrapper::AgfNNet3OnlineModelWrapper():dragonfly\agf-nnet3.cpp:138) kNontermBigNumber, GetEncodingMultiple: 10000000, 1000
LOG ([5.5-win]:kaldi::nnet3::Nnet::RemoveOrphanNodes():nnet3\nnet-nnet.cc:948) Removed 5 orphan nodes.
LOG ([5.5-win]:kaldi::nnet3::Nnet::RemoveOrphanComponents():nnet3\nnet-nnet.cc:847) Removing 11 orphan components.
LOG ([5.5-win]:kaldi::nnet3::ModelCollapser::Collapse():nnet3\nnet-utils.cc:1463) Added 6 components, removed 11
LOG ([5.5-win]:kaldi::IvectorExtractor::ComputeDerivedVars():ivector\ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5-win]:kaldi::IvectorExtractor::ComputeDerivedVars():ivector\ivector-extractor.cc:204) Done.
LOG ([5.5-win]:kaldi::nnet3::CompileLooped():nnet3\nnet-compile-looped.cc:345) Spent 0.0510093 seconds in looped compilation.
Looks like I'm missing some files that aren't in the kaldi_model zip files?
Yeah, I just tried an online file that I was fairly certain had the right format (sampling rate etc.). I've also made my own .wav file, but it responded the same.
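For anyone wanting to confirm a file's format rather than trust it, here is a rough check using only the standard `wave` module. The function name `check_wav_format` is made up for illustration. The 16 kHz, 16-bit defaults come from the comment in the sample code in this thread; mono is an assumption on my part, not something stated here.

```python
import wave


def check_wav_format(path, rate=16000, sample_width=2, channels=1):
    """Return a list of mismatches between the WAV header and the expected format.

    Defaults: 16 kHz and 16-bit (2 bytes) as mentioned in this thread;
    channels=1 (mono) is an assumption.
    """
    with wave.open(path, "rb") as wf:
        found = [
            ("sample rate (Hz)", wf.getframerate(), rate),
            ("sample width (bytes)", wf.getsampwidth(), sample_width),
            ("channels", wf.getnchannels(), channels),
        ]
    # Keep only the header fields that differ from what was expected.
    return [
        "%s: found %s, expected %s" % (name, got, want)
        for name, got, want in found
        if got != want
    ]
```

`check_wav_format('test.wav')` returns an empty list when the header matches. This only inspects the header, so it cannot tell whether the audio itself is usable speech.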
Hmm, those missing files are unimportant. There were no further messages, like an ERROR or stack trace? Maybe try deleting the *.tmp directory. What OS is this on?
Huh, deleting *.tmp seems to have done the trick. Much appreciated!
(moot now, but Windows 10 64-bit to answer your question). Thanks again
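The fix that worked here (deleting the *.tmp directory) can be scripted roughly as below. This is a sketch under one assumption: that the cache directory sits next to the model directory and is named by appending `.tmp`, as the `kaldi_model.tmp\...` paths in the log output suggest. The helper name `clear_model_cache` is hypothetical and not part of kaldi_active_grammar.

```python
import os
import shutil


def clear_model_cache(model_dir="kaldi_model"):
    """Delete '<model_dir>.tmp' if it exists; return True if something was removed.

    Assumption: the cache directory is the model directory name plus '.tmp',
    matching the 'kaldi_model.tmp' paths seen in this thread's logs.
    """
    tmp_dir = model_dir.rstrip("\\/") + ".tmp"
    if os.path.isdir(tmp_dir):
        shutil.rmtree(tmp_dir)
        return True
    return False
```

Run it before creating the recognizer. Judging by the "Skipped full compilation thanks to FileCache" message, the next start will likely be slower while the cached files are rebuilt.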
Python crashes when running this sample code:
##########
import sys, wave
from kaldi_active_grammar import PlainDictationRecognizer

recognizer = PlainDictationRecognizer()
wfi = 'test.wav'  ### load a 16 kHz 16-bit example wav file from kaldi repository
wave_file = wave.open(wfi, 'rb')
data = wave_file.readframes(wave_file.getnframes())
print(type(data), len(data))  # bytes object with int16 audio data
output_str, likelihood = recognizer.decode_utterance(data)
print('won\'t get here, decode_utterance crashes python')
print(repr(output_str), likelihood)
###########
type(data), len(data) = (<class 'bytes'>, 46002)
test.wav is from here: https://github.com/kaldi-asr/kaldi/tree/master/src/feat/test_data/test.wav
kaldi_model is from here: https://github.com/daanzu/kaldi-active-grammar/releases/download/v1.2.0/kaldi_model_zamia.zip
kaldi_active_grammar.version = '1.2.0'
sys.version = '3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]'
Any idea what might be going wrong? Is there a sample wav file that I should try?
Thanks