Closed: jackx1972 closed this issue 4 years ago
You probably have a setup with online CMVN from mini_librispeech; it would be better to train the model without online CMVN. You can set online_cmvn to false in local/chain/run_tdnn.sh and retrain the TDNN.
Nickolay, thank you for your quick response. I set online_cmvn to false and retrained the TDNN. The Android CheckMemoryUsage() error is fixed! However, there is still no correct recognition on any of the files.
Which files exactly? You'd better try on the desktop first with the vosk Python package, then move to Android. You can share the model and the test files to get help.
I installed the Python version on an x86 machine running Debian Linux and moved my model to vosk-api/python/example/model. Using test_simple.py with my voice files from the dev_clean_2 folder (the one used for the test decoding in the Kaldi training) I get much better results: 67% WER. Then I ran test_words.py and changed the word list to the 42 words used in my voice files. No recognition at all, just like the results on Android.
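For reference, the WER figures quoted throughout this thread are the standard edit-distance measure over words. A minimal, vosk-independent sketch of how such a score can be computed (the function name and example strings are illustrative, not taken from the demo scripts):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("good night", "good light"))  # one substitution out of two words -> 0.5
```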
So in Android I changed the KaldiRecognizer statement to remove the word list (grammar):
rec = new KaldiRecognizer(activityReference.get().model, 16000.f);
Now I get nearly the same results on Android as with test_simple.py – a little better, 64% WER.
I don’t need word-list support, but I am curious why the results are so bad when using it. Also, 64% is still much worse than the 2.4% I got in Kaldi.
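For context on the word-list (grammar) argument: vosk takes it as a JSON array string. A sketch of building that string from a plain list of words (the word list below is hypothetical, standing in for the attached wordlist.txt; "[unk]" is vosk's unknown-word token, which lets out-of-list speech map to something instead of being forced onto a list word):

```python
import json

def build_grammar(words):
    """Build the JSON-array string that vosk's KaldiRecognizer accepts as a grammar."""
    # vosk expects something like '["yellow", "water", "[unk]"]'.
    return json.dumps(list(words) + ["[unk]"])

# Hypothetical subset of the 42-word list from this thread.
words = ["all", "yellow", "baby", "water", "shirt"]
print(build_grammar(words))
```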
Attached are my model and some of the voice files; the transcription of each file is in its name. all.wav, yellow.wav, and baby.wav decode correctly with test_simple.py and the modified Android demo app. Good_night.wav, water.wav, and shirt.wav do not decode correctly with either. voice_files.zip https://drive.google.com/drive/folders/1I61swBjr6FVvxzyMzHnTYdfong7DppyG?usp=sharing
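Since each file's transcription is encoded in its name (e.g. Good_night.wav for "good night"), a small helper for recovering the reference text from a filename when scoring, assuming underscores separate words (a sketch, not part of the demo scripts):

```python
import os

def reference_from_filename(path):
    """Recover the reference transcript from a file name like 'Good_night.wav'."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return stem.replace("_", " ").lower()

print(reference_from_filename("voice_files/Good_night.wav"))  # -> "good night"
```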
@jackx1972 the words.txt in the model is wrong; you probably took it from the lang folder. You should take the reordered version from the lgraph folder, where the HCLr.fst is compiled.
I checked, and the words.txt in the lgraph folder was the same, not reordered. I realized I was not including the arpa.txt path in the arguments to mkgraph_lookahead.sh, so I added it and re-ran the script. It then created the reordered words.txt in the lgraph folder. I updated the model, including the new Gr.fst.
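One way to confirm whether two words.txt symbol tables share a vocabulary but differ in word-id ordering is to diff them as mappings. A minimal sketch, assuming the usual "word id" line format (the sample entries are made up for illustration):

```python
def load_symbols(lines):
    """Parse words.txt-style lines ('word id') into a dict word -> id."""
    table = {}
    for line in lines:
        word, idx = line.split()
        table[word] = int(idx)
    return table

def compare_tables(a, b):
    """Return (same_words, same_ids): vocabulary match and id-assignment match."""
    ta, tb = load_symbols(a), load_symbols(b)
    return set(ta) == set(tb), ta == tb

lang = ["<eps> 0", "all 1", "baby 2"]
lgraph = ["<eps> 0", "baby 1", "all 2"]  # same words, reordered ids
print(compare_tables(lang, lgraph))  # -> (True, False)
```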
Using test_simple.py with my voice files from the dev_clean_2 folder (the one used for the test decoding in the Kaldi training) I get the same 67% WER. Then I ran test_words.py with the word list changed to the 42 words used in my voice files and now get improved results: 45% WER. The comparable setups on Android give the same results. It is still worse than the 2.4% in Kaldi.
The updated model is at https://drive.google.com/file/d/1jmfZ2vjgIg2g2jyEkPUrX2il6MI5nkGY/view?usp=sharing. A list of the 42 words used for the word list is attached: wordlist.txt
@jackx1972 hm, there must still be a mismatch. Can you please try to train the DNN model with mini_librispeech/s5/local/chain/tuning/run_tdnn_1j.sh instead of the latest run_tdnn_1k.sh?
Nickolay,
I trained with run_tdnn_1j.sh. The results are now excellent: 4.8% WER with the Python test_simple.py, and the same on Android. That matches the Kaldi decode_lookahead.sh results. Thank you for all your help!
Ok, good. Basically the same issue as in https://github.com/alphacep/vosk-api/issues/77
I am getting this error on vosk-android-demo when recognizing a file using an English custom model trained with the Kaldi 5.5 mini_librispeech scripts:

2020-05-21 14:08:38.072 10073-10631/org.kaldi.demo I/VoskAPI: RebuildRepository():determinize-lattice-pruned.cc:287) Rebuilding repository.
2020-05-21 14:08:39.795 10073-10631/org.kaldi.demo W/VoskAPI: CheckMemoryUsage():determinize-lattice-pruned.cc:320) Did not reach requested beam in determinize-lattice: size exceeds maximum 50000000 bytes; (repo,arcs,elems) = (23629248,1019712,25366728), after rebuilding, repo size was 19967552, effective beam was 1.57066 vs. requested beam 2
Audio data is 1.4 hours of 16 kHz files, each containing one word or a short phrase. The dictionary is 105 words. The application is speech recognition for speech-impaired users.

I modified the current mini_librispeech run.sh script to use local/chain/run_tdnn.sh instead of local/chain2 to create ivectors, and changed the ivector dimension from 100 to 30. Then I added utils/mkgraph_lookahead.sh to create the Gr.fst and HCLr.fst:

utils/mkgraph_lookahead.sh --self-loop-scale 1.0 --remove-oov --compose-graph data/lang_test_tgsmall exp/chain_online_cmn/tree_sp exp/chain_online_cmn/tree_sp/graph_lookahead

The results in Kaldi are excellent: tri3b at 4.8% WER and chain_online_cmn at 2.4% WER. As a test I also did lookahead decoding in Kaldi, with a result of 2.4% WER:

steps/nnet3/decode_lookahead.sh --nj 1 \
  --acwt 1.0 --post-decode-acwt 10.0 \
  --online-ivector-dir exp/nnet3_online_cmn/ivectors_dev_clean_2_hires \
  exp/chain_online_cmn/tree_sp/graph_lookahead \
  data/dev_clean_2_hires \
  exp/chain_online_cmn/tdnn1k_sp/look_decode_tgsmall_dev_clean_2
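Since the model is trained on 16 kHz audio, a quick sanity check that the test wav files actually match the expected format (mono, 16-bit PCM, 16000 Hz) can rule out a feature mismatch on the device. A sketch using only the standard library; the demo portion writes a synthetic silent wav just to exercise the check:

```python
import os
import tempfile
import wave

def check_wav(path, expected_rate=16000):
    """Return a list of format problems; empty means the wav matches expectations."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != expected_rate:
            problems.append(f"sample rate {w.getframerate()} != {expected_rate}")
        if w.getnchannels() != 1:
            problems.append(f"{w.getnchannels()} channels, expected mono")
        if w.getsampwidth() != 2:
            problems.append(f"{8 * w.getsampwidth()}-bit samples, expected 16-bit")
    return problems

# Demo: write a synthetic 16 kHz mono wav and verify it passes the check.
demo = os.path.join(tempfile.gettempdir(), "vosk_check_demo.wav")
with wave.open(demo, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 1600)  # 0.1 s of silence
print(check_wav(demo))  # -> []
```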
I moved the custom model files over to the kaldi-android-demo models\src\main\assets\sync\model-android folder. Here are the origins of the 13 files:

exp/chain_online_cmn/tree_sp/graph_lookahead/disambig_tid.int
exp/chain_online_cmn/tdnn1k_sp/final.mdl
exp/chain_online_cmn/tree_sp/graph_lookahead/Gr.fst
exp/chain_online_cmn/tree_sp/graph_lookahead/HCLr.fst
exp/conf/mfcc_hires.conf (renamed to mfcc.conf)
exp/chain_online_cmn/tree_sp/graph_lookahead/phones/word_boundary.int
data/lang/words.txt

For the ivector folder files:

conf/online_cmvn.conf
exp/nnet3_online_cmn/ivectors_train_clean_5_sp_hires/conf/splice.conf
exp/nnet3_online_cmn/extractor/final.dubm
exp/nnet3_online_cmn/extractor/final.ie
exp/nnet3_online_cmn/extractor/final.mat
exp/nnet3_online_cmn/extractor/global_cmvn.stats
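A quick way to verify that the unpacked model folder contains all 13 expected files before pushing it to the device. This is a sketch: the relative layout (flat files plus an ivector subfolder) is assumed from the listing above, and the demo just creates empty placeholder files in a temp directory to exercise the check:

```python
import os
import tempfile

# Assumed on-device layout, mirroring the 13 files listed above.
REQUIRED = [
    "disambig_tid.int",
    "final.mdl",
    "Gr.fst",
    "HCLr.fst",
    "mfcc.conf",
    "word_boundary.int",
    "words.txt",
    "ivector/online_cmvn.conf",
    "ivector/splice.conf",
    "ivector/final.dubm",
    "ivector/final.ie",
    "ivector/final.mat",
    "ivector/global_cmvn.stats",
]

def missing_files(model_dir):
    """Return the relative paths from REQUIRED that are absent under model_dir."""
    return [p for p in REQUIRED if not os.path.isfile(os.path.join(model_dir, p))]

# Demo: build a placeholder model folder and confirm nothing is reported missing.
demo = tempfile.mkdtemp()
for p in REQUIRED:
    full = os.path.join(demo, p)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    open(full, "w").close()
print(missing_files(demo))  # -> []
```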
Using audio files that were used as test files in Kaldi, I get the above CheckMemoryUsage error on about half of the files, and there is no correct recognition on any of them.
How can I troubleshoot this issue? Thanks for your help.