alphacep / vosk-android-demo

Offline speech recognition for Android with Vosk library.
Apache License 2.0

CheckMemoryUsage() error with custom model on android #60

Closed jackx1972 closed 4 years ago

jackx1972 commented 4 years ago

I am getting this error on vosk-android-demo when recognizing a file using an English custom model trained with the Kaldi 5.5 mini_librispeech scripts:

```
2020-05-21 14:08:38.072 10073-10631/org.kaldi.demo I/VoskAPI: RebuildRepository():determinize-lattice-pruned.cc:287) Rebuilding repository.
2020-05-21 14:08:39.795 10073-10631/org.kaldi.demo W/VoskAPI: CheckMemoryUsage():determinize-lattice-pruned.cc:320) Did not reach requested beam in determinize-lattice: size exceeds maximum 50000000 bytes; (repo,arcs,elems) = (23629248,1019712,25366728), after rebuilding, repo size was 19967552, effective beam was 1.57066 vs. requested beam 2
```

The audio data is 1.4 hours of 16 kHz files, each containing one word or a short phrase. The dictionary is 105 words. The application is recognition of speech-impaired users. I modified the current mini_librispeech run.sh script to use local/chain/run_tdnn.sh instead of local/chain2 to create ivectors, and I changed the ivector dimension from 100 to 30. Then I added utils/mkgraph_lookahead.sh to create Gr.fst and HCLr.fst:

```
utils/mkgraph_lookahead.sh --self-loop-scale 1.0 --remove-oov --compose-graph \
  data/lang_test_tgsmall exp/chain_online_cmn/tree_sp \
  exp/chain_online_cmn/tree_sp/graph_lookahead
```

The results in Kaldi are excellent: tri3b gives 4.8% WER and chain_online_cmn gives 2.4% WER. As a test I also did lookahead decoding in Kaldi, with a result of 2.4% WER:

```
steps/nnet3/decode_lookahead.sh --nj 1 \
  --acwt 1.0 --post-decode-acwt 10.0 \
  --online-ivector-dir exp/nnet3_online_cmn/ivectors_dev_clean_2_hires \
  exp/chain_online_cmn/tree_sp/graph_lookahead \
  data/dev_clean_2_hires \
  exp/chain_online_cmn/tdnn1k_sp/look_decode_tgsmall_dev_clean_2
```

I moved the custom model files over to the kaldi-android-demo \models\src\main\assets\sync\model-android folder. Here is the origin of the 13 files:

- exp/chain_online_cmn/tree_sp/graph_lookahead/disambig_tid.int
- exp/chain_online_cmn/tdnn1k_sp/final.mdl
- exp/chain_online_cmn/tree_sp/graph_lookahead/Gr.fst
- exp/chain_online_cmn/tree_sp/graph_lookahead/HCLr.fst
- exp/conf/mfcc_hires.conf (renamed mfcc.conf)
- exp/chain_online_cmn/tree_sp/graph_lookahead/phones/word_boundary.int
- data/lang/words.txt

For the ivector folder files:

- conf/online_cmvn.conf
- exp/nnet3_online_cmn/ivectors_train_clean_5_sp_hires/conf/splice.conf
- exp/nnet3_online_cmn/extractor/final.dubm
- exp/nnet3_online_cmn/extractor/final.ie
- exp/nnet3_online_cmn/extractor/final.mat
- exp/nnet3_online_cmn/extractor/global_cmvn.stats

Using audio files that were used as test files in Kaldi, I get the above CheckMemoryUsage() error on about half of the files, and there is no correct recognition on any of the files.
How can I troubleshoot this issue? Thanks for your help.

nshmyrev commented 4 years ago

You probably have a setup with online CMVN from mini_librispeech; you'd better train the model without online CMVN. You can set online_cmvn to false in local/chain/run_tdnn.sh and retrain the TDNN.
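For reference, the flag in question is a shell variable near the top of local/chain/run_tdnn.sh in the mini_librispeech recipe (exact position varies by Kaldi version; check your copy):

```shell
# In local/chain/run_tdnn.sh: disable online CMVN, then retrain the TDNN.
online_cmvn=false   # was: online_cmvn=true
```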

jackx1972 commented 4 years ago

Nickolay, Thank you for your quick response. I set online_cmvn to false and retrained tdnn. The Android CheckMemoryUsage() error is fixed! However, still no correct recognition on any of the files.

nshmyrev commented 4 years ago

However, still no correct recognition on any of the files.

Which files exactly? You'd better try on desktop first with the vosk Python package, then move to Android. You can share the model and the test files to get help.

jackx1972 commented 4 years ago

I installed the Python version on an x86 machine running Debian Linux. I moved my model to vosk-api/python/example/model. Using test_simple.py I get much better results: with my voice files from the dev_clean_2 folder that was used in the test decoding during Kaldi training, I get 67% WER. Then I ran test_words.py and changed the word list to the 42 words used in my voice files. No recognition, just like the results on Android.
So on Android I changed the KaldiRecognizer statement to remove the word list (grammar): `rec = new KaldiRecognizer(activityReference.get().model, 16000.f);` Now I get nearly the same results on Android as with test_simple.py, a little better at 64% WER. I don't need word-list support, but I am curious why the results are so bad when using it. Also, 64% is still much worse than the 2.4% in Kaldi.
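As a side note, the word list (grammar) passed to KaldiRecognizer is a JSON array of phrases; the sketch below shows how the two constructor variants differ (the word list here is taken from the file names in this thread, and the "[unk]" entry, which lets out-of-list speech map to an unknown token, follows the vosk examples):

```python
import json

# Word-list (grammar) decoding restricts the recognizer to these phrases.
words = ["all", "yellow", "baby", "good night", "water", "shirt"]
grammar = json.dumps(words + ["[unk]"])

# With the vosk Python package the two setups differ only in the constructor:
#   rec = KaldiRecognizer(model, 16000)            # full language model (Gr.fst)
#   rec = KaldiRecognizer(model, 16000, grammar)   # word-list decoding
print(grammar)
```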

Attached is my model and some of the voice files. The transcription of each file is in its name. all.wav, yellow.wav, baby.wav all decode correctly with test_simple.py and the modified Android demo app. Good_night.wav, water.wav, shirt.wav do not decode correctly with test_simple.py and the modified Android demo app. voice_files.zip https://drive.google.com/drive/folders/1I61swBjr6FVvxzyMzHnTYdfong7DppyG?usp=sharing

nshmyrev commented 4 years ago

@jackx1972 the words.txt in the model is wrong; you probably took it from the lang folder. You should take the reordered version from the lgraph folder where HCLr.fst is compiled.

jackx1972 commented 4 years ago

I checked, and the words.txt in the lgraph folder is the same, not reordered. I realized that I was not including the arpa.txt path in the arguments to mkgraph_lookahead.sh, so I added it and re-ran the script. It then created the reordered words.txt in the lgraph folder. I updated the model, including the new Gr.fst.
Using test_simple.py I get these results: with my voice files from the dev_clean_2 folder that was used in the test decoding during Kaldi training, I get the same 67% WER. Then I ran test_words.py and changed the word list to the 42 words used in my voice files. Now I get improved results: 45% WER. Same results in the comparative setups on Android. It is still worse than the 2.4% in Kaldi.
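For readers following along: when mkgraph_lookahead.sh is given an ARPA language model, it composes Gr.fst from it and writes the reordered words.txt; the LM goes in as an extra positional argument between the model dir and the graph dir. A sketch (the other paths follow the first post; the ARPA file name here is hypothetical, substitute your own arpa.txt path):

```shell
utils/mkgraph_lookahead.sh --self-loop-scale 1.0 --remove-oov --compose-graph \
  data/lang_test_tgsmall exp/chain_online_cmn/tree_sp \
  data/local/lm/lm_tgsmall.arpa.gz \
  exp/chain_online_cmn/tree_sp/graph_lookahead
```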
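For completeness, the WER figures quoted in this thread are word-level edit distance divided by the number of reference words; a minimal self-contained sketch (not the actual Kaldi scoring script):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution in four reference words -> 0.25
print(wer("good night water shirt", "good night water short"))  # 0.25
```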

Updated model is at https://drive.google.com/file/d/1jmfZ2vjgIg2g2jyEkPUrX2il6MI5nkGY/view?usp=sharing List of the 42 words used for word list is attached. wordlist.txt

nshmyrev commented 4 years ago

@jackx1972 hm, there must still be a mismatch. Can you please try to train the DNN model with mini_librispeech/s5/local/chain/tuning/run_tdnn_1j.sh instead of the latest run_tdnn_1k.sh?

jackx1972 commented 4 years ago

Nickolay,
I trained with run_tdnn_1j.sh. The results now are excellent: 4.8% WER with the Python test_simple.py, and the same on Android. That matches the Kaldi decode_lookahead.sh results. Thank you for all your help!

nshmyrev commented 4 years ago

Ok, good. Basically the same issue as in https://github.com/alphacep/vosk-api/issues/77