alphacep / vosk-android-demo

Offline speech recognition for Android with Vosk library.
Apache License 2.0

Training android model at higher sample frequencies #35

Closed. FBorgeat closed this issue 4 years ago.

FBorgeat commented 4 years ago

I was following this tutorial: https://kaldi-asr.org/doc/kaldi_for_dummies.html with my own recorded files to build an Android model at 22050 Hz, but there are issues with the provided code and I'm not sure the output model files would work with kaldi-android. Are there English models already made for that sampling frequency, or is there a tutorial that fits better?

nshmyrev commented 4 years ago

but there are issues with the provided code

What issues exactly?

FBorgeat commented 4 years ago

[Image: screenshot of the error] I get this error at the mono training part.

nshmyrev commented 4 years ago

There was an earlier error. Please paste the whole log as text, not as an image.

FBorgeat commented 4 years ago

===== MONO TRAINING =====

steps/train_mono.sh --nj 1 --cmd run.pl data/train data/lang exp/mono
filter_scps.pl: warning: some input lines did not get output
steps/train_mono.sh: Initializing monophone system.
feat-to-dim 'ark,s,cs:apply-cmvn --utt2spk=ark:data/train/split1/1/utt2spk scp:data/train/split1/1/cmvn.scp scp:data/train/split1/1/feats.scp ark:- | add-deltas ark:- ark:- |' -
add-deltas ark:- ark:-
apply-cmvn --utt2spk=ark:data/train/split1/1/utt2spk scp:data/train/split1/1/cmvn.scp scp:data/train/split1/1/feats.scp ark:-
WARNING (apply-cmvn[5.5]:Open():util/kaldi-table-inl.h:106) Failed to open script file data/train/split1/1/feats.scp
ERROR (apply-cmvn[5.5]:SequentialTableReader():util/kaldi-table-inl.h:860) Error constructing TableReader: rspecifier is scp:data/train/split1/1/feats.scp

nshmyrev commented 4 years ago

Above. Once again - the whole log.

FBorgeat commented 4 years ago

(base) tamere@pop-os:~/extended/kaldi/egs/tidigits$ ./run.sh

===== PREPARING ACOUSTIC DATA =====

===== FEATURES EXTRACTION =====

steps/make_mfcc.sh --nj 1 --cmd run.pl data/train exp/make_mfcc/train mfcc
utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea.
Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html for more information.
utils/validate_data_dir.sh: file data/train/utt2spk is not in sorted order or has duplicates
steps/make_mfcc.sh --nj 1 --cmd run.pl data/test exp/make_mfcc/test mfcc
utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea.
Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html for more information.
utils/validate_data_dir.sh: Successfully validated data-directory data/test
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
run.pl: job failed, log is in exp/make_mfcc/test/make_mfcc_test.1.log
steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train mfcc
steps/compute_cmvn_stats.sh: no such file data/train/feats.scp
steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test mfcc
steps/compute_cmvn_stats.sh: no such file data/test/feats.scp

===== PREPARING LANGUAGE DATA =====

utils/prepare_lang.sh data/local/dict data/local/lang data/lang
Checking data/local/dict/silence_phones.txt ...
--> reading data/local/dict/silence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict/silence_phones.txt is OK

Checking data/local/dict/optional_silence.txt ...
--> reading data/local/dict/optional_silence.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict/optional_silence.txt is OK

Checking data/local/dict/nonsilence_phones.txt ...
--> reading data/local/dict/nonsilence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict/nonsilence_phones.txt is OK

Checking disjoint: silence_phones.txt, nonsilence_phones.txt
--> disjoint property is OK.

Checking data/local/dict/lexicon.txt
--> reading data/local/dict/lexicon.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict/lexicon.txt is OK

Checking data/local/dict/extra_questions.txt ...
--> data/local/dict/extra_questions.txt is empty (this is OK)
--> SUCCESS [validating dictionary directory data/local/dict]

**Creating data/local/dict/lexiconp.txt from data/local/dict/lexicon.txt
utils/prepare_lang.sh: line 547: fstaddselfloops: command not found
ERROR: FstHeader::Read: Bad FST header: standard input

===== LANGUAGE MODEL CREATION =====

===== MAKING lm.arpa =====

=== 1/5 Counting and sorting n-grams ===
File stdin isn't normal. Using slower read() instead of mmap(). No progress bar.
=== 2/5 Calculating and sorting adjusted counts ===
Chain sizes: 1:156
Statistics:
1 13 D1=0.428571 D2=0.714286 D3+=1.28571
Memory estimate for binary LM:
type B
probing 576 assuming -p 1.5
probing 632 assuming -r models -p 1.5
trie 422 without quantization
trie 18446744073709550981 assuming -q 8 -b 8 quantization
trie 422 assuming -a 22 array pointer compression
trie 18446744073709550981 assuming -a 22 -q 8 -b 8 array pointer compression and quantization
=== 3/5 Calculating and sorting initial probabilities ===
Chain sizes: 1:156
=== 4/5 Calculating and writing order-interpolated probabilities ===
Chain sizes: 1:156
Chain sizes: 1:156
=== 5/5 Writing ARPA model ===
Name:lmplz VmPeak:8259684 kB VmRSS:5432 kB RSSMax:0 kB user:0 sys:0 CPU:0 real:3.96405

===== MAKING G.fst =====

arpa2fst --disambig-symbol=#0 --read-symbol-table=data/lang/words.txt /tmp/lm.arpa --discount_fallback data/lang/G.fst

Convert an ARPA format language model into an FST
Usage: arpa2fst [opts] <input-arpa> <output-fst>
e.g.: arpa2fst --disambig-symbol=#0 --read-symbol-table=data/lang/words.txt lm/input.arpa G.fst

Note: When called without switches, the output G.fst will contain an embedded symbol table. This is compatible with the way a previous version of arpa2fst worked.

Options:
--bos-symbol : Beginning of sentence symbol (string, default = "<s>")
--disambig-symbol : Disambiguator. If provided (e. g. #0), used on input side of backoff links, and <s> and </s> are replaced with epsilons (string, default = "")
--eos-symbol : End of sentence symbol (string, default = "</s>")
--ilabel-sort : Ilabel-sort the output FST (bool, default = true)
--keep-symbols : Store symbol table with FST. Symbols always saved to FST if symbol tables are neither read or written (otherwise symbols would be lost entirely) (bool, default = false)
--max-arpa-warnings : Maximum warnings to report on ARPA parsing, 0 to disable, -1 to show all (int, default = 30)
--read-symbol-table : Use existing symbol table (string, default = "")
--write-symbol-table : Write generated symbol table to a file (string, default = "")

Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)

===== MONO TRAINING =====

steps/train_mono.sh --nj 1 --cmd run.pl data/train data/lang exp/mono
filter_scps.pl: warning: some input lines did not get output
steps/train_mono.sh: Initializing monophone system.
feat-to-dim 'ark,s,cs:apply-cmvn --utt2spk=ark:data/train/split1/1/utt2spk scp:data/train/split1/1/cmvn.scp scp:data/train/split1/1/feats.scp ark:- | add-deltas ark:- ark:- |' -
add-deltas ark:- ark:-
apply-cmvn --utt2spk=ark:data/train/split1/1/utt2spk scp:data/train/split1/1/cmvn.scp scp:data/train/split1/1/feats.scp ark:-
WARNING (apply-cmvn[5.5]:Open():util/kaldi-table-inl.h:106) Failed to open script file data/train/split1/1/feats.scp
ERROR (apply-cmvn[5.5]:SequentialTableReader():util/kaldi-table-inl.h:860) Error constructing TableReader: rspecifier is scp:data/train/split1/1/feats.scp

[ Stack-Trace: ]
apply-cmvn(kaldi::MessageLogger::LogMessage() const+0x77b) [0x5621320326dd]
apply-cmvn(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x25) [0x562131f9ed01]
apply-cmvn(kaldi::SequentialTableReader<kaldi::KaldiObjectHolder<kaldi::Matrix<float> > >::SequentialTableReader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xc4) [0x562131fadf0c]
apply-cmvn(main+0x7b7) [0x562131f9d2a0]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f8054ac61e3]
apply-cmvn(_start+0x2e) [0x562131f9ca2e]

kaldi::KaldiFatalError
ERROR (feat-to-dim[5.5]:main():feat-to-dim.cc:58) Could not read any features (empty archive?)

[ Stack-Trace: ]
feat-to-dim(kaldi::MessageLogger::LogMessage() const+0x77b) [0x55fb81fd12ab]
feat-to-dim(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x25) [0x55fb81f5141d]
feat-to-dim(main+0x2f7) [0x55fb81f50d80]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fbcc8f291e3]
feat-to-dim(_start+0x2e) [0x55fb81f509ce]

kaldi::KaldiFatalError
error getting feature dimension

nshmyrev commented 4 years ago

The first errors say:

utils/validate_data_dir.sh: file data/train/utt2spk is not in sorted order or has duplicates
run.pl: job failed, log is in exp/make_mfcc/test/make_mfcc_test.1.log

You need to validate the train folder before running, and also read that test log file for details.

I see you are running the tidigits example, which is not the Kaldi for Dummies tutorial. If you need help, please provide complete information on what you have done.

FBorgeat commented 4 years ago

I'm running in tidigits because there is no digits folder like the one in the tutorial. I followed every step described and replaced SRILM with KenLM, but I will go through it again, get utt2spk sorted, and get back to you if that doesn't solve it.

nshmyrev commented 4 years ago

Ok, let me know

You need to add

utils/fix_data_dir.sh data/train

before processing in order to properly sort data.
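For readers who want to see what "sorted order or has duplicates" means before running the fix, here is a minimal Python sketch. It assumes the check compares the first field of each line (the utterance id) in plain byte order, as a C-locale sort would; the function names are hypothetical and this is not a replacement for utils/fix_data_dir.sh, which also keeps utt2spk, text and wav.scp consistent with each other.

```python
def first_fields(lines):
    """Return the first whitespace-separated field of each non-empty line as bytes."""
    return [line.split()[0].encode("utf-8") for line in lines if line.strip()]


def is_sorted_and_unique(lines):
    """True if the utterance ids are strictly increasing in byte order.

    Strictly increasing means both sorted and free of duplicate ids.
    """
    ids = first_fields(lines)
    return all(a < b for a, b in zip(ids, ids[1:]))


def fix_order(lines):
    """Sort lines by utterance id in byte order and drop repeated ids (sketch only)."""
    non_empty = [line for line in lines if line.strip()]
    ordered = sorted(non_empty, key=lambda line: line.split()[0].encode("utf-8"))
    seen = set()
    kept = []
    for line in ordered:
        utt_id = line.split()[0]
        if utt_id not in seen:
            seen.add(utt_id)
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Hypothetical utt2spk content: "<utterance-id> <speaker-id>" per line.
    sample = ["utt10 spk1", "utt2 spk1", "utt1 spk1"]
    print(is_sorted_and_unique(sample))  # False
    print(fix_order(sample))  # ['utt1 spk1', 'utt10 spk1', 'utt2 spk1']
```

Note that byte order places utt10 before utt2, which is why ids that look sorted to a human can still fail validation.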

FBorgeat commented 4 years ago

(base) tamere@pop-os:~/extended/kaldi/egs/tidigits$ ./run.sh

===== PREPARING ACOUSTIC DATA =====

===== FEATURES EXTRACTION =====

utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea.
Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html for more information.
utils/validate_data_dir.sh: file data/train/utt2spk is not in sorted order or has duplicates
utils/fix_data_dir.sh: file data/train/utt2spk is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file data/train/text is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file data/train/wav.scp is not in sorted order or not unique, sorting it
fix_data_dir.sh: kept all 9 utterances.
fix_data_dir.sh: old files are kept in data/train/.backup
steps/make_mfcc.sh --nj 1 --cmd run.pl data/train exp/make_mfcc/train mfcc
utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea.
Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html for more information.
utils/validate_data_dir.sh: Successfully validated data-directory data/train
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
run.pl: job failed, log is in exp/make_mfcc/train/make_mfcc_train.1.log
steps/make_mfcc.sh --nj 1 --cmd run.pl data/test exp/make_mfcc/test mfcc
utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea.
Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html for more information.
utils/validate_data_dir.sh: Successfully validated data-directory data/test
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
run.pl: job failed, log is in exp/make_mfcc/test/make_mfcc_test.1.log
steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train mfcc
steps/compute_cmvn_stats.sh: no such file data/train/feats.scp
steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test mfcc
steps/compute_cmvn_stats.sh: no such file data/test/feats.scp

===== PREPARING LANGUAGE DATA =====

utils/prepare_lang.sh data/local/dict data/local/lang data/lang Checking data/local/dict/silence_phones.txt ... --> reading data/local/dict/silence_phones.txt --> text seems to be UTF-8 or ASCII, checking whitespaces --> text contains only allowed whitespaces --> data/local/dict/silence_phones.txt is OK

Checking data/local/dict/optional_silence.txt ... --> reading data/local/dict/optional_silence.txt --> text seems to be UTF-8 or ASCII, checking whitespaces --> text contains only allowed whitespaces --> data/local/dict/optional_silence.txt is OK

Checking data/local/dict/nonsilence_phones.txt ... --> reading data/local/dict/nonsilence_phones.txt --> text seems to be UTF-8 or ASCII, checking whitespaces --> text contains only allowed whitespaces --> data/local/dict/nonsilence_phones.txt is OK

Checking disjoint: silence_phones.txt, nonsilence_phones.txt --> disjoint property is OK.

Checking data/local/dict/lexicon.txt --> reading data/local/dict/lexicon.txt --> text seems to be UTF-8 or ASCII, checking whitespaces --> text contains only allowed whitespaces --> data/local/dict/lexicon.txt is OK

Checking data/local/dict/extra_questions.txt ... --> data/local/dict/extra_questions.txt is empty (this is OK) --> SUCCESS [validating dictionary directory data/local/dict]

**Creating data/local/dict/lexiconp.txt from data/local/dict/lexicon.txt
utils/prepare_lang.sh: line 547: fstaddselfloops: command not found
ERROR: FstHeader::Read: Bad FST header: standard input

===== LANGUAGE MODEL CREATION ===== ===== MAKING lm.arpa =====

=== 1/5 Counting and sorting n-grams === File stdin isn't normal. Using slower read() instead of mmap(). No progress bar. === 2/5 Calculating and sorting adjusted counts === Chain sizes: 1:156 Statistics: 1 13 D1=0.428571 D2=0.714286 D3+=1.28571 Memory estimate for binary LM: type B probing 576 assuming -p 1.5 probing 632 assuming -r models -p 1.5 trie 422 without quantization trie 18446744073709550981 assuming -q 8 -b 8 quantization trie 422 assuming -a 22 array pointer compression trie 18446744073709550981 assuming -a 22 -q 8 -b 8 array pointer compression and quantization === 3/5 Calculating and sorting initial probabilities === Chain sizes: 1:156 === 4/5 Calculating and writing order-interpolated probabilities === Chain sizes: 1:156 Chain sizes: 1:156 === 5/5 Writing ARPA model === Name:lmplz VmPeak:8243292 kB VmRSS:5408 kB RSSMax:0 kB user:0 sys:0 CPU:0 real:20.4697

===== MAKING G.fst =====

arpa2fst --disambig-symbol=#0 --read-symbol-table=data/lang/words.txt /tmp/lm.arpa --discount_fallback data/lang/G.fst

Convert an ARPA format language model into an FST
Usage: arpa2fst [opts] <input-arpa> <output-fst>
e.g.: arpa2fst --disambig-symbol=#0 --read-symbol-table=data/lang/words.txt lm/input.arpa G.fst

Note: When called without switches, the output G.fst will contain an embedded symbol table. This is compatible with the way a previous version of arpa2fst worked.

Options:
--bos-symbol : Beginning of sentence symbol (string, default = "<s>")
--disambig-symbol : Disambiguator. If provided (e. g. #0), used on input side of backoff links, and <s> and </s> are replaced with epsilons (string, default = "")
--eos-symbol : End of sentence symbol (string, default = "</s>")
--ilabel-sort : Ilabel-sort the output FST (bool, default = true)
--keep-symbols : Store symbol table with FST. Symbols always saved to FST if symbol tables are neither read or written (otherwise symbols would be lost entirely) (bool, default = false)
--max-arpa-warnings : Maximum warnings to report on ARPA parsing, 0 to disable, -1 to show all (int, default = 30)
--read-symbol-table : Use existing symbol table (string, default = "")
--write-symbol-table : Write generated symbol table to a file (string, default = "")

Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)

===== MONO TRAINING =====

steps/train_mono.sh --nj 1 --cmd run.pl data/train data/lang exp/mono
steps/train_mono.sh: Initializing monophone system.
feat-to-dim 'ark,s,cs:apply-cmvn --utt2spk=ark:data/train/split1/1/utt2spk scp:data/train/split1/1/cmvn.scp scp:data/train/split1/1/feats.scp ark:- | add-deltas ark:- ark:- |' -
apply-cmvn --utt2spk=ark:data/train/split1/1/utt2spk scp:data/train/split1/1/cmvn.scp scp:data/train/split1/1/feats.scp ark:-
WARNING (apply-cmvn[5.5]:Open():util/kaldi-table-inl.h:106) Failed to open script file data/train/split1/1/feats.scp
add-deltas ark:- ark:-
ERROR (apply-cmvn[5.5]:SequentialTableReader():util/kaldi-table-inl.h:860) Error constructing TableReader: rspecifier is scp:data/train/split1/1/feats.scp

[ Stack-Trace: ]
apply-cmvn(kaldi::MessageLogger::LogMessage() const+0x77b) [0x56143677d6dd]
apply-cmvn(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x25) [0x5614366e9d01]
apply-cmvn(kaldi::SequentialTableReader<kaldi::KaldiObjectHolder<kaldi::Matrix<float> > >::SequentialTableReader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xc4) [0x5614366f8f0c]
apply-cmvn(main+0x7b7) [0x5614366e82a0]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fb5b0c691e3]
apply-cmvn(_start+0x2e) [0x5614366e7a2e]

kaldi::KaldiFatalError
ERROR (feat-to-dim[5.5]:main():feat-to-dim.cc:58) Could not read any features (empty archive?)

[ Stack-Trace: ]
feat-to-dim(kaldi::MessageLogger::LogMessage() const+0x77b) [0x557130ca22ab]
feat-to-dim(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x25) [0x557130c2241d]
feat-to-dim(main+0x2f7) [0x557130c21d80]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f4aa0f6a1e3]
feat-to-dim(_start+0x2e) [0x557130c219ce]

kaldi::KaldiFatalError
error getting feature dimension

nshmyrev commented 4 years ago

Read

run.pl: job failed, log is in exp/make_mfcc/train/make_mfcc_train.1.log

FBorgeat commented 4 years ago

My bad, there was a path error in my file. After fixing it, though, I get this in my log:

ERROR ... WaveData: unsupported bits_per_sample = 24

It's not mentioned in the tutorial, but I figure I'll remake my audio files with 12 bits per sample.

nshmyrev commented 4 years ago

It should be 16 bits per sample.
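A quick way to confirm the bit depth of each recording before feature extraction is to read the WAV header. The sketch below uses Python's standard wave module, assumes uncompressed PCM WAV files, and uses hypothetical function names; it only reports the format and does not convert anything.

```python
import wave


def wav_format(path):
    """Return (bits_per_sample, sample_rate_hz, channels) read from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getsampwidth() * 8, wav.getframerate(), wav.getnchannels()


def is_16_bit(path):
    """True if the file stores 16 bits per sample, the depth asked for above."""
    return wav_format(path)[0] == 16


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python check_wav.py file1.wav file2.wav ...
    for wav_path in sys.argv[1:]:
        bits, rate, channels = wav_format(wav_path)
        status = "ok" if bits == 16 else "needs re-encoding to 16-bit"
        print(f"{wav_path}: {bits}-bit, {rate} Hz, {channels} channel(s): {status}")
```

Any file reported as 24-bit would need to be re-exported as 16-bit from the recording tool before running the feature extraction step again.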

FBorgeat commented 4 years ago

utils/prepare_lang.sh: line 547: fstaddselfloops: command not found
ERROR: FstHeader::Read: Bad FST header: standard input

===== LANGUAGE MODEL CREATION ===== ===== MAKING lm.arpa =====

=== 1/5 Counting and sorting n-grams === File stdin isn't normal. Using slower read() instead of mmap(). No progress bar. === 2/5 Calculating and sorting adjusted counts === Chain sizes: 1:156 Statistics: 1 13 D1=0.428571 D2=0.714286 D3+=1.28571 Memory estimate for binary LM: type B probing 576 assuming -p 1.5 probing 632 assuming -r models -p 1.5 trie 422 without quantization trie 18446744073709550981 assuming -q 8 -b 8 quantization trie 422 assuming -a 22 array pointer compression trie 18446744073709550981 assuming -a 22 -q 8 -b 8 array pointer compression and quantization === 3/5 Calculating and sorting initial probabilities === Chain sizes: 1:156 === 4/5 Calculating and writing order-interpolated probabilities === Chain sizes: 1:156 Chain sizes: 1:156 === 5/5 Writing ARPA model === Name:lmplz VmPeak:8259684 kB VmRSS:5380 kB RSSMax:0 kB user:0 sys:0 CPU:0 real:2.69141

===== MAKING G.fst =====

arpa2fst --disambig-symbol=#0 --read-symbol-table=data/lang/words.txt /tmp/lm.arpa --discount_fallback data/lang/G.fst

Convert an ARPA format language model into an FST
Usage: arpa2fst [opts] <input-arpa> <output-fst>
e.g.: arpa2fst --disambig-symbol=#0 --read-symbol-table=data/lang/words.txt lm/input.arpa G.fst

Note: When called without switches, the output G.fst will contain an embedded symbol table. This is compatible with the way a previous version of arpa2fst worked.

Options:
--bos-symbol : Beginning of sentence symbol (string, default = "<s>")
--disambig-symbol : Disambiguator. If provided (e. g. #0), used on input side of backoff links, and <s> and </s> are replaced with epsilons (string, default = "")
--eos-symbol : End of sentence symbol (string, default = "</s>")
--ilabel-sort : Ilabel-sort the output FST (bool, default = true)
--keep-symbols : Store symbol table with FST. Symbols always saved to FST if symbol tables are neither read or written (otherwise symbols would be lost entirely) (bool, default = false)
--max-arpa-warnings : Maximum warnings to report on ARPA parsing, 0 to disable, -1 to show all (int, default = 30)
--read-symbol-table : Use existing symbol table (string, default = "")
--write-symbol-table : Write generated symbol table to a file (string, default = "")

Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)

===== MONO TRAINING =====

steps/train_mono.sh --nj 1 --cmd run.pl data/train data/lang exp/mono
steps/train_mono.sh: Initializing monophone system.
run.pl: job failed, log is in exp/mono/log/init.log

init.log:

# gmm-init-mono --shared-phones=data/lang/phones/sets.int "--train-feats=ark,s,cs:apply-cmvn --utt2spk=ark:data/train/split1/1/utt2spk scp:data/train/split1/1/cmvn.scp scp:data/train/split1/1/feats.scp ark:- | add-deltas ark:- ark:- | subset-feats --n=10 ark:- ark:-|" data/lang/topo 39 exp/mono/0.mdl exp/mono/tree
# Started at Tue Mar 10 17:16:00 EDT 2020
#
bash: line 1: gmm-init-mono: command not found
# Accounting: time=0 threads=1
# Ended (code 127) at Tue Mar 10 17:16:00 EDT 2020, elapsed time 0 seconds

nshmyrev commented 4 years ago

Either Kaldi is not compiled or the path is wrong. Check path.sh.
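One way to narrow this down is to check whether the binaries reported as "command not found" in the logs above are visible on PATH. The sketch below is only an illustration: the function name is hypothetical, and it assumes it is launched from a shell in which path.sh has already been sourced, so that it sees the same PATH as the recipe scripts.

```python
import shutil


def missing_tools(names, path=None):
    """Return the tool names that cannot be found as executables on the search path.

    When path is None, shutil.which falls back to the PATH environment variable.
    """
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    # Binaries taken from the error messages in this thread.
    tools = ["fstaddselfloops", "arpa2fst", "gmm-init-mono"]
    missing = missing_tools(tools)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

If fstaddselfloops and gmm-init-mono are both missing, that is consistent with the diagnosis above: either the build did not complete or path.sh points at the wrong directories.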