MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

[BUG] montreal_forced_aligner.exceptions.KaldiProcessingError: KaldiProcessingError: There were 1 job(s) with errors when running Kaldi binaries. #429

Open Leng-bingo opened 2 years ago

Leng-bingo commented 2 years ago

/Users/leng/miniforge3/envs/aligner/bin/gmm-boost-silence --boost=1.0 1 /Users/leng/Documents/MFA/my_data_pretrained_aligner/pretrained_aligner/final.alimdl -
/Users/leng/miniforge3/envs/aligner/bin/gmm-align-compiled --transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1 --beam=10 --retry-beam=40 --careful=false - scp:/Users/leng/Documents/MFA/my_data_pretrained_aligner/pretrained_aligner/fsts.mandarin_mfa.0.scp 'ark,s,cs:apply-cmvn --utt2spk=ark:/Users/leng/Documents/MFA/my_data_pretrained_aligner/my_data/split1/utt2spk.mandarin_mfa.0.scp scp:/Users/leng/Documents/MFA/my_data_pretrained_aligner/my_data/split1/cmvn.mandarin_mfa.0.scp scp:/Users/leng/Documents/MFA/my_data_pretrained_aligner/my_data/split1/feats.mandarin_mfa.0.scp ark:- | splice-feats --left-context=3 --right-context=3 ark:- ark:- | transform-feats /Users/leng/Documents/MFA/my_data_pretrained_aligner/pretrained_aligner/lda.mat ark:- ark:- |' ark:/Users/leng/Documents/MFA/my_data_pretrained_aligner/pretrained_aligner/ali.mandarin_mfa.0.ark ark,t:-
WARNING (gmm-boost-silence[5.5.992]:main():gmmbin/gmm-boost-silence.cc:82) The pdfs for the silence phones may be shared by other phones (note: this probably does not matter.)
LOG (gmm-boost-silence[5.5.992]:main():gmmbin/gmm-boost-silence.cc:93) Boosted weights for 5 pdfs, by factor of 1
LOG (gmm-boost-silence[5.5.992]:main():gmmbin/gmm-boost-silence.cc:103) Wrote model to -
splice-feats --left-context=3 --right-context=3 ark:- ark:-
apply-cmvn --utt2spk=ark:/Users/leng/Documents/MFA/my_data_pretrained_aligner/my_data/split1/utt2spk.mandarin_mfa.0.scp scp:/Users/leng/Documents/MFA/my_data_pretrained_aligner/my_data/split1/cmvn.mandarin_mfa.0.scp scp:/Users/leng/Documents/MFA/my_data_pretrained_aligner/my_data/split1/feats.mandarin_mfa.0.scp ark:-
transform-feats /Users/leng/Documents/MFA/my_data_pretrained_aligner/pretrained_aligner/lda.mat ark:- ark:-
LOG (apply-cmvn[5.5.992]:main():featbin/apply-cmvn.cc:162) Applied cepstral mean normalization to 1 utterances, errors on 0
LOG (transform-feats[5.5.992]:main():featbin/transform-feats.cc:158) Overall average [pseudo-]logdet is -28.2963 over 5898 frames.
LOG (transform-feats[5.5.992]:main():featbin/transform-feats.cc:161) Applied transform to 1 utterances; 0 had errors.
LOG (gmm-align-compiled[5.5.992]:main():gmmbin/gmm-align-compiled.cc:127) 0-1
WARNING (gmm-align-compiled[5.5.992]:AlignUtteranceWrapper():decoder/decoder-wrappers.cc:617) Retrying utterance 0-1 with beam 40
WARNING (gmm-align-compiled[5.5.992]:AlignUtteranceWrapper():decoder/decoder-wrappers.cc:626) Did not successfully decode file 0-1, len = 5898
LOG (gmm-align-compiled[5.5.992]:main():gmmbin/gmm-align-compiled.cc:135) Overall log-likelihood per frame is nan over 0 frames.
LOG (gmm-align-compiled[5.5.992]:main():gmmbin/gmm-align-compiled.cc:137) Retried 1 out of 1 utterances.
LOG (gmm-align-compiled[5.5.992]:main():gmmbin/gmm-align-compiled.cc:139) Done 0, errors on 1
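
The key lines in a log like this are the "Did not successfully decode file" warning and the final "Done N, errors on M" summary. As a rough sketch (not part of MFA; the function name `summarize_alignment_log` is made up for illustration), a few lines of Python can pull those out of a Kaldi alignment log so that failed utterances are easy to spot:

```python
import re

# Patterns mirror the gmm-align-compiled messages quoted in the log above.
FAILED_RE = re.compile(r"Did not successfully decode file (\S+), len = (\d+)")
SUMMARY_RE = re.compile(r"Done (\d+), errors on (\d+)")


def summarize_alignment_log(log_text):
    """Collect failed utterances and the done/error totals from a Kaldi log.

    Hypothetical helper for illustration only; it is not an MFA or Kaldi API.
    """
    failed = [(utt, int(frames)) for utt, frames in FAILED_RE.findall(log_text)]
    summary = SUMMARY_RE.search(log_text)
    done, errors = (int(summary.group(1)), int(summary.group(2))) if summary else (None, None)
    return {"failed": failed, "done": done, "errors": errors}


# Two lines taken from the log in this issue.
sample = (
    "WARNING (gmm-align-compiled[5.5.992]:AlignUtteranceWrapper():decoder/decoder-wrappers.cc:626) "
    "Did not successfully decode file 0-1, len = 5898\n"
    "LOG (gmm-align-compiled[5.5.992]:main():gmmbin/gmm-align-compiled.cc:139) Done 0, errors on 1\n"
)
result = summarize_alignment_log(sample)
print(result)
```

Here every utterance failed ("Done 0, errors on 1"), which is why the run aborts instead of simply skipping a file.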

Leng-bingo commented 2 years ago

2022-04-13 14:00:03,137 - my_data_pretrained_aligner - DEBUG -
2022-04-13 14:00:03,137 - my_data_pretrained_aligner - DEBUG - Setup for alignment in 7.447337865829468 seconds
2022-04-13 14:00:03,137 - my_data_pretrained_aligner - INFO - Compiling training graphs...
2022-04-13 14:00:06,021 - my_data_pretrained_aligner - DEBUG - Compiling training graphs took 2.883600950241089
2022-04-13 14:00:06,021 - my_data_pretrained_aligner - INFO - Performing first-pass alignment...
2022-04-13 14:00:06,021 - my_data_pretrained_aligner - INFO - Generating alignments...
2022-04-13 14:00:17,833 - my_data_pretrained_aligner - ERROR - There was an error in the run, please see the log.
2022-04-13 14:04:32,326 - my_data_pretrained_aligner - DEBUG - Beginning run for pretrained_aligner on my_data
2022-04-13 14:04:32,327 - my_data_pretrained_aligner - DEBUG - Using multiprocessing with 3
2022-04-13 14:04:32,327 - my_data_pretrained_aligner - DEBUG - Set up logger for MFA version: 2.0.0rc5
2022-04-13 14:04:32,352 - my_data_pretrained_aligner - DEBUG - Previous run ended in an error (maybe ctrl-c?)
2022-04-13 14:04:32,352 - my_data_pretrained_aligner - DEBUG - Previous run used a different corpus directory than ./my_data (was None)
2022-04-13 14:04:32,352 - my_data_pretrained_aligner - WARNING - The previous run had a different configuration than the current, which may cause issues. Please see the log for details or use --clean flag if issues are encountered.
2022-04-13 14:04:39,734 - my_data_pretrained_aligner - DEBUG - Using IPA
2022-04-13 14:04:39,734 - my_data_pretrained_aligner - DEBUG - Loaded dictionary in 7.381918907165527
2022-04-13 14:04:39,734 - my_data_pretrained_aligner - INFO - Setting up corpus information...
2022-04-13 14:04:39,735 - my_data_pretrained_aligner - DEBUG - Successfully loaded from temporary files
2022-04-13 14:04:39,742 - my_data_pretrained_aligner - INFO - Found 1 speaker across 1 file, average number of utterances per speaker: 1.0
2022-04-13 14:04:39,742 - my_data_pretrained_aligner - DEBUG - Loaded corpus in 0.0074651241302490234
2022-04-13 14:04:39,742 - my_data_pretrained_aligner - DEBUG - Wrote lexicon information in 0.0
2022-04-13 14:04:39,742 - my_data_pretrained_aligner - INFO - Initializing multiprocessing jobs...
2022-04-13 14:04:39,745 - my_data_pretrained_aligner - DEBUG - Initialized jobs in 0.003125905990600586
2022-04-13 14:04:39,745 - my_data_pretrained_aligner - INFO - Creating corpus split with features...
2022-04-13 14:04:39,745 - my_data_pretrained_aligner - DEBUG - Created corpus split directory in 0.00012111663818359375
2022-04-13 14:04:39,745 - my_data_pretrained_aligner - DEBUG - Generated features in 1.1920928955078125e-06
2022-04-13 14:04:39,749 - my_data_pretrained_aligner - DEBUG - Calculated oovs found in 0.0041141510009765625
2022-04-13 14:04:39,750 - my_data_pretrained_aligner - DEBUG - Setting up corpus took 7.397032022476196 seconds
2022-04-13 14:04:39,778 - my_data_pretrained_aligner - DEBUG -
2022-04-13 14:04:39,778 - my_data_pretrained_aligner - DEBUG - ====ACOUSTIC MODEL INFO====
2022-04-13 14:04:39,778 - my_data_pretrained_aligner - DEBUG - Acoustic model root directory: /Users/leng/Documents/MFA/extracted_models
2022-04-13 14:04:39,778 - my_data_pretrained_aligner - DEBUG - Acoustic model dirname: /Users/leng/Documents/MFA/extracted_models/mandarin_mfa
2022-04-13 14:04:39,778 - my_data_pretrained_aligner - DEBUG - Acoustic model meta path: /Users/leng/Documents/MFA/extracted_models/mandarin_mfa/meta.json
2022-04-13 14:04:39,778 - my_data_pretrained_aligner - DEBUG - Acoustic model meta information:
2022-04-13 14:04:39,784 - my_data_pretrained_aligner - DEBUG - architecture: gmm-hmm features: allow_downsample: true
allow_upsample: true delta_pitch: 0.005 feature_type: mfcc frame_length: 25 frame_shift: 10 high_frequency: 7800 low_frequency: 20 max_f0: 500 min_f0: 50 penalty_factor: 0.1 sample_frequency: 16000 snip_edges: true use_energy: false use_pitch: true uses_cmvn: true uses_deltas: false uses_speaker_adaptation: true uses_splices: true uses_voiced: false final_non_silence_correction: 1.58 final_silence_correction: 2.97 initial_silence_probability: 0.325 oov_phone: spn optional_silence_phone: sil phone_set_type: IPA phone_type: triphone phones: !!set a: null ai: null "ai\u02E5\u02E5": null "ai\u02E5\u02E9": null "ai\u02E6": null "ai\u02E7\u02E5": null "ai\u02E8": null "ai\u02E8\u02E9\u02E6": null "ai\u02E9": null aj: null "aj\u02E5\u02E5": null "aj\u02E5\u02E9": null "aj\u02E6": null "aj\u02E7\u02E5": null "aj\u02E8": null "aj\u02E8\u02E9\u02E6": null "aj\u02E9": null au: null "au\u02E5\u02E5": null "au\u02E5\u02E9": null "au\u02E6": null "au\u02E7\u02E5": null "au\u02E8": null "au\u02E8\u02E9\u02E6": null "au\u02E9": null aw: null "aw\u02E5\u02E5": null "aw\u02E5\u02E9": null "aw\u02E6": null "aw\u02E7\u02E5": null "aw\u02E8": null "aw\u02E8\u02E9\u02E6": null "aw\u02E9": null "a\u02E5\u02E5": null "a\u02E5\u02E9": null "a\u02E6": null "a\u02E7\u02E5": null "a\u02E8": null "a\u02E8\u02E9\u02E6": null "a\u02E9": null e: null ei: null "ei\u02E5\u02E5": null "ei\u02E5\u02E9": null "ei\u02E6": null "ei\u02E7\u02E5": null "ei\u02E8": null "ei\u02E8\u02E9\u02E6": null "ei\u02E9": null ej: null "ej\u02E5\u02E5": null "ej\u02E5\u02E9": null "ej\u02E6": null "ej\u02E7\u02E5": null "ej\u02E8": null "ej\u02E8\u02E9\u02E6": null "ej\u02E9": null "e\u02E5\u02E5": null "e\u02E5\u02E9": null "e\u02E6": null "e\u02E7\u02E5": null "e\u02E8": null "e\u02E8\u02E9\u02E6": null "e\u02E9": null f: null i: null "i\u02E5\u02E5": null "i\u02E5\u02E9": null "i\u02E6": null "i\u02E7\u02E5": null "i\u02E8": null "i\u02E8\u02E9\u02E6": null "i\u02E9": null j: null k: null "k\u02B0": null l: null m: 
null n: null o: null ou: null "ou\u02E5\u02E5": null "ou\u02E5\u02E9": null "ou\u02E6": null "ou\u02E7\u02E5": null "ou\u02E8": null "ou\u02E8\u02E9\u02E6": null "ou\u02E9": null ow: null "ow\u02E5\u02E5": null "ow\u02E5\u02E9": null "ow\u02E6": null "ow\u02E7\u02E5": null "ow\u02E8": null "ow\u02E8\u02E9\u02E6": null "ow\u02E9": null "o\u02E5\u02E5": null "o\u02E5\u02E9": null "o\u02E6": null "o\u02E7\u02E5": null "o\u02E8": null "o\u02E8\u02E9\u02E6": null "o\u02E9": null p: null "p\u02B0": null s: null t: null ts: null "ts\u02B0": null "t\u0255": null "t\u0255\u02B0": null "t\u02B0": null u: null "u\u02E5\u02E5": null "u\u02E5\u02E9": null "u\u02E6": null "u\u02E7\u02E5": null "u\u02E8": null "u\u02E8\u02E9\u02E6": null "u\u02E9": null w: null x: null y: null "y\u02E5\u02E5": null "y\u02E5\u02E9": null "y\u02E6": null "y\u02E7\u02E5": null "y\u02E8": null "y\u02E8\u02E9\u02E6": null "y\u02E9": null z: null "z\u0329\u02E5\u02E5": null "z\u0329\u02E5\u02E9": null "z\u0329\u02E6": null "z\u0329\u02E7\u02E5": null "z\u0329\u02E8": null "z\u0329\u02E8\u02E9\u02E6": null "z\u0329\u02E9": null "\u014B": null "\u014B\u030D\u02E7\u02E5": null "\u0255": null "\u0259": null "\u0259\u02E5\u02E5": null "\u0259\u02E5\u02E9": null "\u0259\u02E6": null "\u0259\u02E7\u02E5": null "\u0259\u02E8": null "\u0259\u02E8\u02E9\u02E6": null "\u0259\u02E9": null "\u0265": null "\u027B": null "\u0282": null "\u0288\u0282": null "\u0288\u0282\u02B0": null "\u0290": null "\u0290\u0329\u02E5\u02E5": null "\u0290\u0329\u02E5\u02E9": null "\u0290\u0329\u02E6": null "\u0290\u0329\u02E7\u02E5": null "\u0290\u0329\u02E8": null "\u0290\u0329\u02E8\u02E9\u02E6": null "\u0290\u0329\u02E9": null "\u0294": null silence_probability: 0.32552111637208514 train_date: '2022-03-31 12:45:11.779829' training: audio_duration: 1885489.076749807 average_log_likelihood: -0.014315643974265829 num_oovs: 201778 num_speakers: 6054 num_utterances: 494037 version: 2.0.0rc4.dev0+gd5230fd.d20220126

2022-04-13 14:04:39,784 - my_data_pretrained_aligner - DEBUG -
2022-04-13 14:04:39,784 - my_data_pretrained_aligner - DEBUG - Setup for alignment in 7.432167053222656 seconds
2022-04-13 14:04:39,784 - my_data_pretrained_aligner - INFO - Compiling training graphs...
2022-04-13 14:04:42,679 - my_data_pretrained_aligner - DEBUG - Compiling training graphs took 2.8949649333953857
2022-04-13 14:04:42,680 - my_data_pretrained_aligner - INFO - Performing first-pass alignment...
2022-04-13 14:04:42,680 - my_data_pretrained_aligner - INFO - Generating alignments...
2022-04-13 14:04:54,503 - my_data_pretrained_aligner - ERROR - There was an error in the run, please see the log.

mmcauliffe commented 2 years ago

Try rerunning the align command with a larger beam size, e.g. `mfa align .... --beam 100`, and it should output something.
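
For anyone scripting this, here is a minimal sketch of appending the larger beam to an existing `mfa align` invocation from Python. The helper name `with_larger_beam` and the upper-case path placeholders are assumptions for illustration, not real paths or MFA code; the optional `--clean` flag is the one the warning in the log above recommends when the previous run used a different configuration.

```python
import shlex


def with_larger_beam(base_command, beam=100, clean=False):
    """Return a copy of an `mfa align` argument list with --beam appended.

    Hypothetical helper: it only builds the argument list and runs nothing.
    """
    command = list(base_command)  # copy so the caller's list is left untouched
    command += ["--beam", str(beam)]
    if clean:
        # --clean is the flag the MFA warning in the log suggests after a failed run.
        command.append("--clean")
    return command


# Placeholder arguments: substitute your own corpus, dictionary, model and output paths.
base = ["mfa", "align", "CORPUS_DIR", "DICTIONARY", "ACOUSTIC_MODEL", "OUTPUT_DIR"]
command = with_larger_beam(base, beam=100, clean=True)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the placeholders are replaced with real paths.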

EzerbelCN commented 2 years ago

Hey, I have hit the same error. Here is the main part of my code: [image] Here is the error: [image]

I am using mandarin_mfa. Not all wav files trigger this error.

I would like to send one of the shortest wav files to your email.

mmcauliffe commented 2 years ago

Sure, happy to take a look at an example file. Did you try increasing the beam as above? In your case, you should just be able to add beam=100, retry_beam=400 when constructing the PretrainedAligner. I'll have a think about how to handle this a little more gracefully for this use case of small numbers of utterances, since usually it's not too bad if a couple of files fail alignment initially. At the very least I'll add a more informative error message, and maybe add better detection of failures and automatic beam increases.
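
To make the "automatic beam increases" idea concrete, here is one way such a retry loop could look. This is only a sketch under stated assumptions, not how MFA implements it: `align_once` is a hypothetical callable that runs one alignment attempt (for example by constructing a PretrainedAligner with the given beam and retry_beam) and returns the number of utterances that failed. The default schedule starts from the beam=10, retry_beam=40 values seen in the log and then moves to the beam=100, retry_beam=400 values suggested above.

```python
def align_with_escalating_beam(align_once, schedule=((10, 40), (100, 400))):
    """Call align_once(beam, retry_beam) with wider beams until nothing fails.

    align_once is a hypothetical callable returning the number of utterances
    that failed to align. Returns the (beam, retry_beam) pair that succeeded.
    """
    for beam, retry_beam in schedule:
        failures = align_once(beam, retry_beam)
        if failures == 0:
            return beam, retry_beam
    raise RuntimeError(
        f"alignment still failing at beam={beam}, retry_beam={retry_beam}"
    )


attempts = []


def fake_align_once(beam, retry_beam):
    # Stand-in for a real run: pretend the utterance only aligns once beam >= 100.
    attempts.append((beam, retry_beam))
    return 0 if beam >= 100 else 1


chosen = align_with_escalating_beam(fake_align_once)
print(chosen)  # (100, 400)
```

Wider beams make alignment slower, so starting narrow and widening only on failure keeps the common case fast.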

EzerbelCN commented 2 years ago

This really saved me; maybe it's time to close this issue. Thank you, my great mmcauliffe!

Gnilbren commented 2 weeks ago

I tried both fixes mentioned above and still get the same error.