MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License
1.27k stars, 242 forks

[BUG] KaldiProcessingError cannot output anything despite some files succeeded #700

Open slrlab-tech opened 10 months ago

slrlab-tech commented 10 months ago

Debugging checklist

[ Y ] Have you updated to latest MFA version? tried on 3.0.0a and stable 2.2.17
[ Y ] Have you tried rerunning the command with the --clean flag?

Describe the issue

The full corpus contains ~100 files, and alignment succeeded for about 80% of them.

  1. The ~20% of files that failed were not reported anywhere in the command-line output, so they had to be located manually.

I then tried to isolate the problem with some of these files, and a KaldiProcessingError was raised.

  1. Running on a single file just raises KaldiProcessingError, and nothing is written to the output folder.
  2. Running on multiple files did not work either: with 7 files, 1 of which had failed before, the error occurred and none of the previously successful files were output.
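Since the failed files are not reported, one workaround is to compare the corpus against the output folder after a run. The following is a minimal sketch, not part of MFA: it assumes each input `.wav` should produce a `.TextGrid` with the same base name, and the function name `find_unaligned` is made up for illustration.

```python
from pathlib import Path


def find_unaligned(corpus_dir, output_dir):
    """Return base names of .wav files with no matching .TextGrid in output_dir.

    Assumption: output files keep the base name of the input audio file.
    """
    corpus_stems = {p.stem for p in Path(corpus_dir).rglob("*.wav")}
    aligned_stems = {p.stem for p in Path(output_dir).rglob("*.TextGrid")}
    # Anything present in the corpus but absent from the output was not aligned.
    return sorted(corpus_stems - aligned_stems)
```

Calling `find_unaligned("path/to/corpus", "path/to/output")` returns a sorted list of the files to inspect. Both paths are placeholders.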

For Reproducing your issue

Please fill out the following:

  1. Corpus structure
    • What language is the corpus in? Cantonese
    • How many files/speakers? ~100 files
    • Are you using lab files or TextGrid files for input? .wav with .txt files
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? no
    • If it's a custom dictionary, what is the phoneset? Jyutping
  3. Acoustic model
    • If you're using an acoustic model, is it one downloaded through MFA? If so, which one? no, it was found online from previous studies
    • If it's a model you've trained, what data was it trained on? no

Log file

Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA).

Traceback (most recent call last):

  File "C:\Users\xxx\anaconda3\envs\aligner\lib\site-packages\montreal_forced_aligner\abc.py", line 92, in run
    yield from self._run()

  File "C:\Users\xxx\anaconda3\envs\aligner\lib\site-packages\montreal_forced_aligner\alignment\multiprocessing.py", line 1003, in _run
    self.check_call(align_proc)

  File "C:\Users\xxx\anaconda3\envs\aligner\lib\site-packages\montreal_forced_aligner\abc.py", line 119, in check_call
    raise KaldiProcessingError([self.log_path])
LOG (gmm-boost-silence.EXE[5.5.1068]:main():gmmbin\gmm-boost-silence.cc:93) Boosted weights for 5 pdfs, by factor of 1
LOG (gmm-boost-silence.EXE[5.5.1068]:main():gmmbin\gmm-boost-silence.cc:103) Wrote model to -
add-deltas 'scp,s,cs:C:\Users\xxx\Documents\MFA\tempFolder\tempFolder\split8\feats.1.1.scp' ark:- 
LOG (gmm-align-compiled.EXE[5.5.1068]:main():gmmbin\gmm-align-compiled.cc:127) 1-1
WARNING (gmm-align-compiled.EXE[5.5.1068]:kaldi::AlignUtteranceWrapper():decoder\decoder-wrappers.cc:617) Retrying utterance 1-1 with beam 40
WARNING (gmm-align-compiled.EXE[5.5.1068]:kaldi::AlignUtteranceWrapper():decoder\decoder-wrappers.cc:626) Did not successfully decode file 1-1, len = 11423
LOG (gmm-align-compiled.EXE[5.5.1068]:main():gmmbin\gmm-align-compiled.cc:135) Overall log-likelihood per frame is -nan(ind) over 0 frames.
LOG (gmm-align-compiled.EXE[5.5.1068]:main():gmmbin\gmm-align-compiled.cc:137) Retried 1 out of 1 utterances.
LOG (gmm-align-compiled.EXE[5.5.1068]:main():gmmbin\gmm-align-compiled.cc:139) Done 0, errors on 1
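
The failing utterances can also be read out of a log like the one above, since each failure produces a "Did not successfully decode file" warning. This is a rough sketch that relies only on the wording of that warning as it appears here. The function name `failed_utterances` is made up for illustration, and the log text has to be loaded from wherever the run wrote it.

```python
import re

# Matches the gmm-align-compiled warning exactly as it appears in the log above.
FAILED_RE = re.compile(r"Did not successfully decode file (\S+), len = (\d+)")


def failed_utterances(log_text):
    """Return (utterance_id, len) pairs for every utterance that failed to align."""
    return [(m.group(1), int(m.group(2))) for m in FAILED_RE.finditer(log_text)]


# Example using the warning line from this report.
sample = (
    "WARNING (gmm-align-compiled.EXE[5.5.1068]:kaldi::AlignUtteranceWrapper():"
    "decoder\\decoder-wrappers.cc:626) "
    "Did not successfully decode file 1-1, len = 11423"
)
print(failed_utterances(sample))  # [('1-1', 11423)]
```

The utterance IDs (here `1-1`) still have to be mapped back to file names by hand.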

Desktop (please complete the following information):

Additional context

After setting --beam 1000 --retry-beam 4000, all files were aligned, but the runtime was much longer. I don't know why, but with the --finetune flag no TextGrids were output either, even though the first- and second-pass alignments were successful.