MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

[BUG] Errors during mfa train in Docker #720

Closed. hirodeng closed this issue 8 months ago

hirodeng commented 8 months ago

Debugging checklist

[Yes] Have you updated to latest MFA version?
[Yes] Have you tried rerunning the command with the --clean flag?

Describe the issue

Steps:

docker pull mmcauliffe/montreal-forced-aligner:v2.2.17
docker run -it -v /mydata:/data mmcauliffe/montreal-forced-aligner:v2.2.17
# inside docker container
mfa train --clean /data/mgb2/segments/test/ /data/dict/arabic_mfa.dict /data/dict/arabic_accoustic_model.zip --output_directory /data/dict/test_aligned/

The following errors were then reported:

INFO     Initializing training for lda...                                                                           
   0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/1  [ 0:00:01 < -:--:-- , ? it/s ]
 ERROR    There was an error in the run, please see the log.                                                         
Exception ignored in atexit callback: <bound method ExitHooks.history_save_handler of <montreal_forced_aligner.command_line.mfa.ExitHooks object at 0x7fb354a7f250>>
Traceback (most recent call last):
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/command_line/mfa.py", line 98, in history_save_handler
    raise self.exception
  File "/env/bin/mfa", line 8, in <module>
    sys.exit(mfa_cli())
             ^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/rich_click/rich_group.py", line 21, in main
    rv = super().main(*args, standalone_mode=False, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/command_line/train_acoustic_model.py", line 111, in train_acoustic_model_cli
    trainer.train()
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/acoustic_modeling/trainer.py", line 561, in train
    trainer.train()
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/acoustic_modeling/base.py", line 488, in train
    self.initialize_training()
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/acoustic_modeling/base.py", line 239, in initialize_training
    self._trainer_initialization()
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/acoustic_modeling/lda.py", line 482, in _trainer_initialization
    self.tree_stats()
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/acoustic_modeling/triphone.py", line 399, in tree_stats
    run_mp(tree_stats_func, jobs, self.working_log_directory)
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/utils.py", line 871, in run_mp
    parse_logs(log_directory)
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/utils.py", line 453, in parse_logs
    raise KaldiProcessingError(error_logs)
montreal_forced_aligner.exceptions.KaldiProcessingError: KaldiProcessingError:

There were 1 job(s) with errors when running Kaldi binaries.
See the log files below for more information.
/mfa/test/lda/log/lda_est.log
 For more details, please check /mfa/test/test.log

Content of /mfa/test/test.log:

/env/bin/est-lda --dim=40 /mfa/test/lda/lda.mat /mfa/test/lda/lda.1.1.acc 
ERROR (est-lda[5.5.1068]:Cholesky():matrix/tp-matrix.cc:110) Cholesky decomposition failed. Maybe matrix is not positive definite.
LOG (est-lda[5.5.1068]:Estimate():transform/lda-estimate.cc:107) Cholesky failed (possibly not +ve definite), so adding 0.040475 to diagonal and trying again.

LOG (est-lda[5.5.1068]:Estimate():transform/lda-estimate.cc:126) Data count is 120
LOG (est-lda[5.5.1068]:Estimate():transform/lda-estimate.cc:127) LDA singular values are  [ 25655.3 20646.7 12835.6 11010.2 7439 5163.03 3698.24 2660.34 1695.92 1482.21 987.001 1.2841e-11 9.75926e-12 7.83435e-12 6.09957e-12 5.26322e-12 4.9461e-12 4.22494e-12 3.99889e-12 3.71522e-12 3.69452e-12 3.57858e-12 3.37653e-12 3.2638e-12 3.22632e-12 3.07809e-12 2.90613e-12 2.83605e-12 2.77009e-12 2.67349e-12 2.55081e-12 2.5062e-12 2.42535e-12 2.38441e-12 2.32139e-12 2.20675e-12 2.17867e-12 2.12434e-12 2.03456e-12 2.00864e-12 1.93027e-12 1.87689e-12 1.82504e-12 1.78811e-12 1.73474e-12 1.68031e-12 1.61226e-12 1.57793e-12 1.5398e-12 1.4827e-12 1.45139e-12 1.4102e-12 1.38606e-12 1.31029e-12 1.24101e-12 1.20399e-12 1.17291e-12 1.16047e-12 1.12687e-12 1.07515e-12 1.05265e-12 9.75626e-13 9.61187e-13 9.41506e-13 8.85534e-13 8.57373e-13 7.91372e-13 7.77624e-13 6.97393e-13 6.93539e-13 6.56935e-13 6.15439e-13 5.78565e-13 5.16057e-13 4.97452e-13 4.47542e-13 4.2046e-13 4.07028e-13 3.94604e-13 3.17867e-13 2.72054e-13 2.34953e-13 2.05599e-13 1.99018e-13 1.45451e-13 1.26407e-13 7.59493e-14 6.29998e-14 3.69194e-14 1.29394e-14 2.76489e-15 ]

LOG (est-lda[5.5.1068]:Estimate():transform/lda-estimate.cc:129) Sum of all singular values is 93273.6
LOG (est-lda[5.5.1068]:Estimate():transform/lda-estimate.cc:130) Sum of selected singular values is 93273.6

I also tried the latest Docker image (MFA version 3.0.0a9.dev0+g8ef1aba.d20231102), but a different error occurred during LDA training:

INFO     Initialization complete!                                                         
 INFO     lda - Iteration 1 of 35                                                          
 INFO     Accumulating statistics...                                                       
   0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/1  [ 0:00:02 < -:--:-- , ? it/s ]
 INFO     lda - Iteration 2 of 35                                                          
 INFO     Re-calculating LDA...                                                            
   0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/1  [ 0:00:02 < -:--:-- , ? it/s ]
 ERROR    There was an error in the run, please see the log.                               
Exception ignored in atexit callback: <bound method ExitHooks.history_save_handler of <montreal_forced_aligner.command_line.mfa.ExitHooks object at 0x7f5a7f5c1150>>
Traceback (most recent call last):
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/command_line/mfa.py", line 107, in history_save_handler
    raise self.exception
  File "/env/bin/mfa", line 8, in <module>
    sys.exit(mfa_cli())
             ^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/rich_click/rich_command.py", line 126, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/command_line/train_acoustic_model.py", line 133, in train_acoustic_model_cli
    trainer.train()
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/acoustic_modeling/trainer.py", line 523, in train
    trainer.train()
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/acoustic_modeling/base.py", line 390, in train
    self.train_iteration()
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/acoustic_modeling/lda.py", line 451, in train_iteration
    self.calc_lda_mllt()
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/acoustic_modeling/lda.py", line 431, in calc_lda_mllt
    acoustic_model.transform_means(mat)
RuntimeError: kaldi::KaldiFatalError

For reproducing your issue, please fill out the following:

  1. Corpus structure
    • What language is the corpus in? Arabic.
    • How many files/speakers? 1 file, 1 speaker.
    • Are you using lab files or TextGrid files for input? Lab file.
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? Yes, the only Arabic dictionary available.
    • If it's a custom dictionary, what is the phoneset?
  3. Acoustic model
    • If you're using an acoustic model, is it one downloaded through MFA? If so, which one?
    • If it's a model you've trained, what data was it trained on?

hirodeng commented 8 months ago

Found another issue reporting the same problem. It seems that having too little data in the corpus causes it.
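
The est-lda log in the original report looks consistent with this: it gives a data count of 120, and only the first 11 singular values are non-negligible, while est-lda was run with --dim=40. Below is a minimal Python sketch for running the same check on another est-lda log. The function names and the relative tolerance of 1e-8 are assumptions made for illustration; they are not part of MFA or Kaldi.

```python
import re


def parse_singular_values(log_text):
    """Return the LDA singular values found in est-lda log text, or [] if absent."""
    match = re.search(r"LDA singular values are\s*\[([^\]]*)\]", log_text)
    if match is None:
        return []
    return [float(token) for token in match.group(1).split()]


def effective_rank(values, rel_tol=1e-8):
    """Count singular values that are non-negligible relative to the largest one.

    rel_tol is an arbitrary illustrative threshold, not a Kaldi setting.
    """
    if not values:
        return 0
    cutoff = max(values) * rel_tol
    return sum(1 for value in values if value > cutoff)


# Shortened excerpt of the log line from the report: the 11 large values,
# followed by a few of the near-zero ones.
sample_log = (
    "LOG (est-lda[5.5.1068]:Estimate():transform/lda-estimate.cc:127) "
    "LDA singular values are  [ 25655.3 20646.7 12835.6 11010.2 7439 5163.03 "
    "3698.24 2660.34 1695.92 1482.21 987.001 1.2841e-11 9.75926e-12 2.76489e-15 ]"
)

values = parse_singular_values(sample_log)
rank = effective_rank(values)
requested_dim = 40  # taken from --dim=40 on the est-lda command line in the report

print(f"{rank} non-negligible singular values, {requested_dim} dimensions requested")
```

If the count of non-negligible values is well below the requested dimension, as it is here, that is a hint (not proof) that the corpus is too small for the LDA step, which matches the single-file, single-speaker corpus described in the report.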