MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

[BUG] bad alignment or failure with beam options #671

Open cdicanio opened 1 year ago

cdicanio commented 1 year ago

Debugging checklist

[ ] Have you updated to latest MFA version?

How do I check the current version I have?

[ ] Have you tried rerunning the command with the --clean flag?

Yes.
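On the version question above: MFA has a `mfa version` subcommand that prints the installed version. The same information can be read from Python inside the conda environment. This is a minimal sketch; the distribution name `Montreal_Forced_Aligner` is an assumption about how the package is registered in your environment.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the package is registered under this distribution name.
    found = installed_version("Montreal_Forced_Aligner")
    print(found if found is not None else "MFA is not installed in this environment")
```

Run it with the `aligner` environment activated so that it inspects the same installation the `mfa` command uses.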

Describe the issue Not sure if this is a bug, but my alignment seems to ignore all pauses, so I am getting very long duration estimates for the initial segment of each aligned portion. The alignment runs if I leave "beam" at its default. When I set --beam 1000 (after the directory arguments; the tutorial does not make clear whether it goes before them, with the other options, or after), I get an error.

==== ERROR There was an error in the run, please see the log.
Exception ignored in atexit callback: <bound method ExitHooks.history_save_handler of <montreal_forced_aligner.command_line.mfa.ExitHooks object at 0x103466b90>>
Traceback (most recent call last):
  File "/Users/cdicanio/miniconda3/envs/aligner/lib/python3.10/site-packages/montreal_forced_aligner/command_line/mfa.py", line 102, in history_save_handler
    raise self.exception
  File "/Users/cdicanio/miniconda3/envs/aligner/bin/mfa", line 10, in <module>
    sys.exit(mfa_cli())
  File "/Users/cdicanio/miniconda3/envs/aligner/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/cdicanio/miniconda3/envs/aligner/lib/python3.10/site-packages/rich_click/rich_group.py", line 21, in main
    rv = super().main(*args, standalone_mode=False, **kwargs)
  File "/Users/cdicanio/miniconda3/envs/aligner/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/cdicanio/miniconda3/envs/aligner/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/cdicanio/miniconda3/envs/aligner/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/cdicanio/miniconda3/envs/aligner/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/cdicanio/miniconda3/envs/aligner/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/cdicanio/miniconda3/envs/aligner/lib/python3.10/site-packages/montreal_forced_aligner/command_line/align.py", line 113, in align_corpus_cli
    aligner.align()
  File "/Users/cdicanio/miniconda3/envs/aligner/lib/python3.10/site-packages/montreal_forced_aligner/alignment/pretrained.py", line 412, in align
    super().align()
  File "/Users/cdicanio/miniconda3/envs/aligner/lib/python3.10/site-packages/montreal_forced_aligner/alignment/base.py", line 345, in align
    self.align_utterances()
  File "/Users/cdicanio/miniconda3/envs/aligner/lib/python3.10/site-packages/montreal_forced_aligner/alignment/mixins.py", line 436, in align_utterances
    for utterance, log_likelihood in run_kaldi_function(
  File "/Users/cdicanio/miniconda3/envs/aligner/lib/python3.10/site-packages/montreal_forced_aligner/utils.py", line 753, in run_kaldi_function
    raise v
montreal_forced_aligner.exceptions.MultiprocessingError: MultiprocessingError:

Job 2 encountered an error:
Traceback (most recent call last):
  File "/Users/cdicanio/miniconda3/envs/aligner/lib/python3.10/site-packages/montreal_forced_aligner/abc.py", line 89, in run
    yield from self._run()
  File "/Users/cdicanio/miniconda3/envs/aligner/lib/python3.10/site-packages/montreal_forced_aligner/alignment/multiprocessing.py", line 957, in _run
    self.check_call(align_proc)
  File "/Users/cdicanio/miniconda3/envs/aligner/lib/python3.10/site-packages/montreal_forced_aligner/abc.py", line 116, in check_call
    raise KaldiProcessingError([self.log_path])
montreal_forced_aligner.exceptions.KaldiProcessingError: KaldiProcessingError:

There were 1 job(s) with errors when running Kaldi binaries.
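On where the beam option goes: the traceback shows that the `mfa` command is built on click, and click-based commands by default accept options either before or after the positional arguments, so both placements should work. The sketch below assembles an `mfa align` command with the options placed after the four positional arguments. The file and directory names are hypothetical placeholders, the beam values are only illustrative, and `--retry_beam` is assumed to be the companion option to `--beam`; check `mfa align --help` for the options your version accepts.

```python
import shlex
from typing import List, Optional


def build_align_command(
    corpus: str,
    dictionary: str,
    acoustic_model: str,
    output: str,
    beam: Optional[int] = None,
    retry_beam: Optional[int] = None,
    clean: bool = False,
) -> List[str]:
    """Assemble an argument list for `mfa align`, options after positionals."""
    command = ["mfa", "align", corpus, dictionary, acoustic_model, output]
    if clean:
        command.append("--clean")
    if beam is not None:
        command += ["--beam", str(beam)]
    if retry_beam is not None:
        # Assumption: --retry_beam is the option name for the retry beam.
        command += ["--retry_beam", str(retry_beam)]
    return command


if __name__ == "__main__":
    # Hypothetical paths; replace them with your own corpus, dictionary, and model.
    command = build_align_command(
        "triqui_corpus",
        "triqui_dict.txt",
        "triqui_model.zip",
        "triqui_aligned",
        beam=100,
        retry_beam=400,
        clean=True,
    )
    print(shlex.join(command))
```

The list form can be passed directly to `subprocess.run(command, check=True)` once the paths point at real files.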

For Reproducing your issue Please fill out the following:

  1. Corpus structure
    • What language is the corpus in? Triqui
    • How many files/speakers? One file for one speaker
    • Are you using lab files or TextGrid files for input? TextGrid files
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? No, my own Triqui dictionary, but it has worked before, even recently (and I'm familiar with the tabular format issue.)
    • If it's a custom dictionary, what is the phoneset? The issue does not have to do with the phoneset. It works without any different settings for beam.
  3. Acoustic model
    • If you're using an acoustic model, is it one downloaded through MFA? If so, which one?
    • If it's a model you've trained, what data was it trained on? A corpus of 10 hours of transcribed Triqui speech.

Log file Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA).

Desktop (please complete the following information):

Additional context Add any other context about the problem here. Prior to aligning, I am still running the following commands:

  1. Delete the MFA directory under Documents (I don't know if I have to do this manually or if the 'clean' option does it, but I was having trouble aligning before when I did not do this.)
  2. conda activate aligner
  3. mfa configure --disable_auto_server
  4. mfa server start

aligned.log

mmcauliffe commented 1 year ago

There is a warning in the logs:

2023-07-27 22:15:41,036 - mfa - WARNING - There were 7 pronunciations in the dictionary that were ignored for containing one of 6 phones not present in the trained acoustic model.  Please run `mfa validate` to get more details.

which might be causing the crash. If this is the same dictionary that was used in training, I'm not sure why it would be complaining about new phones.
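While `mfa validate` is the supported way to get the details, the check behind that warning can be approximated by hand. The sketch below flags dictionary entries whose pronunciations use phones outside a given phone set. It assumes a simplified dictionary format (word, a tab, then space-separated phones) and is not MFA's own implementation; the words and phones in the usage example are toy data, not taken from the Triqui dictionary.

```python
from typing import Iterable, List, Set, Tuple


def find_unknown_phones(
    dictionary_lines: Iterable[str], model_phones: Set[str]
) -> List[Tuple[str, List[str]]]:
    """Return (word, unknown phones) for entries using phones the model lacks."""
    flagged = []
    for line in dictionary_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumed format: word<TAB>phone phone phone
        word, _, pronunciation = line.partition("\t")
        unknown = sorted(set(pronunciation.split()) - model_phones)
        if unknown:
            flagged.append((word, unknown))
    return flagged


if __name__ == "__main__":
    # Toy data for illustration only.
    toy_dictionary = ["ka\tk a", "tsi\tts i", "ne\tn e"]
    toy_model_phones = {"k", "a", "n", "e", "i"}
    for word, phones in find_unknown_phones(toy_dictionary, toy_model_phones):
        print(word, phones)
```

If the dictionary carries extra columns such as pronunciation probabilities, those would need to be stripped before the phones are compared.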

If you wouldn't mind emailing me a link to the dictionary, acoustic model, and a file or two that replicate this issue, I can debug further.

cdicanio commented 1 year ago

Hi Michael,

I got the errors to go away once I upgraded to the newest version (I was running 2.2.6). I think this has to do with the errors that I was getting a few months back that we discussed. The new phone error baffled me too. Do you still want to try to replicate? If so, here is a set of files, the triqui dictionary, and the triqui aligner.

https://www.dropbox.com/scl/fi/pzt846l0th6p757wt7xa3/Triqui_aligner_with_files.zip?rlkey=8twr85wvns1qamthmn0g3dy7c&dl=0


Best, Christian DiCanio https://www.acsu.buffalo.edu/~cdicanio/ Associate Professor Director of Graduate Admissions Department of Linguistics University at Buffalo
