MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
MIT License

[BUG] align_one language error with tokenizer #802

Open Hocine958 opened 2 months ago

Hocine958 commented 2 months ago

Debugging checklist

[ ] Have you read the troubleshooting page ( and searched the documentation to ensure that your issue is not addressed there?
[X] Have you updated to latest MFA version (check What is the output of mfa version?
[ ] Have you tried rerunning the command with the --clean flag?

Describe the issue

When running the "align_one" command on Japanese files, the "language" value passed to the "generate_language_tokenizer()" function (called from align_one_cli, line 156) is a string instead of an enum. As a result, the if check at line 56 is skipped and the dict access at line 66 throws a "KeyError: 'japanese'" exception.
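The failure mode described above can be reproduced in isolation. This is a minimal sketch, not MFA's actual code: the `Language` enum, the `pick_tokenizer` function, the mapping contents, and the returned names are all hypothetical stand-ins chosen to mirror the shape of the report (an identity check against an enum member, followed by a dict keyed by enum members).

```python
from enum import Enum


class Language(Enum):
    # Hypothetical stand-in for the enum the tokenizer expects.
    japanese = "japanese"
    english = "english"


# Hypothetical mapping keyed by enum members; Japanese is handled separately.
language_model_mapping = {Language.english: "placeholder_model"}


def pick_tokenizer(language):
    # Mirrors the shape of the reported failure, not MFA's actual code.
    if language is Language.japanese:
        return "japanese_tokenizer"
    return language_model_mapping[language]


# An enum member takes the special-case branch.
print(pick_tokenizer(Language.japanese))

try:
    # A plain string is never identical to an enum member, so the branch
    # is skipped and the dict lookup fails.
    pick_tokenizer("japanese")
except KeyError as error:
    print("KeyError:", error)
```

Under these assumptions, the string "japanese" falls through to the dict lookup and raises the same kind of KeyError as in the log.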

For Reproducing your issue

Please fill out the following:

  1. Corpus structure
    • What language is the corpus in? -> japanese
    • How many files/speakers? -> 1
    • Are you using lab files or TextGrid files for input? -> txt file
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? -> japanese_mfa v3.0.0
    • If it's a custom dictionary, what is the phoneset?
  3. Acoustic model
    • If you're using an acoustic model, is it one downloaded through MFA? If so, which one? -> japanese_mfa v3.0.0
    • If it's a model you've trained, what data was it trained on?

Log file

(env) mfauser@46587cd4c6e4:/$ mfa align_one data/japanese/japanese.wav data/japanese/japanese.txt japanese_mfa japanese_mfa data/jap_one_err
Exception ignored in atexit callback: <bound method ExitHooks.history_save_handler of <montreal_forced_aligner.command_line.mfa.ExitHooks object at 0x7fd96f485390>>
Traceback (most recent call last):
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/command_line/", line 107, in history_save_handler
    raise self.exception
  File "/env/bin/mfa", line 8, in <module>
  File "/env/lib/python3.11/site-packages/rich_click/", line 360, in __call__
    return super().__call__(*args, **kwargs)
  File "/env/lib/python3.11/site-packages/click/", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/env/lib/python3.11/site-packages/rich_click/", line 152, in main
    rv = self.invoke(ctx)
  File "/env/lib/python3.11/site-packages/click/", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/env/lib/python3.11/site-packages/click/", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/env/lib/python3.11/site-packages/click/", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/env/lib/python3.11/site-packages/click/", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/command_line/", line 156, in align_one_cli
    tokenizer = generate_language_tokenizer(acoustic_model.meta["language"])
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/tokenization/", line 66, in generate_language_tokenizer
    name = language_model_mapping[language]
KeyError: 'japanese'
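
One possible direction for a fix, given that the traceback shows the value coming straight from the model metadata, is to coerce it to the enum before the lookup. The sketch below is only an illustration under assumptions: `Language`, `coerce_language`, and the `meta` dict are hypothetical names, and it assumes the enum's values match the strings stored in the metadata.

```python
from enum import Enum


class Language(Enum):
    # Hypothetical stand-in for the enum the tokenizer expects.
    japanese = "japanese"
    english = "english"


def coerce_language(value):
    """Return a Language member for either a member or its string value."""
    if isinstance(value, Language):
        return value
    # Lookup by value; raises ValueError if the string matches no member.
    return Language(value)


# Hypothetical metadata holding a plain string, as in the report.
meta = {"language": "japanese"}
language = coerce_language(meta["language"])
print(language is Language.japanese)
```

Coercing at the call site keeps the tokenizer function's enum-only contract, while an unknown string fails early with a ValueError instead of a KeyError deep in the lookup.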
