MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

[BUG] align_one language error with tokenizer #802

Open Hocine958 opened 7 months ago

Hocine958 commented 7 months ago

Debugging checklist

[ ] Have you read the troubleshooting page (https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/troubleshooting.html) and searched the documentation to ensure that your issue is not addressed there?
[X] Have you updated to latest MFA version (check https://montreal-forced-aligner.readthedocs.io/en/latest/changelog/changelog_3.0.html)? What is the output of mfa version?
[ ] Have you tried rerunning the command with the --clean flag?

Describe the issue

When running the "align_one" command on Japanese files, the "language" value passed to the "generate_language_tokenizer()" function (align_one.py line 156) is a string instead of an enum. As a result, the if check in spacy.py at line 56 is skipped, and the dict access at line 66 throws a "KeyError: 'japanese'" exception.
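The failure mode described here can be reproduced in isolation. The following is a minimal sketch, not MFA's actual code: the `Language` enum, the contents of `language_model_mapping`, and the helper `resolve_model_name` are all hypothetical stand-ins, and the model names are placeholders. It assumes the mapping is keyed by enum members, which would explain why a plain string key fails, and shows one possible fix: coercing the string to the enum before the lookup.

```python
from enum import Enum
from typing import Union


class Language(Enum):
    # Hypothetical stand-in for the language enum; members are assumptions.
    japanese = "japanese"
    mandarin = "mandarin"
    english = "english"


# Hypothetical mapping keyed by enum members (model names are placeholders).
language_model_mapping = {
    Language.japanese: "ja_core_news_sm",
    Language.mandarin: "zh_core_web_sm",
    Language.english: "en_core_web_sm",
}


def resolve_model_name(language: Union[str, Language]) -> str:
    """Coerce a string such as 'japanese' to the enum before the lookup."""
    if isinstance(language, str):
        # Enum lookup by value; raises ValueError for an unknown language.
        language = Language(language.lower())
    return language_model_mapping[language]


# A raw string is not a key of the enum-keyed dict, so this raises KeyError,
# matching the exception in the report.
try:
    language_model_mapping["japanese"]
except KeyError as exc:
    print(f"KeyError: {exc}")

# After coercion, string and enum inputs resolve to the same entry.
print(resolve_model_name("japanese"))
print(resolve_model_name(Language.japanese))
```

If this assumption about the mapping holds, converting `acoustic_model.meta["language"]` to the enum at the call site in align_one.py would be an equivalent fix.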

For reproducing your issue

Please fill out the following:

  1. Corpus structure
    • What language is the corpus in? -> Japanese
    • How many files/speakers? -> 1
    • Are you using lab files or TextGrid files for input? -> txt file
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? -> japanese_mfa v3.0.0
    • If it's a custom dictionary, what is the phoneset?
  3. Acoustic model
    • If you're using an acoustic model, is it one downloaded through MFA? If so, which one? -> japanese_mfa v3.0.0
    • If it's a model you've trained, what data was it trained on?

Log file

(env) mfauser@46587cd4c6e4:/$ mfa align_one data/japanese/japanese.wav data/japanese/japanese.txt japanese_mfa japanese_mfa data/jap_one_err
Exception ignored in atexit callback: <bound method ExitHooks.history_save_handler of <montreal_forced_aligner.command_line.mfa.ExitHooks object at 0x7fd96f485390>>
Traceback (most recent call last):
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/command_line/mfa.py", line 107, in history_save_handler
    raise self.exception
  File "/env/bin/mfa", line 8, in <module>
    sys.exit(mfa_cli())
             ^^^^^^^^^
  File "/env/lib/python3.11/site-packages/rich_click/rich_command.py", line 360, in __call__
    return super().__call__(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/rich_click/rich_command.py", line 152, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/command_line/align_one.py", line 156, in align_one_cli
    tokenizer = generate_language_tokenizer(acoustic_model.meta["language"])
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/tokenization/spacy.py", line 66, in generate_language_tokenizer
    name = language_model_mapping[language]
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
KeyError: 'japanese'

uasolo commented 4 months ago

I have the same problem (with Mandarin)