MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License
1.35k stars · 249 forks

[BUG] --g2p_model_path option of align_one command is not working #770

Closed Stephane-Lpt closed 8 months ago

Stephane-Lpt commented 9 months ago

Debugging checklist

[Y] Have you read the troubleshooting page (https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/troubleshooting.html) and searched the documentation to ensure that your issue is not addressed there?
[Y] Have you updated to the latest MFA version (check https://montreal-forced-aligner.readthedocs.io/en/latest/changelog/changelog_3.0.html)? What is the output of mfa version?
[Y] Have you tried rerunning the command with the --clean flag?

Describe the issue
Using a G2P model with the align_one command makes MFA crash.

For reproducing your issue, please fill out the following:

  1. Corpus structure
    • What language is the corpus in? English
    • How many files/speakers? 1
    • Are you using lab files or TextGrid files for input? .lab
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? english_mfa (3.0.0)
  3. Acoustic model
    • If you're using an acoustic model, is it one downloaded through MFA? If so, which one? english_mfa (3.0.0)
  4. G2P model
    • If you're using a G2P model, is it one downloaded through MFA? If so, which one? english_mfa (3.0.0)

Log file

(mfaPR) [sloppine@zained ~]$ mfa align_one --clean --g2p_model_path ~/Documents/MFA/pretrained_models/g2p/english_us_mfa.zip ~/Documents/Alignement-force/data/Homemade/test_terraria.wav ~/Documents/Alignement-force/data/Homemade/test_terraria.lab english_mfa english_mfa ~/Documents/Alignement-force/mfa/dataPR/
Exception ignored in atexit callback: <bound method ExitHooks.history_save_handler of <montreal_forced_aligner.command_line.mfa.ExitHooks object at 0x7cfbe4108b10>>
Traceback (most recent call last):
  File "/home/sloppine/.conda/envs/mfaPR/lib/python3.11/site-packages/montreal_forced_aligner/command_line/mfa.py", line 107, in history_save_handler
    raise self.exception
  File "/home/sloppine/.conda/envs/mfaPR/bin/mfa", line 10, in <module>
    sys.exit(mfa_cli())
             ^^^^^^^^^
  File "/home/sloppine/.conda/envs/mfaPR/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sloppine/.conda/envs/mfaPR/lib/python3.11/site-packages/rich_click/rich_command.py", line 126, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/sloppine/.conda/envs/mfaPR/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sloppine/.conda/envs/mfaPR/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sloppine/.conda/envs/mfaPR/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sloppine/.conda/envs/mfaPR/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sloppine/.conda/envs/mfaPR/lib/python3.11/site-packages/montreal_forced_aligner/command_line/align_one.py", line 184, in align_one_cli
    ctm = align_utterance_online(
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sloppine/.conda/envs/mfaPR/lib/python3.11/site-packages/montreal_forced_aligner/online/alignment.py", line 57, in align_utterance_online
    lexicon_compiler.add_pronunciation(KalpyPronunciation(w, pron[0]))
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: __init__() missing 5 required positional arguments: 'probability', 'silence_after_probability', 'silence_before_correction', 'non_silence_before_correction', and 'disambiguation'
(mfaPR) [sloppine@zained ~]$ mfa version
3.0.1


Additional context
Without the --g2p_model_path option, or when every word is already in the dictionary so G2P is never invoked, align_one works fine. You can find the test files (.lab and .wav) right here: https://we.tl/t-90pVdp1MzT

All that being said, I've worked on a fix with @NiziL.

As the traceback in the log above suggests, the error comes from the Pronunciation class in lexicon.py:

@dataclassy.dataclass
class Pronunciation:
    """
    Data class for storing information about a particular pronunciation
    """

    orthography: str
    pronunciation: str
    probability: typing.Optional[float]
    silence_after_probability: typing.Optional[float]
    silence_before_correction: typing.Optional[float]
    non_silence_before_correction: typing.Optional[float]
    disambiguation: typing.Optional[int]

Optional merely annotates that an attribute may hold a float or None; it does not supply a default value. Because these attributes have no defaults, every instantiation of Pronunciation must pass explicit values (or at least None) for all of them. The simplest fix is therefore to default the attributes to None when no value is given:

@dataclassy.dataclass
class Pronunciation:
    """
    Data class for storing information about a particular pronunciation
    """

    orthography: str
    pronunciation: str
    probability: typing.Optional[float] = None
    silence_after_probability: typing.Optional[float] = None
    silence_before_correction: typing.Optional[float] = None
    non_silence_before_correction: typing.Optional[float] = None
    disambiguation: typing.Optional[int] = None
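To illustrate why the defaults matter, here is a minimal, self-contained sketch of the same pattern using the standard-library dataclasses module (the real class uses dataclassy, which behaves the same way for this purpose; the field list is abbreviated):

```python
import dataclasses
import typing

@dataclasses.dataclass
class Pronunciation:
    orthography: str
    pronunciation: str
    # Optional[...] alone would NOT make these arguments optional at call
    # sites; only the "= None" default does.
    probability: typing.Optional[float] = None
    silence_after_probability: typing.Optional[float] = None
    disambiguation: typing.Optional[int] = None

# With the defaults in place, the two-argument call made by
# align_utterance_online no longer raises a TypeError:
p = Pronunciation("hello", "HH AH0 L OW1")
print(p.probability)  # None
```

Without the `= None` defaults, the same two-argument call raises `TypeError: __init__() missing ... required positional arguments`, exactly as in the log above.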

Once this first bug is fixed, a second one emerges (the --clean flag is deliberately omitted from the command below; the reason is explained at the end):

(mfa) [sloppine@zained mfa]$ mfa align_one --g2p_model_path ~/Documents/MFA/pretrained_models/g2p/english_us_mfa.zip ~/Documents/Alignement-force/data/Homemade/test_terraria.wav ~/Documents/Alignement-force/data/Homemade/test_terraria.lab english_mfa english_mfa ~/Documents/Alignement-force/mfa/dataPR/
Exception ignored in atexit callback: <bound method ExitHooks.history_save_handler of <montreal_forced_aligner.command_line.mfa.ExitHooks object at 0x7191eb8f3550>>
Traceback (most recent call last):
  File "/home/sloppine/.conda/envs/mfa/lib/python3.10/site-packages/montreal_forced_aligner/command_line/mfa.py", line 107, in history_save_handler
    raise self.exception
  File "/home/sloppine/.conda/envs/mfa/bin/mfa", line 10, in <module>
    sys.exit(mfa_cli())
  File "/home/sloppine/.conda/envs/mfa/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/sloppine/.conda/envs/mfa/lib/python3.10/site-packages/rich_click/rich_command.py", line 126, in main
    rv = self.invoke(ctx)
  File "/home/sloppine/.conda/envs/mfa/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/sloppine/.conda/envs/mfa/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/sloppine/.conda/envs/mfa/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/sloppine/.conda/envs/mfa/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/sloppine/.conda/envs/mfa/lib/python3.10/site-packages/montreal_forced_aligner/command_line/align_one.py", line 187, in align_one_cli
    ctm = align_utterance_online(
  File "/home/sloppine/.conda/envs/mfa/lib/python3.10/site-packages/montreal_forced_aligner/online/alignment.py", line 58, in align_utterance_online
    lexicon_compiler.add_pronunciation(KalpyPronunciation(w, pron[0]))
  File "/home/sloppine/.conda/envs/mfa/lib/python3.10/site-packages/kalpy/fstext/lexicon.py", line 640, in add_pronunciation
    self._fst.add_arc(
  File "extensions/_pywrapfst.pyx", line 2113, in _pywrapfst.MutableFst.add_arc
TypeError: an integer is required

The add_arc call that leads to the exception is:

                    self._fst.add_arc(
                        self.non_silence_state,  # fails because self.non_silence_state is None
                        pywrapfst.Arc(
                            arc.ilabel,
                            word_symbol,
                            pywrapfst.Weight(
                                self._fst.weight_type(), pron_cost + non_silence_before_cost
                            ),
                            arc.nextstate + start_index,
                        ),
                    )
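A minimal stand-in reproduces the failure mode: pywrapfst's add_arc expects an integer source-state ID, but after loading the FST from file, non_silence_state is still None (the add_arc stub below is hypothetical, written only to mirror the extension's type check):

```python
def add_arc(state, arc):
    """Hypothetical stand-in for pywrapfst.MutableFst.add_arc's type check."""
    if not isinstance(state, int):
        raise TypeError("an integer is required")

non_silence_state = None  # never initialized by load_l_from_file

err = None
try:
    add_arc(non_silence_state, object())
except TypeError as e:
    err = str(e)
print(err)  # an integer is required
```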

The error comes from the same file (lexicon.py). When self._fst is loaded through load_l_from_file, silence_state and non_silence_state are left uninitialized, which leads to this error when adding a new pronunciation with the G2P model:

def load_l_from_file(
        self,
        l_fst_path: typing.Union[pathlib.Path, str],
    ) -> None:
        """
        Read g.fst from file

        Parameters
        ----------
        l_fst_path: :class:`~pathlib.Path` or str
            Path to read HCLG.fst
        """
        self._fst = pynini.Fst.read(str(l_fst_path))

This is not the case when self._fst is built entirely from scratch (see below):

def create_fsts(self, phonological_rule_fst: pynini.Fst = None):
        if self._fst is not None and self._align_fst is not None:
            return

        initial_silence_cost = 0
        initial_non_silence_cost = 0
        if self.initial_silence_probability:
            initial_silence_cost = -1 * math.log(self.initial_silence_probability)
            initial_non_silence_cost = -1 * math.log(1.0 - self.initial_silence_probability)

        final_silence_cost = 0
        final_non_silence_cost = 0
        if self.final_silence_correction:
            final_silence_cost = -math.log(self.final_silence_correction)
            final_non_silence_cost = -math.log(self.final_non_silence_correction)

        self.phone_table.find(self.silence_disambiguation_symbol)
        phone_eps_symbol = self.phone_table.find("<eps>")
        self.word_table.find(self.silence_word)
        self._fst = pynini.Fst()
        self._align_fst = pynini.Fst()
        self.start_state = self._fst.add_state()
        self._align_fst.add_state()
        self._fst.set_start(self.start_state)
        self.non_silence_state = self._fst.add_state()  # INITIALIZED HERE
        self._align_fst.add_state()
        self.silence_state = self._fst.add_state() # INITIALIZED HERE
        self._align_fst.add_state()

Therefore, silence_state and non_silence_state must also be initialized during loading as follows:

def load_l_from_file(
        self,
        l_fst_path: typing.Union[pathlib.Path, str],
    ) -> None:
        """
        Read g.fst from file

        Parameters
        ----------
        l_fst_path: :class:`~pathlib.Path` or str
            Path to read HCLG.fst
        """
        self._fst = pynini.Fst.read(str(l_fst_path))
        self.non_silence_state = 1 # INITIALIZED NOW
        self.silence_state = 2 # INITIALIZED NOW

We set non_silence_state to 1 and silence_state to 2 to match the IDs assigned during creation by the add_state method. add_state simply returns the next available state ID (the previous ID plus one), so in create_fsts the start state gets 0, non_silence_state gets 1, and silence_state gets 2.
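The numbering argument can be sketched with a minimal stand-in that mimics pynini's add_state contract (FakeFst is a hypothetical class written only to illustrate that contract, not part of pynini):

```python
class FakeFst:
    """Minimal stand-in mimicking pynini.Fst's sequential state numbering."""

    def __init__(self):
        self._num_states = 0

    def add_state(self):
        # Each call returns the next integer state ID, starting from 0.
        state_id = self._num_states
        self._num_states += 1
        return state_id

fst = FakeFst()
start_state = fst.add_state()        # 0, set as the start state in create_fsts
non_silence_state = fst.add_state()  # 1
silence_state = fst.add_state()      # 2
print(start_state, non_silence_state, silence_state)  # 0 1 2
```

This is why hard-coding 1 and 2 in load_l_from_file matches an L.fst that was originally written out by create_fsts.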

Coming back to the omitted --clean flag: if we add --clean after fixing only the first bug, the command works. Presumably this is because --clean deletes L.fst along with the other temporary files (as stated in your documentation), so the lexicon FST is rebuilt from scratch via create_fsts instead of being loaded, at the cost of some performance.

Finally, we know this fix targets the kalpy library rather than Montreal-Forced-Aligner itself, but we could not find the kalpy repository online, and since you are the creator and maintainer of kalpy on PyPI and conda-forge, we took the liberty of posting the issue and the fix here.