Closed leandro-gracia-gil closed 5 months ago
Note: this example is for Japanese, but I expect to feed phonemes as input the same way in a few other Latin-script languages. I haven't checked yet whether those are affected by the same issue.
Also, one thing I had to fix while debugging. I can open a separate bug if needed.
In file `tokenization/japanese.py`, line 19:

```python
config_path = resource_dir.joinpath("japanese", "sudachi_config.json")
```

This fails later because `config_path` is a `pathlib.Path` object, which sudachipy does not support. It can be easily fixed by forcing a conversion to string:

```python
config_path = str(resource_dir.joinpath("japanese", "sudachi_config.json"))
```
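As a quick illustration of the fix (the resource directory path here is hypothetical), converting a `pathlib.Path` to `str` preserves the path while satisfying libraries that only accept plain strings:

```python
from pathlib import Path

# Hypothetical resource directory, for illustration only.
resource_dir = Path("/opt/mfa/resources")

# joinpath returns a pathlib.Path object; libraries such as sudachipy
# may expect a plain str and reject Path objects.
config_path = resource_dir.joinpath("japanese", "sudachi_config.json")

# Forcing a conversion to string fixes the incompatibility.
config_path_str = str(config_path)
print(type(config_path_str).__name__)  # str
print(config_path_str)
```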
You can download the old 2.0 Japanese model via `mfa download acoustic japanese_mfa --version 2.0.1a --force` (see https://mfa-models.readthedocs.io/en/latest/acoustic/Japanese/Japanese%20MFA%20acoustic%20model%20v2_0_1a.html). The 3.0 Japanese model uses sudachipy's tokenization for input text and assumes it's normal Japanese kana/kanji/romaji, which is why `i` is getting mapped to アイ and IPA-specific symbols are ignored.
I see, thanks. Regardless of the tokenization issue, are there any other new features or quality improvements I would be missing by using the old 2.0.1a model instead of the 3.0.0 one?
Also, since the 3.0.0 model uses text + tokenization, does it try to align against all possible pronunciations (as in different phone sequences with different probabilities for the same word in a dictionary) and pick the best match, or does it use some criterion to pick the most likely pronunciation first and then attempt to align with it?
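For illustration of what "different probabilities for the same word in a dictionary" looks like, MFA pronunciation dictionaries can list multiple variants of a word with a pronunciation-probability column. A hypothetical fragment (the word, probabilities, and phones are made up for this sketch):

```
because	0.9	b ɪ k ʌ z
because	0.1	k ʌ z
```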
I'm closing this bug as the previous 2.0.1a model can still be used in this way. Thanks for your help.
I have been using `mfa align` to generate alignments of audio with input IPA phonemes directly instead of text. This was done using a handmade dictionary that simply maps IPA phonemes to themselves. The reason for this is that my use case forces me to do G2P separately in my own way, while ensuring that the produced phonemes are supported by the MFA acoustic model.

However, after updating from version 2.x to 3.x (in particular, 3.0.7), I'm seeing that
`mfa align` now attempts a text tokenization step that modifies my input IPA phonemes and affects the alignment results.

Here's an example with Japanese text (好きにする):
(I got these tokenizer results by checking `tokenization/japanese.py` in the installed MFA package code while debugging the issue.)
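For reference, the handmade identity dictionary described above can be generated with a short script. This is only a sketch: the phoneme list below is an illustrative made-up subset, not the actual phone set of any MFA acoustic model.

```python
# Sketch: build a pronunciation dictionary where each IPA phoneme maps to
# itself, so phonemes in the input "text" act as their own pronunciations.
# Replace this illustrative list with the phone set your acoustic model supports.
phonemes = ["a", "i", "ɯ", "e", "o", "k", "s", "t", "ɕ", "ɾ"]

# Each dictionary line is word<TAB>pronunciation; here word == pronunciation.
lines = [f"{p}\t{p}" for p in phonemes]

with open("identity.dict", "w", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")
```

The resulting file can then be passed to `mfa align` as the pronunciation dictionary.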
Is there any way to bypass the tokenizer and align using my input phonemes directly?
Log file: No log files were generated, since the problem does not manifest as a runtime error.