MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

[BUG] mfa align doesn't work with russian_mfa at all #818

Closed Aranxtonel closed 5 months ago

Aranxtonel commented 5 months ago

Debugging checklist

[x] Have you read the troubleshooting page (https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/troubleshooting.html) and searched the documentation to ensure that your issue is not addressed there? Yes
[x] Have you updated to latest MFA version (check https://montreal-forced-aligner.readthedocs.io/en/latest/changelog/changelog_3.0.html)? What is the output of mfa version? 3.1.1
[x] Have you tried rerunning the command with the --clean flag? Yes

Describe the issue

Running the command "mfa align ./input russian_mfa russian_mfa ./output" consistently results in the following error during or after the "Creating corpus split" step:

montreal_forced_aligner.exceptions.PronunciationAcousticMismatchError: PronunciationAcousticMismatchError:
There were phones in the dictionary that do not have acoustic models: 
d, dz, s, t, ts, and z

I've tried using "russian_mfa" and also manually downloading them from Github. Both russian_mfa v2_0_0 and russian_mfa v2_0_0a cause this error. "russian_cv" completes successfully.

For Reproducing your issue

Please fill out the following:

  1. Corpus structure
    • What language is the corpus in? Russian
    • How many files/speakers? I tested with different numbers. It doesn't seem to matter at all. Even using russian_mfa with English audio and text has the same result.
    • Are you using lab files or TextGrid files for input? .txt
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? russian_mfa
    • If it's a custom dictionary, what is the phoneset? -
  3. Acoustic model
    • If you're using an acoustic model, is it one download through MFA? If so, which one? russian_mfa
    • If it's a model you've trained, what data was it trained on? -

Log file

Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA).

INFO Setting up corpus information...
INFO Loading corpus from source files...
1% 1/100 [ 0:00:01 < -:--:-- , ? it/s ]
INFO Found 1 speaker across 1 file, average number of utterances per speaker: 1.0
INFO Initializing multiprocessing jobs...
WARNING Number of jobs was specified as 3, but due to only having 1 speakers, MFA will only use 1 jobs. Use the --single_speaker flag if you would like to split utterances across jobs regardless of their speaker.
INFO Normalizing text...
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1 [ 0:00:01 < 0:00:00 , ? it/s ]
INFO Generating MFCCs...
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1 [ 0:00:14 < 0:00:00 , ? it/s ]
INFO Calculating CMVN...
INFO Generating final features...
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1 [ 0:00:01 < 0:00:00 , ? it/s ]
INFO Creating corpus split...
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1 [ 0:00:01 < 0:00:00 , ? it/s ]
ERROR There was an error in the run, please see the log.
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/local/miniconda3/envs/aligner/bin/mfa", line 10, in <module>
    sys.exit(mfa_cli())
  File "/usr/local/miniconda3/envs/aligner/lib/python3.9/site-packages/rich_click/rich_command.py", line 367, in __call__
    return super().__call__(*args, **kwargs)
  File "/usr/local/miniconda3/envs/aligner/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/miniconda3/envs/aligner/lib/python3.9/site-packages/rich_click/rich_command.py", line 152, in main
    rv = self.invoke(ctx)
  File "/usr/local/miniconda3/envs/aligner/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/miniconda3/envs/aligner/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/miniconda3/envs/aligner/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/miniconda3/envs/aligner/lib/python3.9/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/miniconda3/envs/aligner/lib/python3.9/site-packages/montreal_forced_aligner/command_line/align.py", line 122, in align_corpus_cli
    aligner.align()
  File "/usr/local/miniconda3/envs/aligner/lib/python3.9/site-packages/montreal_forced_aligner/alignment/pretrained.py", line 333, in align
    self.setup()
  File "/usr/local/miniconda3/envs/aligner/lib/python3.9/site-packages/montreal_forced_aligner/alignment/pretrained.py", line 214, in setup
    self.acoustic_model.validate(self)
  File "/usr/local/miniconda3/envs/aligner/lib/python3.9/site-packages/montreal_forced_aligner/models.py", line 811, in validate
    raise (PronunciationAcousticMismatchError(missing_phones))
montreal_forced_aligner.exceptions.PronunciationAcousticMismatchError: PronunciationAcousticMismatchError:

There were phones in the dictionary that do not have acoustic models: d, dz, s, t, ts, and z

fncokg commented 5 months ago

I have not used russian_mfa, but I found that Russian MFA Dictionary contained only these phones:

a b bʲ bʲː bː c cː dzʲː dʐː dʲ dʲː d̪ d̪z̪ d̪z̪ː d̪ː e f fʲ fʲː fː i j jː k kː m mʲ mʲː mː n̪ n̪ː o p pʲ pʲː pː r rʲ rʲː rː sʲ sʲː s̪ s̪ː tsʲ tɕ tɕː tʂ tʂː tʲ tʲː t̪ t̪s̪ t̪s̪ː t̪ː u v vʲ vʲː vː x xː zʲ zʲː z̪ z̪ː æ ç ɐ ɕ ɕː ə ɛ ɟ ɟː ɡ ɡː ɣ ɨ ɪ ɫ ɫː ɲ ɲː ɵ ʂ ʂː ʉ ʊ ʎ ʎː ʐ ʐː ʑː

It seems that the phones mentioned in your exception message (d, dz, s, t, ts, and z), in their raw forms without diacritics, are indeed absent from the acoustic model. If so, could you try replacing these phones in your dictionary?
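
The "raw forms" observation can be checked mechanically. The following is a minimal sketch, assuming the dental mark in the list above is the combining bridge below (U+032A) and using a hand-copied subset of that inventory. `strip_diacritics` is a hypothetical helper, not part of MFA. It only shows which listed phones reduce to the reported ones once combining marks are removed; it does not establish why the error occurs.

```python
import unicodedata


def strip_diacritics(phone):
    """Return the phone with Unicode combining marks (category Mn) removed."""
    decomposed = unicodedata.normalize("NFD", phone)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Phones named in the PronunciationAcousticMismatchError message.
reported = {"d", "dz", "s", "t", "ts", "z"}

# Subset of the phone list above, written with explicit escapes
# (assumption: the dental diacritic is U+032A, the length mark is U+02D0).
listed_phones = [
    "d\u032a", "d\u032az\u032a", "d\u032a\u02d0", "n\u032a",
    "s\u032a", "t\u032a", "t\u032as\u032a", "z\u032a", "t\u02b2",
]

# Map each listed phone to its bare form when that bare form was reported.
matches = {
    phone: strip_diacritics(phone)
    for phone in listed_phones
    if strip_diacritics(phone) in reported
}

for phone, bare in matches.items():
    print(f"{phone} -> {bare}")
```

Long phones and the palatalized series keep their modifier letters, so they do not collide with the reported set in this sketch.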

Aranxtonel commented 5 months ago

To be clear, I use the russian_mfa dictionary without any additional modifications. I don't think it even has any words with these phones without diacritics, but even removing them completely from the dictionary didn't help.

fncokg commented 5 months ago

My fault, I didn't notice that you use the command mfa align ./input russian_mfa russian_mfa ./output.

I've downloaded both russian_mfa dictionary and acoustic model, and tried this on my PC. I'm sorry. I CANNOT reproduce your error. Everything goes well on my PC.

I've also checked the russian_mfa dictionary: d, dz, s, t, ts, and z do not appear in it in their raw forms (without diacritics). So I guess there might be something wrong with your dictionary file.
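
One way to run the same check on a local dictionary file is to collect its phone inventory and subtract the model's. This is a rough sketch, assuming each dictionary line is tab-separated with the word first and the space-separated pronunciation in the last field. The two entries and the model inventory below are hypothetical illustration data, and `dictionary_phones` is not an MFA function.

```python
def dictionary_phones(lines):
    """Collect the set of phones used in tab-separated dictionary lines.

    Assumption: the last tab-separated field holds the pronunciation as
    space-separated phones; any fields between the word and that field
    (for example probabilities) are ignored.
    """
    phones = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        phones.update(fields[-1].split())
    return phones


# Hypothetical entries, for illustration only.
sample_dictionary = [
    "да\td\u032a a",   # dental d with combining bridge below (U+032A)
    "тест\tt e s t",   # bare t and s, no diacritics
]

# Hypothetical acoustic model inventory.
sample_model_phones = {"a", "e", "d\u032a", "s\u032a", "t\u032a"}

# Phones the dictionary uses that the model does not know.
missing = dictionary_phones(sample_dictionary) - sample_model_phones
print(sorted(missing))
```

To check a real file, the same function could be fed the lines of the downloaded dictionary, for example via `open(path, encoding="utf-8")`.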

Therefore, I would suggest:

Aranxtonel commented 5 months ago

I'm sorry. I CANNOT reproduce your error. Everything goes well on my PC.

So, here is a minimal reproducible example. Running "docker build ." with this Dockerfile fails during the "mfa align" step, and neither "--ignore_cache" nor "--clean" helps. Docker builds are quite consistent, so I hope you will be able to reproduce this.

Dockerfile:

FROM python:3.12-slim
RUN apt-get update && apt-get install -y wget
ENV PATH="/usr/local/miniconda3/bin:${PATH}"
RUN wget \
    https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    && mkdir /root/.conda \
    && bash Miniconda3-latest-Linux-x86_64.sh -b -p /usr/local/miniconda3 \
    && rm -f Miniconda3-latest-Linux-x86_64.sh 
RUN conda config --append channels conda-forge
RUN conda create -n aligner -c conda-forge montreal-forced-aligner
RUN bash -c "source activate aligner; mfa model download dictionary russian_mfa --ignore_cache"
RUN bash -c "source activate aligner; mfa model download acoustic russian_mfa --ignore_cache"
RUN mkdir "input"
RUN wget -P ./input https://ruslan-corpus.github.io/audio/01.wav
RUN echo "Hello World" > ./input/01.txt
RUN bash -c 'source activate aligner; mfa align ./input russian_mfa russian_mfa ./output --clean'

mmcauliffe commented 5 months ago

I've uploaded new versions of the russian_mfa dictionary and model, so redownloading them via:

mfa model download dictionary russian_mfa --ignore_cache
mfa model download acoustic russian_mfa --ignore_cache

Should work with the current versions (and perform better since I've fixed up a number of issues in the source corpora).

Aranxtonel commented 5 months ago

I can confirm, now it works. Thank you!