Closed Aranxtonel closed 5 months ago
I have not used russian_mfa, but I found that the Russian MFA dictionary contains only these phones:
a b bʲ bʲː bː c cː dzʲː dʐː dʲ dʲː d̪ d̪z̪ d̪z̪ː d̪ː e f fʲ fʲː fː i j jː k kː m mʲ mʲː mː n̪ n̪ː o p pʲ pʲː pː r rʲ rʲː rː sʲ sʲː s̪ s̪ː tsʲ tɕ tɕː tʂ tʂː tʲ tʲː t̪ t̪s̪ t̪s̪ː t̪ː u v vʲ vʲː vː x xː zʲ zʲː z̪ z̪ː æ ç ɐ ɕ ɕː ə ɛ ɟ ɟː ɡ ɡː ɣ ɨ ɪ ɫ ɫː ɲ ɲː ɵ ʂ ʂː ʉ ʊ ʎ ʎː ʐ ʐː ʑː
It seems that the phones mentioned in your exception message (d, dz, s, t, ts, and z) in their raw forms (without diacritics) are indeed absent from the acoustic model. If so, could you try replacing these phones in your dictionary?
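For anyone who wants to reproduce a phone list like the one above, here is a minimal sketch. It assumes each dictionary line starts with the word, optionally followed by numeric probability columns, and then whitespace-separated phones; the helper names `phones_in_line` and `phone_inventory` are made up for illustration and are not part of MFA.

```python
import re
from pathlib import Path

# Assumption: probability columns, if present, are plain decimal numbers.
NUMERIC = re.compile(r"^\d+(\.\d+)?$")


def phones_in_line(line):
    """Return the phones on one dictionary line (hypothetical helper).

    The first token is treated as the word; numeric tokens are treated
    as pronunciation probabilities and skipped.
    """
    tokens = line.split()
    return [t for t in tokens[1:] if not NUMERIC.match(t)]


def phone_inventory(dict_path):
    """Collect the set of distinct phones used in a dictionary file."""
    inventory = set()
    with open(dict_path, encoding="utf-8") as f:
        for line in f:
            inventory.update(phones_in_line(line))
    return inventory


if __name__ == "__main__":
    # Default location mentioned later in this thread; adjust if yours differs.
    path = Path.home() / "Documents/MFA/pretrained_models/dictionary/russian_mfa.dict"
    if path.exists():
        print(" ".join(sorted(phone_inventory(path))))
```

Because a phone such as d̪ is stored as a base letter plus a combining diacritic, it is a different token from a bare d, so the two never get merged in the resulting set.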
To be clear, I use the russian_mfa dictionary without any additional modifications. I don't think it even has any words with these phones without diacritics, but even removing them completely from the dictionary didn't help.
My fault, I didn't notice that you used the command mfa align ./input russian_mfa russian_mfa ./output.
I've downloaded both the russian_mfa dictionary and the acoustic model and tried this on my PC. I'm sorry, I CANNOT reproduce your error. Everything works on my PC.
I've also checked the russian_mfa dictionary: there are no d, dz, s, t, ts, or z in their raw forms (without diacritics). So I guess there might be something wrong with your dictionary file.
Therefore, I would suggest:
Run
mfa model download dictionary russian_mfa --ignore_cache
to force a redownload of the dictionary file, and then retry your command.
The dictionary file is stored at ~/Documents/MFA/pretrained_models/dictionary/russian_mfa.dict. Please check whether those symbols appear in this file.
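A quick way to do that check could look like the sketch below. It is only an illustration: `find_bare_phones` is a hypothetical helper, not an MFA function, and it assumes the first whitespace-separated token on each line is the word and that phones are separate tokens after it.

```python
from pathlib import Path

# The phones named in the exception, in their raw forms (no diacritics).
BARE_PHONES = {"d", "dz", "s", "t", "ts", "z"}


def find_bare_phones(lines, targets=BARE_PHONES):
    """Return (line_number, word, matched_phones) for lines using a target phone.

    Tokens are compared exactly, so "d" with a combining diacritic
    (for example d̪) does not count as a bare "d".
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        tokens = line.split()
        found = sorted(set(tokens[1:]) & set(targets))
        if found:
            hits.append((lineno, tokens[0], found))
    return hits


if __name__ == "__main__":
    path = Path.home() / "Documents/MFA/pretrained_models/dictionary/russian_mfa.dict"
    if path.exists():
        with open(path, encoding="utf-8") as f:
            for lineno, word, found in find_bare_phones(f):
                print(f"line {lineno}: {word} uses {' '.join(found)}")
```

If this prints nothing, the local dictionary file has no entries with those bare phones, which would point away from the dictionary file as the cause.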
So, here is the minimal reproducible example. "docker build ." command with this Dockerfile fails during the "mfa align" step. "--ignore_cache" and "--clean" don't help. Docker is quite consistent and I hope you will be able to reproduce this.
Dockerfile:
FROM python:3.12-slim
RUN apt-get update && apt-get install -y wget
ENV PATH="/usr/local/miniconda3/bin:${PATH}"
RUN wget \
https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
&& mkdir /root/.conda \
&& bash Miniconda3-latest-Linux-x86_64.sh -b -p /usr/local/miniconda3 \
&& rm -f Miniconda3-latest-Linux-x86_64.sh
RUN conda config --append channels conda-forge
RUN conda create -n aligner -c conda-forge montreal-forced-aligner
RUN bash -c "source activate aligner; mfa model download dictionary russian_mfa --ignore_cache"
RUN bash -c "source activate aligner; mfa model download acoustic russian_mfa --ignore_cache"
RUN mkdir "input"
RUN wget -P ./input https://ruslan-corpus.github.io/audio/01.wav
RUN echo "Hello World" > ./input/01.txt
RUN bash -c 'source activate aligner; mfa align ./input russian_mfa russian_mfa ./output --clean'
I've uploaded new versions of the russian_mfa dictionary and model, so redownloading them via:
mfa model download dictionary russian_mfa --ignore_cache
mfa model download acoustic russian_mfa --ignore_cache
should work with the current versions (and perform better, since I've fixed a number of issues in the source corpora).
I can confirm, now it works. Thank you!
Debugging checklist
[x] Have you read the troubleshooting page (https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/troubleshooting.html) and searched the documentation to ensure that your issue is not addressed there? Yes
[x] Have you updated to latest MFA version (check https://montreal-forced-aligner.readthedocs.io/en/latest/changelog/changelog_3.0.html)? What is the output of mfa version? 3.1.1
[x] Have you tried rerunning the command with the --clean flag? Yes
Describe the issue
Using the command "mfa align ./input russian_mfa russian_mfa ./output" seems to consistently result in the following error during or after the "Creating corpus split" step:
I've tried using "russian_mfa" and also manually downloading the models from GitHub. Both russian_mfa v2_0_0 and russian_mfa v2_0_0a cause this error, while "russian_cv" completes successfully.
For Reproducing your issue
Please fill out the following:
Log file
Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA).
Desktop (please complete the following information):
Additional context
Add any other context about the problem here.