MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License
1.35k stars, 248 forks

[BUG] Dependency Issues with MFA Installation and Chinese Tokenization Support #843

Open EurFelux opened 3 weeks ago

EurFelux commented 3 weeks ago

Describe the issue

Running mfa align on a Mandarin corpus raised an ImportError asking me to install Chinese tokenization support. Installing those packages then produced a version conflict between numpy and sklearn.

I installed MFA via conda, and the version of numpy is 1.26.4.

I tried aligning a Mandarin corpus, but the command failed with an error telling me to install additional dependencies:

ImportError: Please install Chinese tokenization support via pip install spacy-pkuseg dragonmapper hanziconv.

However, the current release of spacy-pkuseg requires numpy>=2.0.0, so running the command from the error message upgraded numpy to 2.0.2. The sklearn version installed with MFA is 1.2.2, and it appears to be incompatible with numpy 2.0.2. After the upgrade I got this error:

ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

A StackOverflow answer suggested fixing this by downgrading numpy to 1.26.4.
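For anyone hitting the same error, here is a minimal sketch for checking which versions are actually installed in the environment. It reads package metadata instead of importing the packages, so it still works when sklearn fails to import. The function name report_versions is made up for this example, and it assumes python3 on PATH points at the interpreter of the active conda environment.

```shell
# Hypothetical helper: print installed versions from package metadata,
# without importing the packages themselves.
report_versions() {
  python3 - "$@" <<'PY'
import sys
from importlib.metadata import version, PackageNotFoundError

for name in sys.argv[1:]:
    try:
        print(f"{name}=={version(name)}")
    except PackageNotFoundError:
        print(f"{name}: not installed")
PY
}

report_versions numpy scikit-learn spacy-pkuseg
```

If numpy shows 2.x next to scikit-learn 1.2.2, that matches the combination described above.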

I ultimately found a temporary solution by specifying an older version of spacy-pkuseg:

pip install spacy-pkuseg==0.0.33

This version only needs numpy>=1.19.0.
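The same workaround can be written with a pip constraints file, so that pip cannot pull in numpy 2.x while installing the tokenization packages. This is only a sketch: the file name mfa-constraints.txt and the DRY_RUN variable are made up for this example, the numpy<2 bound is my assumption based on the conflict above, and python3 is assumed to be the interpreter of the MFA environment.

```shell
# Assumption: keeping numpy below 2.0 stays compatible with the sklearn
# build that conda installed alongside MFA.
cat > mfa-constraints.txt <<'EOF'
numpy<2
spacy-pkuseg==0.0.33
EOF

# By default this only prints the command; set DRY_RUN=0 to run it.
install_cmd="python3 -m pip install -c mfa-constraints.txt spacy-pkuseg dragonmapper hanziconv"
if [ "${DRY_RUN:-1}" = "0" ]; then
  $install_cmd
else
  echo "$install_cmd"
fi
```

The -c option makes pip apply the listed version bounds to anything it installs or upgrades, without adding new packages on its own.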

I hope MFA can resolve this dependency conflict and update the documentation. Currently there are no instructions saying that these packages are required; the only notice is the error raised when executing mfa align ....
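Until this is documented, a quick pre-flight check like the one below can tell whether the tokenization packages are present before starting an alignment. It is a rough sketch: check_modules is a made-up helper, and the module names spacy_pkuseg, dragonmapper and hanziconv are my assumption about the import names behind the pip packages in the error message.

```shell
# Hypothetical helper: report whether each module can be located,
# without importing it. Returns non-zero if any module is missing.
check_modules() {
  python3 - "$@" <<'PY'
import sys
from importlib.util import find_spec

missing = False
for name in sys.argv[1:]:
    found = find_spec(name) is not None
    missing = missing or not found
    print(f"{name}: {'found' if found else 'missing'}")
sys.exit(1 if missing else 0)
PY
}

check_modules spacy_pkuseg dragonmapper hanziconv \
  || echo "Install the missing packages before running mfa align"
```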

For Reproducing your issue

  1. Corpus structure
    • What language is the corpus in? Mandarin
    • How many files/speakers? 1 speaker, 1 audio file and 1 text file, just for testing.
    • Are you using lab files or TextGrid files for input? No.
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? Yes. mandarin_china_mfa
    • If it's a custom dictionary, what is the phoneset?
  3. Acoustic model
    • If you're using an acoustic model, is it one downloaded through MFA? If so, which one? Yes. mandarin_mfa
    • If it's a model you've trained, what data was it trained on?

To reproduce:

conda create -n aligner -c conda-forge montreal-forced-aligner
conda activate aligner
mfa model download acoustic mandarin_mfa
mfa model download dictionary mandarin_china_mfa
mfa validate CORPUS_DIRECTORY mandarin_china_mfa
mfa align CORPUS_DIRECTORY mandarin_china_mfa mandarin_mfa OUTPUT_DIRECTORY
pip install spacy-pkuseg dragonmapper hanziconv
mfa align CORPUS_DIRECTORY mandarin_china_mfa mandarin_mfa OUTPUT_DIRECTORY

Log file sp1.log

chenchenzi commented 1 week ago

I encountered the same issue. Thanks for the tip about installing spacy-pkuseg version 0.0.33.