[BUG] Dependency Issues with MFA Installation and Chinese Tokenization Support

Debugging checklist

[x] Have you read the troubleshooting page (https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/troubleshooting.html) and searched the documentation to ensure that your issue is not addressed there?
[x] Have you updated to latest MFA version (check https://montreal-forced-aligner.readthedocs.io/en/latest/changelog/changelog_3.0.html)? What is the output of mfa version?
[x] Have you tried rerunning the command with the --clean flag?

Describe the issue

I encountered an ImportError asking me to install Chinese tokenization support, and installing it creates a conflict between the numpy and sklearn (scikit-learn) versions.

I installed MFA via conda, and the installed numpy version is 1.26.4.

When I tried to align a Mandarin corpus, MFA reported that I needed to install additional dependencies:

ImportError: Please install Chinese tokenization support via pip install spacy-pkuseg dragonmapper hanziconv.

However, the current spacy-pkuseg release requires numpy>=2.0.0. Running the command from the error message upgraded NumPy to 2.0.2. MFA, however, had installed sklearn 1.2.2, which appears to be incompatible with NumPy 2.0.2, and I got the following error:

ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

I found a StackOverflow answer that suggested downgrading NumPy back to 1.26.4.
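When reporting or debugging this kind of conflict, it can help to record exactly which versions are active before and after each pip step. The function below is a hypothetical helper (report_versions is not an MFA command); it is a minimal sketch that uses only Python's standard importlib to print each package's installed version and whether it imports, assuming it is run inside the activated MFA environment. If the diagnosis above is right, with NumPy 2.0.2 installed the scikit-learn line would be expected to show the numpy.dtype ValueError.

```shell
# Hypothetical diagnostic helper (not part of MFA): for each package involved in
# the conflict, print its installed version and whether importing it succeeds.
report_versions() {
  python3 -c '
from importlib import import_module
from importlib.metadata import version, PackageNotFoundError

# (pip distribution name, Python import name)
pkgs = [
    ("numpy", "numpy"),
    ("scikit-learn", "sklearn"),
    ("spacy-pkuseg", "spacy_pkuseg"),
    ("dragonmapper", "dragonmapper"),
    ("hanziconv", "hanziconv"),
]
for dist, mod in pkgs:
    try:
        ver = version(dist)
    except PackageNotFoundError:
        print(f"{dist}: not installed")
        continue
    try:
        import_module(mod)
        status = "imports OK"
    except Exception as exc:  # e.g. ValueError: numpy.dtype size changed ...
        status = f"import failed: {type(exc).__name__}: {exc}"
    print(f"{dist} {ver}: {status}")
'
}

report_versions
```

Running it once after conda activate aligner and again after the pip install step makes the NumPy change visible in the issue report.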

I ultimately found a temporary solution by specifying an older version of spacy-pkuseg:

pip install spacy-pkuseg==0.0.33

This version only requires numpy>=1.19.0.
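The same workaround can also be written so that pip cannot move NumPy to 2.x at all. This is a sketch under assumptions: the spacy-pkuseg==0.0.33 pin comes from the report above, but adding an explicit "numpy<2" requirement is an untested extra safeguard, and both install_zh_extras and numpy_is_1x are hypothetical helper names, not part of MFA. It relies on pip resolving all requirements listed on one command line together.

```shell
# Hypothetical helper (not part of MFA): install the Chinese tokenization extras
# named in MFA's ImportError while keeping NumPy on the 1.x series.
# pip resolves every requirement on one command line together, so the explicit
# "numpy<2" pin stops it from upgrading NumPy to 2.x for spacy-pkuseg.
install_zh_extras() {
  pip install "numpy<2" "spacy-pkuseg==0.0.33" dragonmapper hanziconv
}

# Hypothetical helper: succeed only if the given NumPy version string is a 1.x release.
numpy_is_1x() {
  case "$1" in
    1.*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, in the activated environment run install_zh_extras, then numpy_is_1x "$(python3 -c 'import numpy; print(numpy.__version__)')" && echo "NumPy is still 1.x" before rerunning mfa align.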

I hope this dependency conflict can be resolved in MFA and the documentation updated: the installation instructions do not mention that these packages are needed, yet mfa align ... prompts for them at runtime.

For Reproducing your issue

Corpus structure
- What language is the corpus in? Mandarin
- How many files/speakers? 1 speaker, 1 audio file and 1 transcript (just for testing).
- Are you using lab files or TextGrid files for input? No.
Dictionary
- Are you using a dictionary from MFA? If so, which one? Yes. mandarin_china_mfa
- If it's a custom dictionary, what is the phoneset?
Acoustic model
- If you're using an acoustic model, is it one download through MFA? If so, which one? Yes. mandarin_mfa
- If it's a model you've trained, what data was it trained on?

To reproduce:

conda create -n aligner -c conda-forge montreal-forced-aligner
conda activate aligner
mfa model download acoustic mandarin_mfa
mfa model download dictionary mandarin_china_mfa
mfa validate CORPUS_DIRECTORY mandarin_china_mfa
mfa align CORPUS_DIRECTORY mandarin_china_mfa mandarin_mfa OUTPUT_DIRECTORY
pip install spacy-pkuseg dragonmapper hanziconv
mfa align CORPUS_DIRECTORY mandarin_china_mfa mandarin_mfa OUTPUT_DIRECTORY

Log file: sp1.log

Desktop (please complete the following information):

OS: Linux
Version: Ubuntu 20.04.6 LTS

Additional context

MontrealCorpusTools / Montreal-Forced-Aligner

[BUG] Dependency Issues with MFA Installation and Chinese Tokenization Support #843