Validating the corpus with the "mfa validate" command fails with "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa9 in position 0: invalid start byte"

Debugging checklist

[ ] Have you updated to latest MFA version? Yes, the version is 2.0.5.
[ ] Have you tried rerunning the command with the --clean flag? Yes, the command is "mfa validate data/speech/wav/aishell100/BAC009 data/MFA2/pretrained_models/dictionary/mandarin_china_mfa.dict data/MFA2/pretrained_models/acoustic/mandarin_mfa.zip --clean"

Describe the issue

(MFAligner) audio_test@ubuntu:/data/y00580163/PDAugment$ mfa validate data/speech/wav/aishell100/BAC009 data/MFA2/pretrained_models/dictionary/mandarin_china_mfa.dict data/MFA2/pretrained_models/acoustic/mandarin_mfa.zip --clean
Exception ignored in atexit callback: <bound method ExitHooks.history_save_handler of <montreal_forced_aligner.command_line.mfa.ExitHooks object at 0x7f00ab222380>>
Traceback (most recent call last):
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/command_line/mfa.py", line 103, in history_save_handler
    raise self.exception
  File "/home/audio_test/.conda/envs/MFAligner/bin/mfa", line 11, in <module>
    sys.exit(main())
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/command_line/mfa.py", line 1077, in main
    run_validate_corpus(args, unknown)
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/command_line/validate.py", line 154, in run_validate_corpus
    validate_corpus(args, unknown)
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/command_line/validate.py", line 35, in validate_corpus
    validator = PretrainedValidator(
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/validation/corpus_validator.py", line 1323, in __init__
    super().__init__(**kwargs)
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/alignment/pretrained.py", line 65, in __init__
    super().__init__(**kw)
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/validation/corpus_validator.py", line 423, in __init__
    super().__init__(**kwargs)
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/alignment/base.py", line 68, in __init__
    super().__init__(**kwargs)
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/corpus/acoustic_corpus.py", line 1020, in __init__
    super().__init__(**kwargs)
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/corpus/acoustic_corpus.py", line 93, in __init__
    super().__init__(**kwargs)
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/corpus/base.py", line 98, in __init__
    super().__init__(**kwargs)
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/abc.py", line 465, in __init__
    super().__init__(**kwargs)
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/abc.py", line 287, in __init__
    super().__init__(**kwargs)
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/dictionary/multispeaker.py", line 148, in __init__
    self.dictionary_model = DictionaryModel(
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/models.py", line 951, in __init__
    for line in f:
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa9 in position 0: invalid start byte
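The last MFA frame in the traceback is the loop that reads the dictionary file line by line, so a first check is whether that file decodes as UTF-8 at all. The following is a minimal sketch of such a check, not part of MFA: the helper name `first_invalid_utf8` is hypothetical, and the path is simply the one from the command above. Because Python decodes text files in chunks, the "position 0" in the error message may be relative to a chunk rather than to the start of the file; decoding the whole file at once reports an absolute offset.

```python
from pathlib import Path
from typing import Optional, Tuple


def first_invalid_utf8(path: Path) -> Optional[Tuple[int, int]]:
    """Return (byte offset, byte value) of the first byte that breaks
    UTF-8 decoding, or None if the whole file is valid UTF-8.

    Hypothetical helper for diagnosis only; it is not an MFA function.
    """
    data = path.read_bytes()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the absolute offset because the whole file was decoded at once.
        return err.start, data[err.start]
    return None


if __name__ == "__main__":
    # Path taken from the command in this report; adjust as needed.
    dict_path = Path("data/MFA2/pretrained_models/dictionary/mandarin_china_mfa.dict")
    if dict_path.exists():
        bad = first_invalid_utf8(dict_path)
        if bad is None:
            print(f"{dict_path} is valid UTF-8")
        else:
            offset, value = bad
            print(f"{dict_path}: invalid byte 0x{value:02x} at offset {offset}")
    else:
        print(f"{dict_path} not found")
```

If the reported offset is 0, the file may not be a plain UTF-8 text dictionary at all, which would be worth confirming before looking elsewhere.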

For Reproducing your issue Please fill out the following:

Corpus structure
- What language is the corpus in? Chinese pinyin
- How many files/speakers? The aishell data set has 340 speakers.
- Are you using lab files or TextGrid files for input? I use lab files.
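Although the traceback points at the dictionary rather than the transcripts, the same encoding question applies to the lab files. As a secondary check, the sketch below lists every `.lab` file under a corpus directory that does not decode as UTF-8. The helper name `non_utf8_files` is hypothetical and not an MFA API; the corpus path is the one used in the command from this report.

```python
from pathlib import Path
from typing import List


def non_utf8_files(root: Path, suffix: str = ".lab") -> List[Path]:
    """Return a sorted list of files under `root` with the given suffix
    that cannot be decoded as UTF-8.

    Hypothetical helper for diagnosis only; it is not an MFA function.
    """
    bad = []
    for path in sorted(root.rglob(f"*{suffix}")):
        if not path.is_file():
            continue
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            bad.append(path)
    return bad


if __name__ == "__main__":
    # Corpus directory taken from the command in this report; adjust as needed.
    corpus_dir = Path("data/speech/wav/aishell100/BAC009")
    if corpus_dir.is_dir():
        for path in non_utf8_files(corpus_dir):
            print(f"not UTF-8: {path}")
    else:
        print(f"{corpus_dir} not found")
```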
Dictionary
- Are you using a dictionary from MFA? If so, which one? Yes, it is from: https://mfa-models.readthedocs.io/en/latest/dictionary/Mandarin/Mandarin%20%28China%29%20MFA%20dictionary%20v2_0_0.html#Mandarin%20(China)%20MFA%20dictionary%20v2_0_0
- If it's a custom dictionary, what is the phoneset? It's not.
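If the local copy of the dictionary turns out to be text in a different encoding, one possible workaround is to re-save it as UTF-8 before passing it to MFA. The sketch below assumes the source encoding is known; `gb18030` is used purely as an illustrative guess and nothing in this report confirms it. The helper name `transcode_to_utf8` and the output file name are hypothetical. Decoding is strict, so a wrong guess raises an error instead of silently writing corrupted text.

```python
from pathlib import Path


def transcode_to_utf8(src: Path, dst: Path, source_encoding: str) -> None:
    """Read `src` as `source_encoding` and write the same text to `dst` as UTF-8.

    Hypothetical helper, not an MFA function. Raises UnicodeDecodeError if
    `src` is not valid in `source_encoding`.
    """
    text = src.read_bytes().decode(source_encoding)  # strict decoding
    # Writing bytes keeps the original line endings unchanged.
    dst.write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    # Source path from the command in this report; the output name and the
    # gb18030 encoding are assumptions made only for illustration.
    src = Path("data/MFA2/pretrained_models/dictionary/mandarin_china_mfa.dict")
    dst = src.with_name("mandarin_china_mfa_utf8.dict")
    if src.exists():
        transcode_to_utf8(src, dst, source_encoding="gb18030")
        print(f"wrote {dst}")
    else:
        print(f"{src} not found")
```

Writing to a new file rather than overwriting the original keeps the downloaded copy available for comparison.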
Acoustic model
- If you're using an acoustic model, is it one downloaded through MFA? If so, which one? Yes, it is from: https://mfa-models.readthedocs.io/en/latest/acoustic/Mandarin/Mandarin%20MFA%20acoustic%20model%20v2_0_0.html#Mandarin%20MFA%20acoustic%20model%20v2_0_0
- If it's a model you've trained, what data was it trained on? It's not.

Log file Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA). I only have the command_history.yaml file: command_history.zip

Desktop (please complete the following information):

OS: [e.g. Windows, OSX, Linux] Windows 10
Version [e.g. MacOSX 10.15, Ubuntu 20.04, Windows 10, etc]
Any other details about the setup (Cloud, Docker, etc)

Additional context Add any other context about the problem here.

MontrealCorpusTools / Montreal-Forced-Aligner

Validating the corpus with the "mfa validate" command fails with "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa9 in position 0: invalid start byte" #489