Validating a corpus with the "mfa validate" command raises "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa9 in position 0: invalid start byte" #489
[ ] Have you updated to latest MFA version?
Yes, the version is 2.0.5.
[ ] Have you tried rerunning the command with the --clean flag?
Yes, the command is "mfa validate data/speech/wav/aishell100/BAC009 data/MFA2/pretrained_models/dictionary/mandarin_china_mfa.dict data/MFA2/pretrained_models/acoustic/mandarin_mfa.zip --clean"
Describe the issue
A clear and concise description of what the bug is.
(MFAligner) audio_test@ubuntu:/data/y00580163/PDAugment$ mfa validate data/speech/wav/aishell100/BAC009 data/MFA2/pretrained_models/dictionary/mandarin_china_mfa.dict data/MFA2/pretrained_models/acoustic/mandarin_mfa.zip --clean
Exception ignored in atexit callback: <bound method ExitHooks.history_save_handler of <montreal_forced_aligner.command_line.mfa.ExitHooks object at 0x7f00ab222380>>
Traceback (most recent call last):
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/command_line/mfa.py", line 103, in history_save_handler
    raise self.exception
  File "/home/audio_test/.conda/envs/MFAligner/bin/mfa", line 11, in <module>
    sys.exit(main())
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/command_line/mfa.py", line 1077, in main
    run_validate_corpus(args, unknown)
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/command_line/validate.py", line 154, in run_validate_corpus
    validate_corpus(args, unknown)
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/command_line/validate.py", line 35, in validate_corpus
    validator = PretrainedValidator(
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/validation/corpus_validator.py", line 1323, in __init__
    super().__init__(**kwargs)
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/alignment/pretrained.py", line 65, in __init__
    super().__init__(**kw)
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/validation/corpus_validator.py", line 423, in __init__
    super().__init__(**kwargs)
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/alignment/base.py", line 68, in __init__
    super().__init__(**kwargs)
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/corpus/acoustic_corpus.py", line 1020, in __init__
    super().__init__(**kwargs)
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/corpus/acoustic_corpus.py", line 93, in __init__
    super().__init__(**kwargs)
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/corpus/base.py", line 98, in __init__
    super().__init__(**kwargs)
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/abc.py", line 465, in __init__
    super().__init__(**kwargs)
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/abc.py", line 287, in __init__
    super().__init__(**kwargs)
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/dictionary/multispeaker.py", line 148, in __init__
    self.dictionary_model = DictionaryModel(
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/site-packages/montreal_forced_aligner/models.py", line 951, in __init__
    for line in f:
  File "/home/audio_test/.conda/envs/MFAligner/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa9 in position 0: invalid start byte
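The last frames of the traceback are in DictionaryModel, iterating over a file that is being decoded as UTF-8, so the dictionary file passed on the command line seems the most likely place to look. Below is a minimal sketch, using only the Python standard library, for checking whether a file is valid UTF-8 and where the first offending byte sits. The function names and the candidate encoding list are my own for illustration and are not part of MFA; gb18030 is only a guess at what a Mandarin text file might be saved as, and a successful decode does not prove that is the real encoding.

```python
from pathlib import Path


def find_invalid_utf8(path):
    """Return (offset, byte value) of the first byte that breaks UTF-8
    decoding, or None if the whole file decodes cleanly."""
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


def guess_encoding(path, candidates=("utf-8", "utf-8-sig", "gb18030")):
    """Return the first candidate encoding that decodes the file without
    error, or None. The candidate list is an assumption, not exhaustive."""
    data = Path(path).read_bytes()
    for enc in candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, find_invalid_utf8(name), guess_encoding(name))
```

Saved under a name of your choice and run with the path of the .dict file from the command above as its argument, it prints the first bad offset (if any) and the first candidate encoding that decodes. If neither check succeeds, the file may not be plain text at all.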
For Reproducing your issue
Please fill out the following:
Corpus structure
What language is the corpus in? Chinese pinyin
How many files/speakers? The AISHELL dataset has 340 speakers.
Are you using lab files or TextGrid files for input? I use lab files.
If it's a model you've trained, what data was it trained on? It's not a model I trained.
Log file
Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA). I only have the command_history.yaml:
command_history.zip
Desktop (please complete the following information):
OS: [e.g. Windows, OSX, Linux] Windows 10
Version [e.g. MacOSX 10.15, Ubuntu 20.04, Windows 10, etc]
Any other details about the setup (Cloud, Docker, etc)
Additional context
Add any other context about the problem here.