MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

BrokenPipeError: [Errno 32] Broken pipe #757

Closed nlp12 closed 4 months ago

nlp12 commented 4 months ago

Debugging checklist

[y] Have you updated to latest MFA version?
[y] Have you tried rerunning the command with the --clean flag?

Describe the issue

We tried to run mfa validate my_corpus mandarin_mfa mandarin_mfa --clean but got BrokenPipeError: [Errno 32] Broken pipe. We are using the MFA Mandarin dictionary and acoustic model (both named mandarin_mfa), and we are in a virtual environment called aligner. We ran mfa model inspect on both models (acoustic and dictionary), and both were fine.

For Reproducing your issue Please fill out the following:

  1. Corpus structure
    • What language is the corpus in? Mandarin
    • How many files/speakers? 9 files, one speaker, but MFA recognizes 2 or 3 speakers when there is only one.
    • Are you using lab files or TextGrid files for input? TextGrid.
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? Mandarin mfa
    • If it's a custom dictionary, what is the phoneset? N/A
  3. Acoustic model
    • If you're using an acoustic model, is it one downloaded through MFA? If so, which one? Mandarin mfa
    • If it's a model you've trained, what data was it trained on? N/A

Log file

Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA).

CMVN.log:
2024-02-22 16:27:21,726 - kalpy.cmvn - INFO - Reading features from: /Users/Documents/MFA/mfa_1/mfa_1/feats.scp
2024-02-22 16:27:21,726 - kalpy.cmvn - INFO - Reading spk2utt from: /Users/Documents/MFA/mfa_1/mfa_1/spk2utt.scp
2024-02-22 16:27:21,727 - kalpy.cmvn - DEBUG - Writing to: ark,scp:/Users/bcd/Documents/MFA/mfa_1/mfa_1/cmvn.ark,/Users/Documents/MFA/mfa_1/mfa_1/cmvn.scp
2024-02-22 16:27:21,727 - kalpy.cmvn - INFO - Processing speaker 1: 8 utterances
2024-02-22 16:27:21,729 - kalpy.cmvn - INFO - Processing speaker 2: 1 utterances

generate_final_features.1.log generate_final_features.2.log make_mfcc.1.log make_mfcc.2.log normalize_oov.log

Desktop (please complete the following information):

Additional context

ERROR There was an error in the run, please see the log.
Exception ignored in atexit callback: <bound method ExitHooks.history_save_handler of <montreal_forced_aligner.command_line.mfa.ExitHooks object at 0x17726d750>>
Traceback (most recent call last):
  File "/Users/miniconda3/envs/aligner/lib/python3.10/site-packages/montreal_forced_aligner/command_line/mfa.py", line 107, in history_save_handler
    raise self.exception
  File "/Users/miniconda3/envs/aligner/bin/mfa", line 10, in <module>
    sys.exit(mfa_cli())
  File "/Users/miniconda3/envs/aligner/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/miniconda3/envs/aligner/lib/python3.10/site-packages/rich_click/rich_command.py", line 126, in main
    rv = self.invoke(ctx)
  File "/Users/miniconda3/envs/aligner/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/miniconda3/envs/aligner/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/miniconda3/envs/aligner/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/miniconda3/envs/aligner/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/miniconda3/envs/aligner/lib/python3.10/site-packages/montreal_forced_aligner/command_line/validate.py", line 112, in validate_corpus_cli
    validator.validate()
  File "/Users/miniconda3/envs/aligner/lib/python3.10/site-packages/montreal_forced_aligner/validation/corpus_validator.py", line 512, in validate
    self.train()
  File "/Users/miniconda3/envs/aligner/lib/python3.10/site-packages/montreal_forced_aligner/acoustic_modeling/trainer.py", line 559, in train
    self.finalize_training()
  File "/Users/miniconda3/envs/aligner/lib/python3.10/site-packages/montreal_forced_aligner/acoustic_modeling/trainer.py", line 628, in finalize_training
    self.train_phone_lm()
  File "/Users/miniconda3/envs/aligner/lib/python3.10/site-packages/montreal_forced_aligner/transcription/transcriber.py", line 317, in train_phone_lm
    raise v
  File "/Users/miniconda3/envs/aligner/lib/python3.10/site-packages/montreal_forced_aligner/utils.py", line 540, in run
    self.function.run()
  File "/Users/miniconda3/envs/aligner/lib/python3.10/site-packages/montreal_forced_aligner/abc.py", line 106, in run
    raise MultiprocessingError(self.job_name, error_text)
montreal_forced_aligner.exceptions.MultiprocessingError: MultiprocessingError:

Job 1 encountered an error:
Traceback (most recent call last):
  File "/Users/miniconda3/envs/aligner/lib/python3.10/site-packages/montreal_forced_aligner/abc.py", line 102, in run
    self._run()
  File "/Users/miniconda3/envs/aligner/lib/python3.10/site-packages/montreal_forced_aligner/language_modeling/multiprocessing.py", line 239, in _run
    farcompile_proc.stdin.flush()
BrokenPipeError: [Errno 32] Broken pipe
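
For readers unfamiliar with this error: the traceback ends at farcompile_proc.stdin.flush(), which suggests that the parent process was still writing to a child process that had already exited. The following is a minimal sketch of that general mechanism only, not MFA's code. It assumes a POSIX system, and the child command is a hypothetical stand-in that exits immediately.

```python
import subprocess
import sys


def write_to_exited_child() -> bool:
    """Return True if flushing to a dead child's stdin raises BrokenPipeError."""
    # Hypothetical stand-in for a helper program that fails on startup.
    child = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.exit(1)"],
        stdin=subprocess.PIPE,
    )
    child.wait()  # the child has exited, so nothing reads the pipe any more
    try:
        child.stdin.write(b"some input\n")  # lands in Python's buffer
        child.stdin.flush()                 # the OS write fails with EPIPE
    except BrokenPipeError:
        return True
    finally:
        try:
            child.stdin.close()  # close() retries the flush and may fail again
        except BrokenPipeError:
            pass
    return False


if __name__ == "__main__":
    print("BrokenPipeError raised:", write_to_exited_child())
```

In other words, Errno 32 is usually a symptom: the informative message tends to be whatever the child process reported before it exited.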

nlp12 commented 4 months ago

Is anyone working on tickets for MFA anymore? In need of assistance here, thanks!

mmcauliffe commented 4 months ago

Which version of MFA is this and what's the output of conda list? Can you try running mfa validate my_corpus mandarin_mfa --acoustic_model_path mandarin_mfa --clean instead? See https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/data_validation.html#cmdoption-mfa-validate-acoustic_model_path, also look at https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/corpus_structure.html for how to format your corpus so that it recognizes the correct speakers.
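
One possible cause of the extra speakers, assuming MFA takes speaker names from TextGrid tier names as the corpus structure page describes, is that the tier is named differently in some files. The sketch below is only an illustration: it assumes long-format TextGrid text that has already been read into strings, and the file names and tier names in the sample are hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:    name = "speaker1"   (long TextGrid format)
_TIER_NAME = re.compile(r'^\s*name\s*=\s*"(.*)"\s*$', re.MULTILINE)


def tier_names(textgrid_text: str) -> list[str]:
    """Return the tier names found in one TextGrid's text."""
    return _TIER_NAME.findall(textgrid_text)


def count_tier_names(textgrids: dict[str, str]) -> Counter:
    """Count in how many files each tier name occurs."""
    counts: Counter = Counter()
    for text in textgrids.values():
        counts.update(set(tier_names(text)))
    return counts


# Hypothetical sample: two files whose single tier is named differently.
SAMPLE = {
    "file1.TextGrid": 'item [1]:\n    class = "IntervalTier"\n    name = "speaker1"\n',
    "file2.TextGrid": 'item [1]:\n    class = "IntervalTier"\n    name = "Speaker 1"\n',
}

if __name__ == "__main__":
    for name, n_files in count_tier_names(SAMPLE).items():
        print(f"{name!r}: {n_files} file(s)")
```

If more than one name shows up for a single-speaker corpus, renaming the tiers consistently would be the first thing to try.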

nlp12 commented 4 months ago

I just ran the command you gave me (which I had already tried without success) and got a vague error: 'error in the run, see the log'. See the log below. I am running montreal-forced-a~ conda-forge/noarch::montreal-forced-aligner-3.0.0rc1-pyhd8ed1ab_0, and sorry, I don't know what the conda list is. I inspected the acoustic model and the dictionary, and everything looks good there. The corpus is formatted according to the links you just sent: I have the original orthography in a TextGrid that matches the speech in the wav file. I'm not sure what else I need to do to get this working, please help. There are 2 total OOV tokens, but when I check utterance_oovs.txt, it shows the word with a comma in between for some reason (there are no commas in the TextGrids, just spaces in every utterance). When I look for that word in the dictionary, it is there, at the very end. It is also recognizing two speakers for some reason; I only have one, but I doubt that is causing this error. I have no coding experience, I'm just trying to get this aligner working on my machine! Thanks!

2024-02-24 16:59:59,223 - kalpy.align - 12014678016 - 1 - DEBUG - Align options: {'transition_scale': 1.0, 'acoustic_scale': 0.083333, 'self_loop_scale': 0.1, 'beam': 10, 'retry_beam': 40, 'boost_silence': 1.0}
2024-02-24 16:59:59,612 - kalpy.align - 12014678016 - 1 - DEBUG - Aligning for dictionary mandarin_mfa (1)
2024-02-24 16:59:59,612 - kalpy.align - 12014678016 - 1 - DEBUG - Aligning with model: /Users/Documents/MFA/mfa_1/alignment/final.mdl
2024-02-24 16:59:59,612 - kalpy.align - 12014678016 - 1 - DEBUG - Training graph archive: /Users/Documents/MFA/mfa_1/alignment/fsts.1.1.ark
2024-02-24 16:59:59,613 - kalpy.align - 12014678016 - 1 - DEBUG - Feature Archive information:
2024-02-24 16:59:59,613 - kalpy.align - 12014678016 - 1 - DEBUG - CMVN: None
2024-02-24 16:59:59,614 - kalpy.align - 12014678016 - 1 - DEBUG - Deltas: False
2024-02-24 16:59:59,614 - kalpy.align - 12014678016 - 1 - DEBUG - Splices: True
2024-02-24 16:59:59,614 - kalpy.align - 12014678016 - 1 - DEBUG - LDA: /Users//Documents/MFA/mfa_1/alignment/lda.mat
2024-02-24 16:59:59,614 - kalpy.align - 12014678016 - 1 - DEBUG - fMLLR: scp,s,cs:/Users//Documents/MFA/mfa_1/mfa_1/split2/trans.1.1.scp
2024-02-24 16:59:59,615 - kalpy.align - 12014678016 - 1 - DEBUG - Aligning with /Users//Documents/MFA/mfa_1/alignment/final.mdl
2024-02-24 16:59:59,948 - kalpy.align - 12014678016 - 1 - INFO - Overall log-likelihood per frame is -35.08096051179731 over 15067 frames.
2024-02-24 16:59:59,948 - kalpy.align - 12014678016 - 1 - INFO - Done 8, errors on 0

2024-02-24 16:59:55,283 - kalpy.align - 12014678016 - 1 - DEBUG - Align options: {'transition_scale': 1.0, 'acoustic_scale': 0.083333, 'self_loop_scale': 0.1, 'beam': 10, 'retry_beam': 40, 'boost_silence': 1.0}
2024-02-24 16:59:55,711 - kalpy.align - 12014678016 - 1 - DEBUG - Aligning for dictionary mandarin_mfa (1)
2024-02-24 16:59:55,711 - kalpy.align - 12014678016 - 1 - DEBUG - Aligning with model: /Users//Documents/MFA/mfa_1/alignment/final.alimdl
2024-02-24 16:59:55,711 - kalpy.align - 12014678016 - 1 - DEBUG - Training graph archive: /Users//Documents/MFA/mfa_1/alignment/fsts.1.1.ark
2024-02-24 16:59:55,712 - kalpy.align - 12014678016 - 1 - DEBUG - Feature Archive information:
2024-02-24 16:59:55,712 - kalpy.align - 12014678016 - 1 - DEBUG - CMVN: None
2024-02-24 16:59:55,712 - kalpy.align - 12014678016 - 1 - DEBUG - Deltas: False
2024-02-24 16:59:55,712 - kalpy.align - 12014678016 - 1 - DEBUG - Splices: True
2024-02-24 16:59:55,712 - kalpy.align - 12014678016 - 1 - DEBUG - LDA: /Users//Documents/MFA/mfa_1/alignment/lda.mat
2024-02-24 16:59:55,712 - kalpy.align - 12014678016 - 1 - DEBUG - fMLLR: None
2024-02-24 16:59:55,715 - kalpy.align - 12014678016 - 1 - DEBUG - Aligning with /Users//Documents/MFA/mfa_1/alignment/final.alimdl
2024-02-24 16:59:56,076 - kalpy.align - 12014678016 - 1 - INFO - Overall log-likelihood per frame is -39.2577215002157 over 15067 frames.
2024-02-24 16:59:56,076 - kalpy.align - 12014678016 - 1 - INFO - Done 8, errors on 0

2024-02-24 16:59:59,223 - kalpy.align - 12031504384 - 2 - DEBUG - Align options: {'transition_scale': 1.0, 'acoustic_scale': 0.083333, 'self_loop_scale': 0.1, 'beam': 10, 'retry_beam': 40, 'boost_silence': 1.0}
2024-02-24 16:59:59,612 - kalpy.align - 12031504384 - 2 - DEBUG - Aligning for dictionary mandarin_mfa (1)
2024-02-24 16:59:59,612 - kalpy.align - 12031504384 - 2 - DEBUG - Aligning with model: /Users//Documents/MFA/mfa_1/alignment/final.mdl
2024-02-24 16:59:59,612 - kalpy.align - 12031504384 - 2 - DEBUG - Training graph archive: /Users//Documents/MFA/mfa_1/alignment/fsts.1.2.ark
2024-02-24 16:59:59,613 - kalpy.align - 12031504384 - 2 - DEBUG - Feature Archive information:
2024-02-24 16:59:59,613 - kalpy.align - 12031504384 - 2 - DEBUG - CMVN: None
2024-02-24 16:59:59,613 - kalpy.align - 12031504384 - 2 - DEBUG - Deltas: False
2024-02-24 16:59:59,614 - kalpy.align - 12031504384 - 2 - DEBUG - Splices: True
2024-02-24 16:59:59,614 - kalpy.align - 12031504384 - 2 - DEBUG - LDA: /Users//Documents/MFA/mfa_1/alignment/lda.mat
2024-02-24 16:59:59,614 - kalpy.align - 12031504384 - 2 - DEBUG - fMLLR: scp,s,cs:/Users/bcd/Documents/MFA/mfa_1/mfa_1/split2/trans.1.2.scp
2024-02-24 16:59:59,614 - kalpy.align - 12031504384 - 2 - DEBUG - Aligning with /Users//Documents/MFA/mfa_1/alignment/final.mdl
2024-02-24 16:59:59,674 - kalpy.align - 12031504384 - 2 - INFO - Overall log-likelihood per frame is -37.196578454495004 over 1802 frames.
2024-02-24 16:59:59,674 - kalpy.align - 12031504384 - 2 - INFO - Done 1, errors on 0

2024-02-24 16:59:55,283 - kalpy.align - 12031504384 - 2 - DEBUG - Align options: {'transition_scale': 1.0, 'acoustic_scale': 0.083333, 'self_loop_scale': 0.1, 'beam': 10, 'retry_beam': 40, 'boost_silence': 1.0}
2024-02-24 16:59:55,711 - kalpy.align - 12031504384 - 2 - DEBUG - Aligning for dictionary mandarin_mfa (1)
2024-02-24 16:59:55,711 - kalpy.align - 12031504384 - 2 - DEBUG - Aligning with model: /Users//Documents/MFA/mfa_1/alignment/final.alimdl
2024-02-24 16:59:55,711 - kalpy.align - 12031504384 - 2 - DEBUG - Training graph archive: /Users//Documents/MFA/mfa_1/alignment/fsts.1.2.ark
2024-02-24 16:59:55,712 - kalpy.align - 12031504384 - 2 - DEBUG - Feature Archive information:
2024-02-24 16:59:55,712 - kalpy.align - 12031504384 - 2 - DEBUG - CMVN: None
2024-02-24 16:59:55,712 - kalpy.align - 12031504384 - 2 - DEBUG - Deltas: False
2024-02-24 16:59:55,712 - kalpy.align - 12031504384 - 2 - DEBUG - Splices: True
2024-02-24 16:59:55,712 - kalpy.align - 12031504384 - 2 - DEBUG - LDA: /Users//Documents/MFA/mfa_1/alignment/lda.mat
2024-02-24 16:59:55,712 - kalpy.align - 12031504384 - 2 - DEBUG - fMLLR: None
2024-02-24 16:59:55,715 - kalpy.align - 12031504384 - 2 - DEBUG - Aligning with /Users//Documents/MFA/mfa_1/alignment/final.alimdl
2024-02-24 16:59:55,776 - kalpy.align - 12031504384 - 2 - INFO - Overall log-likelihood per frame is -39.015403891509436 over 1802 frames.
2024-02-24 16:59:55,776 - kalpy.align - 12031504384 - 2 - INFO - Done 1, errors on 0

2024-02-24 16:59:57,124 - kalpy.fmllr - 12031504384 - 2 - DEBUG - Computing transforms from scratch
2024-02-24 16:59:57,125 - kalpy.fmllr - 12031504384 - 2 - DEBUG - Feature Archive information:
2024-02-24 16:59:57,125 - kalpy.fmllr - 12031504384 - 2 - DEBUG - CMVN: None
2024-02-24 16:59:57,125 - kalpy.fmllr - 12031504384 - 2 - DEBUG - Deltas: False
2024-02-24 16:59:57,125 - kalpy.fmllr - 12031504384 - 2 - DEBUG - Splices: True
2024-02-24 16:59:57,125 - kalpy.fmllr - 12031504384 - 2 - DEBUG - LDA: /Users//Documents/MFA/mfa_1/alignment/lda.mat
2024-02-24 16:59:57,125 - kalpy.fmllr - 12031504384 - 2 - DEBUG - fMLLR: None
2024-02-24 16:59:57,125 - kalpy.fmllr - 12031504384 - 2 - DEBUG - Model information:
2024-02-24 16:59:57,125 - kalpy.fmllr - 12031504384 - 2 - DEBUG - Align model path: /Users//Documents/MFA/mfa_1/alignment/final.alimdl
2024-02-24 16:59:57,125 - kalpy.fmllr - 12031504384 - 2 - DEBUG - Model path: /Users//Documents/MFA/mfa_1/alignment/final.mdl
2024-02-24 16:59:57,920 - kalpy.fmllr - 12031504384 - 2 - DEBUG - Alignment path: /Users//Documents/MFA/mfa_1/alignment/ali.1.2.ark
2024-02-24 16:59:57,924 - kalpy.fmllr - 12031504384 - 2 - INFO - Processing speaker 2...
2024-02-24 16:59:58,017 - kalpy.fmllr - 12031504384 - 2 - DEBUG - For speaker 2, auxf-impr from fMLLR is 13.97307353117028, over 784.0 frames.
2024-02-24 16:59:58,018 - kalpy.fmllr - 12031504384 - 2 - INFO - Done 1 utterances.
2024-02-24 16:59:58,018 - kalpy.fmllr - 12031504384 - 2 - INFO - Skipped 0 utterances.
2024-02-24 16:59:58,018 - kalpy.fmllr - 12031504384 - 2 - INFO - Overall fMLLR auxf impr per frame is 13.97307353117028 over 784.0 frames.

2024-02-24 17:00:01,074 - kalpy.align - 12031504384 - 2 - DEBUG - Processed 3
2024-02-24 17:00:01,075 - kalpy.align - 12031504384 - 2 - DEBUG - Finished ali second pass
2024-02-24 17:00:01,079 - kalpy.align - 12031504384 - 2 - DEBUG - Finished ali first pass
2024-02-24 17:00:01,080 - kalpy.align - 12031504384 - 2 - DEBUG - Finished extraction

2024-02-24 16:59:57,124 - kalpy.fmllr - 12014678016 - 1 - DEBUG - Computing transforms from scratch
2024-02-24 16:59:57,125 - kalpy.fmllr - 12014678016 - 1 - DEBUG - Feature Archive information:
2024-02-24 16:59:57,125 - kalpy.fmllr - 12014678016 - 1 - DEBUG - CMVN: None
2024-02-24 16:59:57,125 - kalpy.fmllr - 12014678016 - 1 - DEBUG - Deltas: False
2024-02-24 16:59:57,125 - kalpy.fmllr - 12014678016 - 1 - DEBUG - Splices: True
2024-02-24 16:59:57,125 - kalpy.fmllr - 12014678016 - 1 - DEBUG - LDA: /Users//Documents/MFA/mfa_1/alignment/lda.mat
2024-02-24 16:59:57,125 - kalpy.fmllr - 12014678016 - 1 - DEBUG - fMLLR: None
2024-02-24 16:59:57,125 - kalpy.fmllr - 12014678016 - 1 - DEBUG - Model information:
2024-02-24 16:59:57,125 - kalpy.fmllr - 12014678016 - 1 - DEBUG - Align model path: /Users//Documents/MFA/mfa_1/alignment/final.alimdl
2024-02-24 16:59:57,125 - kalpy.fmllr - 12014678016 - 1 - DEBUG - Model path: /Users//Documents/MFA/mfa_1/alignment/final.mdl
2024-02-24 16:59:57,920 - kalpy.fmllr - 12014678016 - 1 - DEBUG - Alignment path: /Users//Documents/MFA/mfa_1/alignment/ali.1.1.ark
2024-02-24 16:59:57,924 - kalpy.fmllr - 12014678016 - 1 - INFO - Processing speaker 1...
2024-02-24 16:59:58,193 - kalpy.fmllr - 12014678016 - 1 - DEBUG - For speaker 1, auxf-impr from fMLLR is 11.319879726637765, over 6228.0 frames.
2024-02-24 16:59:58,193 - kalpy.fmllr - 12014678016 - 1 - INFO - Done 8 utterances.
2024-02-24 16:59:58,193 - kalpy.fmllr - 12014678016 - 1 - INFO - Skipped 0 utterances.
2024-02-24 16:59:58,193 - kalpy.fmllr - 12014678016 - 1 - INFO - Overall fMLLR auxf impr per frame is 11.319879726637765 over 6228.0 frames.

2024-02-24 17:00:01,075 - kalpy.align - 12014678016 - 1 - DEBUG - Processed 1
2024-02-24 17:00:01,079 - kalpy.align - 12014678016 - 1 - DEBUG - Processed 2
2024-02-24 17:00:01,081 - kalpy.align - 12014678016 - 1 - DEBUG - Processed 4
2024-02-24 17:00:01,084 - kalpy.align - 12014678016 - 1 - DEBUG - Processed 5
2024-02-24 17:00:01,086 - kalpy.align - 12014678016 - 1 - DEBUG - Processed 6
2024-02-24 17:00:01,088 - kalpy.align - 12014678016 - 1 - DEBUG - Processed 7
2024-02-24 17:00:01,090 - kalpy.align - 12014678016 - 1 - DEBUG - Processed 8
2024-02-24 17:00:01,092 - kalpy.align - 12014678016 - 1 - DEBUG - Processed 9
2024-02-24 17:00:01,092 - kalpy.align - 12014678016 - 1 - DEBUG - Finished ali second pass
2024-02-24 17:00:01,093 - kalpy.align - 12014678016 - 1 - DEBUG - Finished ali first pass
2024-02-24 17:00:01,093 - kalpy.align - 12014678016 - 1 - DEBUG - Finished extraction

2024-02-24 16:59:52,094 - kalpy.graphs - 11223986176 - 1 - DEBUG - Tree path: /Users//Documents/MFA/mfa_1/alignment/tree
2024-02-24 16:59:52,094 - kalpy.graphs - 11223986176 - 1 - DEBUG - Model path: /Users//Documents/MFA/mfa_1/alignment/final.alimdl
2024-02-24 16:59:53,566 - kalpy.graphs - 11223986176 - 1 - DEBUG - Set up took 1.4670000076293945 seconds
2024-02-24 16:59:53,566 - kalpy.graphs - 11223986176 - 1 - INFO - Compiling graphs for mandarin_mfa
2024-02-24 16:59:53,805 - kalpy.graphs - 11223986176 - 1 - INFO - Done 8 utterances, errors on 0.
2024-02-24 16:59:53,805 - kalpy.graphs - 11223986176 - 1 - DEBUG - Total compilation time: 1.7064459323883057 seconds

2024-02-24 16:59:52,094 - kalpy.graphs - 11516538880 - 2 - DEBUG - Tree path: /Users//Documents/MFA/mfa_1/alignment/tree
2024-02-24 16:59:52,094 - kalpy.graphs - 11516538880 - 2 - DEBUG - Model path: /Users//Documents/MFA/mfa_1/alignment/final.alimdl
2024-02-24 16:59:53,848 - kalpy.graphs - 11516538880 - 2 - DEBUG - Set up took 0.9440591335296631 seconds
2024-02-24 16:59:53,849 - kalpy.graphs - 11516538880 - 2 - INFO - Compiling graphs for mandarin_mfa
2024-02-24 16:59:54,127 - kalpy.graphs - 11516538880 - 2 - INFO - Done 1 utterances, errors on 0.
2024-02-24 16:59:54,127 - kalpy.graphs - 11516538880 - 2 - DEBUG - Total compilation time: 1.222973108291626 seconds
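
Logs like these are easier to scan if the "Done N, errors on M" summary lines are pulled out. The sketch below is one way to do that, assuming the line layout seen in the logs above; treating the number after the long id as a job index is a guess, and summarize_kalpy_log is a hypothetical helper, not part of MFA or kalpy.

```python
import re

# Matches summary lines such as
#   ... - kalpy.align - 12014678016 - 1 - INFO - Done 8, errors on 0
#   ... - kalpy.graphs - 11516538880 - 2 - INFO - Done 1 utterances, errors on 0.
_SUMMARY = re.compile(
    r"- (kalpy\.\w+) - \d+ - (\d+) - INFO - "
    r"Done (\d+)(?: utterances)?, errors on (\d+)"
)


def summarize_kalpy_log(log_text: str) -> list[tuple[str, int, int, int]]:
    """Return (logger, job, done, errors) for every summary line found."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in _SUMMARY.finditer(log_text)
    ]


SAMPLE_LOG = (
    "2024-02-24 16:59:59,948 - kalpy.align - 12014678016 - 1 - INFO - Done 8, errors on 0\n"
    "2024-02-24 16:59:54,127 - kalpy.graphs - 11516538880 - 2 - INFO - Done 1 utterances, errors on 0.\n"
)

if __name__ == "__main__":
    for logger, job, done, errors in summarize_kalpy_log(SAMPLE_LOG):
        print(f"{logger} job {job}: {done} done, {errors} errors")
```

Every summary line in the logs posted here reports zero errors, so these particular steps do not appear to be where the run failed.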

nlp12 commented 4 months ago

Please see above, thanks!

nlp12 commented 4 months ago

So I upgraded my conda version, updated MFA, reinstalled PyTorch and SpeechBrain, inspected the acoustic model and dictionary, and then ran validate, but got a 'fatal error' (it did complete training and start generating alignments before it crashed). Then I tried what you suggested, mfa validate my_corpus mandarin_mfa --acoustic_model_path mandarin_mfa --clean, but got an 'error in the run' message. Please advise, thanks!

nlp12 commented 4 months ago

In need of assistance, please help!

nlp12 commented 4 months ago

Here is the Conda List you asked for before:

# Name Version Build Channel
aom 3.8.1 h078ce10_0 conda-forge
archspec 0.2.2 pyhd8ed1ab_0 conda-forge
atk-1.0 2.38.0 hcb7b3dd_1 conda-forge
audioread 3.0.1 py311h267d04e_1 conda-forge
baumwelch 0.3.8 h2ffa867_0 conda-forge
biopython 1.79 py311he2be06e_3 conda-forge
boltons 23.1.1 pyhd8ed1ab_0 conda-forge
brotli 1.1.0 hb547adb_1 conda-forge
brotli-bin 1.1.0 hb547adb_1 conda-forge
brotli-python 1.1.0 py311ha891d26_1 conda-forge
bzip2 1.0.8 h93a5062_5 conda-forge
c-ares 1.27.0 h93a5062_0 conda-forge
ca-certificates 2024.2.2 hf0a4a13_0 conda-forge
cairo 1.18.0 hd1e100b_0 conda-forge
certifi 2024.2.2 pyhd8ed1ab_0 conda-forge
cffi 1.16.0 py311h4a08483_0 conda-forge
charset-normalizer 3.3.2 pyhd8ed1ab_0 conda-forge
click 8.1.7 unix_pyh707e725_0 conda-forge
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
conda 24.1.2 py311h267d04e_0 conda-forge
conda-content-trust 0.2.0 py311hca03da5_0
conda-libmamba-solver 23.12.0 pyhd3eb1b0_1
conda-package-handling 2.2.0 pyh38be061_0 conda-forge
conda-package-streaming 0.9.0 pyhd8ed1ab_0 conda-forge
contourpy 1.2.0 py311hd03642b_0 conda-forge
cryptography 42.0.5 py311h71175c2_0 conda-forge
cycler 0.12.1 pyhd8ed1ab_0 conda-forge
cython 3.0.8 py311h92babd0_0 conda-forge
dataclassy 1.0.1 pyhd8ed1ab_0 conda-forge
dav1d 1.2.1 hb547adb_0 conda-forge
decorator 5.1.1 pyhd8ed1ab_0 conda-forge
distro 1.9.0 pyhd8ed1ab_0 conda-forge
expat 2.5.0 hb7217d7_1 conda-forge
ffmpeg 6.1.1 gpl_h31ea89b_104 conda-forge
fmt 10.2.1 h2ffa867_0 conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 h77eed37_1 conda-forge
fontconfig 2.14.2 h82840c6_0 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
fonttools 4.48.1 py311h05b510d_0 conda-forge
freetype 2.12.1 hadb7bae_2 conda-forge
fribidi 1.0.10 h27ca646_0 conda-forge
gdk-pixbuf 2.42.10 h15fa40c_4 conda-forge
gettext 0.21.1 h0186832_0 conda-forge
giflib 5.2.1 h1a8c8d9_3 conda-forge
gmp 6.3.0 h965bd2d_0 conda-forge
gnutls 3.7.9 hd26332c_0 conda-forge
graphite2 1.3.13 h9f76cd9_1001 conda-forge
graphviz 9.0.0 h3face73_1 conda-forge
greenlet 3.0.3 py311h92babd0_0 conda-forge
gtk2 2.24.33 h7895bb2_3 conda-forge
gts 0.7.6 he42f4ea_4 conda-forge
harfbuzz 8.3.0 h8f0ba13_0 conda-forge
hdbscan 0.8.33 py311h9ea6feb_4 conda-forge
icu 73.2 hc8870d7_0 conda-forge
idna 3.6 pyhd8ed1ab_0 conda-forge
joblib 1.3.2 pyhd8ed1ab_0 conda-forge
jsonpatch 1.33 pyhd8ed1ab_0 conda-forge
jsonpointer 2.4 py311h267d04e_3 conda-forge
kaldi 5.5.1112 cpu_h012d200_0 conda-forge
kalpy 0.6.2 py311h92babd0_0 conda-forge
kiwisolver 1.4.5 py311he4fd1f5_1 conda-forge
kneed 0.8.5 pyhd8ed1ab_0 conda-forge
krb5 1.21.2 h92f50d5_0 conda-forge
lame 3.100 h1a8c8d9_1003 conda-forge
lazy_loader 0.3 pyhd8ed1ab_0 conda-forge
lcms2 2.16 ha0e7c42_0 conda-forge
lerc 4.0.0 h9a09cb3_0 conda-forge
libabseil 20240116.1 cxx17_hebf3989_1 conda-forge
libarchive 3.7.2 hcacb583_1 conda-forge
libass 0.17.1 hf7da4fe_1 conda-forge
libblas 3.9.0 21_osxarm64_openblas conda-forge
libbrotlicommon 1.1.0 hb547adb_1 conda-forge
libbrotlidec 1.1.0 hb547adb_1 conda-forge
libbrotlienc 1.1.0 hb547adb_1 conda-forge
libcblas 3.9.0 21_osxarm64_openblas conda-forge
libcurl 8.5.0 h2d989ff_0 conda-forge
libcxx 16.0.6 h4653b0c_0 conda-forge
libdeflate 1.19 hb547adb_0 conda-forge
libedit 3.1.20191231 hc8eb9b7_2 conda-forge
libev 4.33 h93a5062_2 conda-forge
libexpat 2.5.0 hb7217d7_1 conda-forge
libffi 3.4.2 h3422bc3_5 conda-forge
libflac 1.4.3 hb765f3a_0 conda-forge
libgd 2.3.3 hfdf3952_9 conda-forge
libgfortran 5.0.0 13_2_0_hd922786_3 conda-forge
libgfortran5 13.2.0 hf226fd6_3 conda-forge
libglib 2.78.4 h1635a5e_0 conda-forge
libhwloc 2.9.3 default_h4394839_1009 conda-forge
libiconv 1.17 h0d3ecfb_2 conda-forge
libidn2 2.3.7 h93a5062_0 conda-forge
libjpeg-turbo 3.0.0 hb547adb_1 conda-forge
liblapack 3.9.0 21_osxarm64_openblas conda-forge
liblapacke 3.9.0 21_osxarm64_openblas conda-forge
libllvm14 14.0.6 hd1a9a77_4 conda-forge
libmamba 1.5.6 h90c426b_0 conda-forge
libmambapy 1.5.6 py311h26e1311_0 conda-forge
libnghttp2 1.58.0 ha4dd798_1 conda-forge
libogg 1.3.4 h27ca646_1 conda-forge
libopenblas 0.3.26 openmp_h6c19121_0 conda-forge
libopenvino 2023.3.0 he6dadac_2 conda-forge
libopenvino-arm-cpu-plugin 2023.3.0 he6dadac_2 conda-forge
libopenvino-auto-batch-plugin 2023.3.0 hc9f00d9_2 conda-forge
libopenvino-auto-plugin 2023.3.0 hc9f00d9_2 conda-forge
libopenvino-hetero-plugin 2023.3.0 hf483cef_2 conda-forge
libopenvino-ir-frontend 2023.3.0 hf483cef_2 conda-forge
libopenvino-onnx-frontend 2023.3.0 h9363200_2 conda-forge
libopenvino-paddle-frontend 2023.3.0 h9363200_2 conda-forge
libopenvino-pytorch-frontend 2023.3.0 hebf3989_2 conda-forge
libopenvino-tensorflow-frontend 2023.3.0 h64b43cf_2 conda-forge
libopenvino-tensorflow-lite-frontend 2023.3.0 hebf3989_2 conda-forge
libopus 1.3.1 h27ca646_1 conda-forge
libpng 1.6.43 h091b4b1_0 conda-forge
libpq 16.2 h0f8b458_0 conda-forge
libprotobuf 4.25.2 hbfab5d5_1 conda-forge
librosa 0.10.1 pyhd8ed1ab_0 conda-forge
librsvg 2.56.3 h55a2576_1 conda-forge
libsndfile 1.2.2 h9739721_1 conda-forge
libsolv 0.7.28 h1059232_0 conda-forge
libsqlite 3.45.1 h091b4b1_0 conda-forge
libssh2 1.11.0 h7a5bd25_0 conda-forge
libtasn1 4.19.0 h1a8c8d9_0 conda-forge
libtiff 4.6.0 ha8a6c65_2 conda-forge
libunistring 0.9.10 h3422bc3_0 conda-forge
libvorbis 1.3.7 h9f76cd9_0 conda-forge
libvpx 1.13.1 hb765f3a_0 conda-forge
libwebp 1.3.2 hf30222e_1 conda-forge
libwebp-base 1.3.2 hb547adb_0 conda-forge
libxcb 1.15 hf346824_0 conda-forge
libxml2 2.12.5 h0d0cfa8_0 conda-forge
libzlib 1.2.13 h53f4e23_5 conda-forge
llvm-openmp 17.0.6 hcd81f8e_0 conda-forge
llvmlite 0.42.0 py311hf5d242d_1 conda-forge
lz4-c 1.9.4 hb7217d7_0 conda-forge
lzo 2.10 h642e427_1000 conda-forge
mad 0.15.1b hbdafb3b_1 conda-forge
markdown-it-py 3.0.0 pyhd8ed1ab_0 conda-forge
matplotlib-base 3.8.3 py311hb58f1d1_0 conda-forge
mdurl 0.1.2 pyhd8ed1ab_0 conda-forge
menuinst 2.0.1 py311hca03da5_1
montreal-forced-aligner 3.0.0rc1 pyhd8ed1ab_0 conda-forge
mpg123 1.32.4 hebf3989_0 conda-forge
msgpack-python 1.0.7 py311hd03642b_0 conda-forge
munkres 1.1.4 pyh9f0ad1d_0 conda-forge
ncurses 6.4 h463b476_2 conda-forge
nettle 3.9.1 h40ed0f5_0 conda-forge
ngram 1.3.15 h2ffa867_1 conda-forge
numba 0.59.0 py311h00351ea_1 conda-forge
numpy 1.26.4 py311h7125741_0 conda-forge
openfst 1.8.3 h2ffa867_1 conda-forge
openh264 2.4.1 hebf3989_0 conda-forge
openjpeg 2.5.0 h4c1507b_3 conda-forge
openssl 3.2.1 h0d3ecfb_0 conda-forge
p11-kit 0.24.1 h29577a5_0 conda-forge
packaging 23.2 pyhd8ed1ab_0 conda-forge
pango 1.50.14 hcf40dda_2 conda-forge
pcre2 10.42 h26f9a81_0 conda-forge
pgvector 0.6.0 h88c5ba3_0 conda-forge
pgvector-python 0.2.5 pyhe093146_0 conda-forge
pillow 10.2.0 py311hb9c5795_0 conda-forge
pip 23.3.1 py311hca03da5_0
pixman 0.43.2 hebf3989_0 conda-forge
platformdirs 4.2.0 pyhd8ed1ab_0 conda-forge
pluggy 1.4.0 pyhd8ed1ab_0 conda-forge
pooch 1.8.1 pyhd8ed1ab_0 conda-forge
postgresql 16.2 h1d0603d_0 conda-forge
praatio 6.0.0 pyhd8ed1ab_0 conda-forge
psycopg2 2.9.9 py311h589e011_0 conda-forge
pthread-stubs 0.4 h27ca646_1001 conda-forge
pugixml 1.14 h13dd4ca_0 conda-forge
pybind11-abi 4 hd8ed1ab_3 conda-forge
pycosat 0.6.6 py311heffc1b2_0 conda-forge
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pygments 2.17.2 pyhd8ed1ab_0 conda-forge
pynini 2.1.6 py311hcc98501_0 conda-forge
pyparsing 3.1.1 pyhd8ed1ab_0 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
pysoundfile 0.12.1 pyhd8ed1ab_0 conda-forge
python 3.11.8 hdf0ec26_0_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python.app 3 py311h80987f9_0
python_abi 3.11 4_cp311 conda-forge
pyyaml 6.0.1 py311heffc1b2_1 conda-forge
readline 8.2 h92ec313_1 conda-forge
reproc 14.2.4.post0 h93a5062_1 conda-forge
reproc-cpp 14.2.4.post0 h965bd2d_1 conda-forge
requests 2.31.0 pyhd8ed1ab_0 conda-forge
rich 13.7.0 pyhd8ed1ab_0 conda-forge
rich-click 1.7.3 pyhd8ed1ab_0 conda-forge
ruamel.yaml 0.18.6 py311h05b510d_0 conda-forge
ruamel.yaml.clib 0.2.8 py311h05b510d_0 conda-forge
scikit-learn 1.2.2 py311hf0b18b8_2 conda-forge
scipy 1.12.0 py311h4f9446f_2 conda-forge
setuptools 68.2.2 py311hca03da5_0
six 1.16.0 pyh6c4a22f_0 conda-forge
snappy 1.1.10 h17c5cce_0 conda-forge
sox 14.4.2 h2353817_1018 conda-forge
soxr 0.1.3 h5008568_3 conda-forge
soxr-python 0.3.7 py311h9ea6feb_0 conda-forge
sqlalchemy 2.0.27 py311h05b510d_0 conda-forge
sqlite 3.45.1 hf2abe2d_0 conda-forge
svt-av1 1.8.0 h463b476_0 conda-forge
tbb 2021.11.0 h2ffa867_1 conda-forge
threadpoolctl 3.3.0 pyhc1e730c_0 conda-forge
tk 8.6.13 h5083fa2_1 conda-forge
tqdm 4.66.2 pyhd8ed1ab_0 conda-forge
truststore 0.8.0 pyhd8ed1ab_0 conda-forge
typing-extensions 4.9.0 hd8ed1ab_0 conda-forge
typing_extensions 4.9.0 pyha770c72_0 conda-forge
tzcode 2024a h93a5062_0 conda-forge
tzdata 2024a h0c530f3_0 conda-forge
urllib3 2.2.1 pyhd8ed1ab_0 conda-forge
wheel 0.41.2 py311hca03da5_0
x264 1!164.3095 h57fd34a_2 conda-forge
x265 3.5 hbc6ce65_3 conda-forge
xorg-libxau 1.0.11 hb547adb_0 conda-forge
xorg-libxdmcp 1.1.3 h27ca646_0 conda-forge
xz 5.2.6 h57fd34a_0 conda-forge
yaml 0.2.5 h3422bc3_2 conda-forge
yaml-cpp 0.8.0 h13dd4ca_0 conda-forge
zlib 1.2.13 h53f4e23_5 conda-forge
zstandard 0.22.0 py311h67b91a1_0 conda-forge
zstd 1.5.5 h4f39d0f_0 conda-forge
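
A listing this long is hard to read by eye when only a few packages matter. Below is a minimal sketch for pulling out the relevant versions, assuming conda list output saved as text with one package per line (name, version, build, optional channel). parse_conda_list is a hypothetical helper, and the sample rows are copied from the listing above.

```python
def parse_conda_list(text: str) -> dict[str, tuple[str, str, str]]:
    """Map package name -> (version, build, channel) from `conda list` text."""
    packages = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comment/header lines, and anything malformed.
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        name, version, build = fields[:3]
        channel = fields[3] if len(fields) > 3 else ""  # channel may be absent
        packages[name] = (version, build, channel)
    return packages


SAMPLE = """\
# Name Version Build Channel
kalpy 0.6.2 py311h92babd0_0 conda-forge
montreal-forced-aligner 3.0.0rc1 pyhd8ed1ab_0 conda-forge
pip 23.3.1 py311hca03da5_0
"""

if __name__ == "__main__":
    packages = parse_conda_list(SAMPLE)
    for name in ("montreal-forced-aligner", "kalpy"):
        print(name, packages[name][0])
```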

mmcauliffe commented 4 months ago

Can you update mfa via conda update montreal-forced-aligner (3.0.0rc2 fixed a TextGrid export bug) and then run mfa align my_corpus mandarin_mfa mandarin_mfa output --clean and see what gets exported in output?

You can also redownload the mandarin models since I've retrained them via:

  1. mfa model download acoustic mandarin_mfa --ignore_cache
  2. mfa model download dictionary mandarin_china_mfa --ignore_cache (or mandarin_taiwan_mfa for traditional characters instead of simplified)
  3. mfa align my_corpus mandarin_china_mfa mandarin_mfa output --clean

I'm not seeing anything in the validation run that would cause errors with alignment, but the validate is only useful if you're having errors in running mfa align.
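
For anyone who prefers to script the three steps above, here is a minimal sketch that assembles them and, optionally, runs them in order. The mfa commands are the ones listed above; build_commands and run_all are hypothetical helper names, and the paths are placeholders.

```python
import shlex
import subprocess


def build_commands(corpus_dir: str, output_dir: str,
                   dictionary: str = "mandarin_china_mfa",
                   acoustic: str = "mandarin_mfa") -> list[list[str]]:
    """Return the redownload and align commands as argument lists."""
    return [
        ["mfa", "model", "download", "acoustic", acoustic, "--ignore_cache"],
        ["mfa", "model", "download", "dictionary", dictionary, "--ignore_cache"],
        ["mfa", "align", corpus_dir, dictionary, acoustic, output_dir, "--clean"],
    ]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; run it too unless dry_run is set."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first command that fails
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(build_commands("my_corpus", "output"), dry_run=True)
```

Passing dictionary="mandarin_taiwan_mfa" would select the traditional-character dictionary mentioned in step 2.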

nlp12 commented 4 months ago

I followed your instructions above and everything ran without any errors! However, two progress bars did not reach 100%, and it gave me a warning. Now I need to find where MFA saved the new TextGrids, and I will let you know if anything looks odd there. I'm hoping the warning is just a warning and nothing I need to act on. Thanks! The output was:

Collecting phone and word alignments from alignment lattices...
33% ━━━━━━━━ 3/9 [ 0:00:01 < -:--:-- , ? it/s ]
WARNING Alignment analysis not available without using postgresql
INFO Exporting alignment TextGrids to output...
11% ━━━━━━━━ 1/9 [ 0:00:08 < -:--:-- , ? it/s ]
INFO Finished exporting TextGrids to output!

mmcauliffe commented 4 months ago

The warning shouldn't matter for your purposes, and the progress bars don't always reach 100%, especially when a step completes quickly. However, if the 9 files don't have TextGrids in the output directory after the above command, that would indicate an error. I'm going to close this out since it doesn't sound like there's an issue anymore, but feel free to create another issue if you run into something else.
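
The check described above can be automated. The following is a minimal sketch, assuming each exported TextGrid keeps the base name of its wav file; missing_textgrids is a hypothetical helper, and my_corpus and output are the directory names used in the commands earlier in this thread.

```python
from pathlib import Path


def missing_textgrids(corpus_dir: str, output_dir: str) -> list[str]:
    """Return base names of wav files that have no TextGrid in output_dir."""
    # Search subdirectories too, in case files are grouped by speaker.
    wav_stems = {p.stem for p in Path(corpus_dir).rglob("*.wav")}
    textgrid_stems = {p.stem for p in Path(output_dir).rglob("*.TextGrid")}
    return sorted(wav_stems - textgrid_stems)


if __name__ == "__main__":
    missing = missing_textgrids("my_corpus", "output")
    if missing:
        print("No TextGrid exported for:", ", ".join(missing))
    else:
        print("Every wav file has a TextGrid in the output directory.")
```

An empty result for all 9 files would be consistent with a successful export.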