MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

[BUG] IndexError: list index out of range (Training a new acoustic model) #678

Closed MLo7Ghinsan closed 12 months ago

MLo7Ghinsan commented 12 months ago

Debugging checklist

[x] Have you updated to latest MFA version?
[x] Have you tried rerunning the command with the --clean flag?

command: mfa train --single_speaker --clean [corpus_dir] [custom_dict_path] [model_save_path]

For Reproducing your issue Please fill out the following:

  1. Corpus structure
    • Japanese audio files with corresponding textgrid
    • Originally 1 speaker, but counted as 2 because each TextGrid has both a word-level and a phoneme-level alignment tier
    • Using TextGrid as input
  2. Dictionary
    • Uses custom dictionary
    • PINYIN phoneset in the non-probabilistic dictionary type
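
Since the corpus description says the word-level and phoneme-level tiers were counted as two speakers, it can help to list the tiers in a TextGrid before handing it to MFA. The following is a minimal sketch, assuming the files use Praat's long text format; the function name `summarize_tiers` and the sample TextGrid are hypothetical and not part of MFA.

```python
import re

# Hypothetical sample in Praat's long TextGrid text format (not from the issue).
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1.0
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 1.0
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 1.0
            text = "ka"
    item [2]:
        class = "IntervalTier"
        name = "phones"
        xmin = 0
        xmax = 1.0
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.4
            text = "k"
        intervals [2]:
            xmin = 0.4
            xmax = 1.0
            text = "a"
'''


def summarize_tiers(textgrid_text):
    """Return (tier name, interval count) pairs from a long-format TextGrid."""
    tiers = []
    # Each tier begins with an "item [n]:" header; the bare "item []:" line
    # has no digits and is therefore not matched.
    for chunk in re.split(r"\n\s*item \[\d+\]:", textgrid_text)[1:]:
        name = re.search(r'name = "([^"]*)"', chunk)
        size = re.search(r"intervals: size = (\d+)", chunk)
        if name and size:
            tiers.append((name.group(1), int(size.group(1))))
    return tiers


print(summarize_tiers(SAMPLE))  # [('words', 1), ('phones', 2)]
```

If a file reports more tiers than there are real speakers, that matches the "counted as 2" behaviour described above.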

Log file Log from MFA/{corpus}/{corpus}/split40/log: cmvn.log

Desktop (please complete the following information):

Additional context

The output from executed command:

INFO Setting up corpus information...
INFO Loading corpus from source files...
100% 1,583/100 [ 0:00:14 < 0:00:00 , 830 it/s ]
INFO Found 2 speakers across 1687 files, average number of utterances per speaker: 126525.0
INFO Initializing multiprocessing jobs...
INFO Normalizing text...
100% 251,962/253,050 [ 0:00:24 < 0:00:01 , 27,900 it/s ]
INFO Creating corpus split for feature generation...
100% 254,217/254,737 [ 0:00:24 < 0:00:01 , 25,909 it/s ]
INFO Generating MFCCs...
73% 185,041/253,050 [ 0:01:51 < 0:00:27 , 2,546 it/s ]
INFO Calculating CMVN...
INFO Generating final features...
69% 174,806/253,050 [ 0:00:27 < 0:00:06 , 13,710 it/s ]
WARNING There were 67937 utterances ignored due to an issue in feature generation, see the log file for full details or run `mfa validate` on the corpus.
INFO Creating corpus split with features...
72% 183,224/253,050 [ 0:00:23 < 0:00:04 , 20,183 it/s ]
INFO Filtering utterances for training...
INFO Creating subset directory with 10000 utterances...
0% 0/10,000 [ 0:00:01 < -:--:-- , ? it/s ]
INFO Initializing training for monophone...
ERROR There was an error in the run, please see the log.
Exception ignored in atexit callback: <bound method ExitHooks.history_save_handler of <montreal_forced_aligner.command_line.mfa.ExitHooks object at 0x0000021B5BC0C7D0>>
Traceback (most recent call last):
  File "C:\Users\19315\anaconda3\envs\aligner\Lib\site-packages\montreal_forced_aligner\command_line\mfa.py", line 98, in history_save_handler
    raise self.exception
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\19315\anaconda3\envs\aligner\Scripts\mfa.exe\__main__.py", line 7, in <module>
  File "C:\Users\19315\anaconda3\envs\aligner\Lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\19315\anaconda3\envs\aligner\Lib\site-packages\rich_click\rich_group.py", line 21, in main
    rv = super().main(*args, standalone_mode=False, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\19315\anaconda3\envs\aligner\Lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\Users\19315\anaconda3\envs\aligner\Lib\site-packages\click\core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\19315\anaconda3\envs\aligner\Lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\19315\anaconda3\envs\aligner\Lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\19315\anaconda3\envs\aligner\Lib\site-packages\click\decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\19315\anaconda3\envs\aligner\Lib\site-packages\montreal_forced_aligner\command_line\train_acoustic_model.py", line 111, in train_acoustic_model_cli
    trainer.train()
  File "C:\Users\19315\anaconda3\envs\aligner\Lib\site-packages\montreal_forced_aligner\acoustic_modeling\trainer.py", line 561, in train
    trainer.train()
  File "C:\Users\19315\anaconda3\envs\aligner\Lib\site-packages\montreal_forced_aligner\acoustic_modeling\base.py", line 488, in train
    self.initialize_training()
  File "C:\Users\19315\anaconda3\envs\aligner\Lib\site-packages\montreal_forced_aligner\acoustic_modeling\base.py", line 239, in initialize_training
    self._trainer_initialization()
  File "C:\Users\19315\anaconda3\envs\aligner\Lib\site-packages\montreal_forced_aligner\acoustic_modeling\monophone.py", line 329, in _trainer_initialization
    feat_dim = self.worker.get_feat_dim()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\19315\anaconda3\envs\aligner\Lib\site-packages\montreal_forced_aligner\corpus\acoustic_corpus.py", line 917, in get_feat_dim
    job = self.jobs[0]
IndexError: list index out of range
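
The figures reported in the log can be cross-checked with a little arithmetic. The sketch below only uses numbers that appear in the output above; the function name `corpus_stats` is hypothetical. Reading the result as "every TextGrid interval became its own utterance" is an assumption, not something the log states.

```python
def corpus_stats(files, speakers, avg_utts_per_speaker, ignored):
    """Derive totals from the numbers MFA printed in the log."""
    total = round(avg_utts_per_speaker * speakers)
    return {
        "total_utterances": total,
        "utterances_per_file": total / files,
        "ignored_fraction": ignored / total,
    }


# Values copied from the log: 2 speakers, 1687 files,
# 126525.0 utterances per speaker, 67937 utterances ignored.
stats = corpus_stats(files=1687, speakers=2,
                     avg_utts_per_speaker=126525.0, ignored=67937)
print(stats["total_utterances"])              # 253050
print(stats["utterances_per_file"])           # 150.0
print(round(stats["ignored_fraction"], 3))    # 0.268
```

The total of 253,050 agrees with the denominators in the progress lines. An average of 150 utterances per file is far more than one transcript per recording, and roughly 27% of those utterances were dropped during feature generation.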

Audio context:
- There are no silence audio files
- Audios are in .wav format
- Audio properties: 16kHz, 16bit
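
The stated audio properties can be verified per file with Python's standard `wave` module. This is a minimal sketch, assuming uncompressed PCM .wav files; the helper names `wav_properties` and `matches_expected` are hypothetical.

```python
import wave


def wav_properties(path):
    """Read basic header properties from a PCM .wav file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "bit_depth": wav.getsampwidth() * 8,  # sample width is in bytes
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / rate,
        }


def matches_expected(path, sample_rate=16000, bit_depth=16):
    """Return True if the file has the sample rate and bit depth listed above."""
    props = wav_properties(path)
    return props["sample_rate"] == sample_rate and props["bit_depth"] == bit_depth
```

Running `matches_expected` over every file in the corpus directory would flag any recording that deviates from 16 kHz, 16-bit.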

Other failed attempts:
- executed command without --single_speaker
- executed command with --no_use_postgres
- executed command with --no_use_mp
- the same issue arises with the `mfa validate` command

Test experiment:
- Used 5% of the files from the corpus and it worked, but it doesn't work on the full corpus
- MFCC generation and the feature steps that follow stopped after ~75% of the files were processed on the full corpus
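
A subset experiment like the 5% run above can be reproduced with a short script. This is a sketch under stated assumptions: the corpus is a flat directory, and each `.wav` has a `.TextGrid` with the same stem. The function name `copy_subset` and the directory layout are hypothetical.

```python
import random
import shutil
from pathlib import Path


def copy_subset(corpus_dir, subset_dir, fraction=0.05, seed=0):
    """Copy a random fraction of wav/TextGrid pairs into subset_dir."""
    corpus_dir, subset_dir = Path(corpus_dir), Path(subset_dir)
    # Keep only recordings that have a TextGrid next to them.
    pairs = [
        (wav_path, wav_path.with_suffix(".TextGrid"))
        for wav_path in sorted(corpus_dir.glob("*.wav"))
        if wav_path.with_suffix(".TextGrid").exists()
    ]
    count = max(1, round(len(pairs) * fraction)) if pairs else 0
    # A fixed seed makes the selection repeatable between runs.
    chosen = random.Random(seed).sample(pairs, count)
    subset_dir.mkdir(parents=True, exist_ok=True)
    for wav_path, grid_path in chosen:
        shutil.copy2(wav_path, subset_dir / wav_path.name)
        shutil.copy2(grid_path, subset_dir / grid_path.name)
    return [wav_path.name for wav_path, _ in chosen]
```

Growing `fraction` step by step is one way to narrow down which files first trigger the failure.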

Could it be because of the corpus I'm using? It seems to refuse to generate features after a while. Here's an example of a TextGrid file.
The TextGrid files used were all converted from HTK label (.lab) files.
![Screenshot 2023-08-19 205249](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/assets/98134320/79e6769b-c7f4-4691-b680-4666d127504b)
(the invisible phonemes are there, Praat just likes to hide them)

MLo7Ghinsan commented 12 months ago

Turns out I was mistaken about how my input training files should be set up, whoops.