MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

[BUG] Exported TextGrid files less than wav files #721

Open hirodeng opened 1 year ago

hirodeng commented 1 year ago

Debugging checklist

[Yes] Have you updated to latest MFA version?
[Yes] Have you tried rerunning the command with the --clean flag?

Describe the issue

Steps:

docker pull mmcauliffe/montreal-forced-aligner:v2.2.17
docker run -it -v /mydata:/data mmcauliffe/montreal-forced-aligner:v2.2.17
# try to train and align an Arabic corpus
mfa train --clean -j 50 --single_speaker /data/mgb2/segments/train_mer20/ /data/dict/arabic_mfa.dict /data/model/arabic_accoustic_model.zip --output_directory /data/mgb2/aligned/train_mer20/

Results: There was a Permission denied error when exporting:

INFO     Exporting sat_4_ali TextGrids to /data/mgb2/aligned/train_mer20...             
 ERROR    There was an error in the run, please see the log.  

PermissionError: [Errno 13] Permission denied: '/data/mgb2/aligned/train_mer20'

So I granted permission on the /data/ directory and reran with:

mfa train --no_clean -j 50 --single_speaker /data/mgb2/segments/train_mer20/ /data/dict/arabic_mfa.dict /data/model/arabic_accoustic_model.zip --output_directory /data/mgb2/aligned/train_mer20/

INFO     Setting up corpus information...                                               
 INFO     Found 1 speaker across 352416 files, average number of utterances per speaker: 
          352416.0                                                                       
 INFO     Jobs already initialized.                                                      
 INFO     Text already normalized.                                                       
 INFO     Creating corpus split with features...                                         
  96% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━ 336,602/352,416  [ 0:00:10 < 0:00:01 , 51,389 it/s ]
 INFO     Features already generated.                                                    
 INFO     Filtering utterances for training...                                           
 INFO     Pronunciation probability estimation already done, loading saved               
          probabilities...                                                               
 INFO     Pronunciation probability estimation already done, loading saved               
          probabilities...                                                               
 INFO     Accumulating transition stats...                                               
  76% ━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━ 268,022/352,416  [ 0:00:04 < 0:00:01 , 296,181 it/s ]
 INFO     Finished accumulating transition stats!                                        
 INFO     Beginning phone LM training...                                                 
 INFO     Collecting training data...                                                    
  78% ━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━ 275,887/352,416  [ 0:00:09 < 0:00:03 , 35,194 it/s ]
 INFO     Training model...                                                              
 INFO     Completed training in 48.3777596950531 seconds!                                
 INFO     Saved model to /data/model/arabic_accoustic_model.zip                          
 WARNING  Alignment analysis not available without using postgresql                      
 INFO     Exporting sat_4_ali TextGrids to /data/mgb2/aligned/train_mer20...             
  99% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 348,647/352,416  [ 0:01:00 < 0:00:01 , 11,376 it/s ]
 INFO     Finished exporting TextGrids to /data/mgb2/aligned/train_mer20!                
 INFO     Done! Everything took 128.847 seconds                 

The run appears to succeed. However, when I counted the exported TextGrid files, there were only 279222, while the corpus contains 352416 wav files.

Could you tell me why there are missing textgrids? Thank you!
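One way to narrow this down is to list exactly which wav files have no exported TextGrid. The following is a minimal sketch, not part of MFA: the function name `find_missing_textgrids` is hypothetical, and it assumes the output directory mirrors the corpus layout, with each TextGrid sharing its wav file's relative path and base name.

```python
from pathlib import Path


def find_missing_textgrids(corpus_dir, output_dir):
    """Return sorted relative stems of .wav files that have no matching .TextGrid.

    Assumption: the exported TextGrids keep the same relative path and base
    name as the input wav files.
    """
    corpus_dir, output_dir = Path(corpus_dir), Path(output_dir)
    # Relative path without extension for every wav file in the corpus.
    wav_stems = {
        p.relative_to(corpus_dir).with_suffix("").as_posix()
        for p in corpus_dir.rglob("*")
        if p.suffix.lower() == ".wav"
    }
    # Same key for every exported TextGrid.
    textgrid_stems = {
        p.relative_to(output_dir).with_suffix("").as_posix()
        for p in output_dir.rglob("*")
        if p.suffix.lower() == ".textgrid"
    }
    return sorted(wav_stems - textgrid_stems)


if __name__ == "__main__":
    import sys

    # Usage: python find_missing.py CORPUS_DIR OUTPUT_DIR
    missing = find_missing_textgrids(sys.argv[1], sys.argv[2])
    print(f"{len(missing)} wav files have no TextGrid")
    for stem in missing:
        print(stem)
```

With the paths from this report, the two arguments would be /data/mgb2/segments/train_mer20/ and /data/mgb2/aligned/train_mer20/. The resulting list shows whether the missing files share something in common, such as length or transcript content.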

For Reproducing your issue

Please fill out the following:

  1. Corpus structure
    • What language is the corpus in? Arabic
    • How many files/speakers? 1 speaker, 352416 wav files
    • Are you using lab files or TextGrid files for input? lab
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? yes, the arabic mfa dict
    • If it's a custom dictionary, what is the phoneset?
  3. Acoustic model
    • If you're using an acoustic model, is it one downloaded through MFA? If so, which one?
    • If it's a model you've trained, what data was it trained on?

Log file

Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA).

2023-11-27 07:18:20,714 - mfa - DEBUG - Skipping pronunciation_probabilities_2 alignments
2023-11-27 07:18:21,702 - mfa - DEBUG - Skipping sat_4 alignments
2023-11-27 07:18:21,702 - mfa - INFO - Accumulating transition stats...
2023-11-27 07:18:34,979 - mfa - DEBUG - Accumulating transition stats took 13.275 seconds
2023-11-27 07:18:34,979 - mfa - INFO - Finished accumulating transition stats!
2023-11-27 07:18:34,989 - mfa - INFO - Beginning phone LM training...
2023-11-27 07:18:34,989 - mfa - INFO - Collecting training data...
2023-11-27 07:18:45,278 - mfa - INFO - Training model...
2023-11-27 07:18:46,904 - mfa - INFO - Completed training in 49.68739151954651 seconds!
2023-11-27 07:18:52,293 - mfa - INFO - Saved model to /data/model/arabic_accoustic_model.zip
2023-11-27 07:18:52,299 - mfa - DEBUG - Skipping sat_4 alignments
2023-11-27 07:18:52,300 - mfa - WARNING - Alignment analysis not available without using postgresql
2023-11-27 07:18:52,302 - mfa - INFO - Exporting sat_4_ali TextGrids to /data/mgb2/aligned/train_mer20...
2023-11-27 07:18:52,303 - mfa - ERROR - There was an error in the run, please see the log.
2023-11-27 07:19:57,022 - mfa - DEBUG - Beginning run for train_mer20
2023-11-27 07:19:57,022 - mfa - DEBUG - Using "global" profile
2023-11-27 07:19:57,022 - mfa - DEBUG - Using multiprocessing with 50
2023-11-27 07:19:57,022 - mfa - DEBUG - Set up logger for MFA version: 2.2.18.dev0+gf8d678f.d20230820
2023-11-27 07:19:57,056 - mfa - DEBUG - Using UNKNOWN
2023-11-27 07:19:57,245 - mfa - DEBUG - Loaded dictionary in 0.189 seconds
2023-11-27 07:19:57,248 - mfa - INFO - Setting up corpus information...
2023-11-27 07:19:57,252 - mfa - DEBUG - Successfully loaded from temporary files
2023-11-27 07:19:57,269 - mfa - INFO - Found 1 speaker across 352416 files, average number of utterances per speaker: 352416.0
2023-11-27 07:19:57,270 - mfa - DEBUG - Loaded corpus in 0.024 seconds
2023-11-27 07:19:57,272 - mfa - INFO - Jobs already initialized.
2023-11-27 07:19:57,273 - mfa - DEBUG - Initialized jobs in 0.003 seconds
2023-11-27 07:19:57,273 - mfa - INFO - Text already normalized.
2023-11-27 07:19:57,635 - mfa - DEBUG - Wrote lexicon information in 0.361 seconds
2023-11-27 07:19:57,637 - mfa - INFO - Creating corpus split with features...
2023-11-27 07:20:08,391 - mfa - DEBUG - Created corpus split directory in 10.756 seconds
2023-11-27 07:20:08,399 - mfa - INFO - Features already generated.
2023-11-27 07:20:08,400 - mfa - DEBUG - Generated features in 0.008 seconds
2023-11-27 07:20:08,400 - mfa - DEBUG - Setting up corpus took 11.344 seconds
2023-11-27 07:20:08,414 - mfa - INFO - Filtering utterances for training...
2023-11-27 07:20:11,668 - mfa - DEBUG - Skipping monophone alignments
2023-11-27 07:20:11,708 - mfa - DEBUG - Skipping triphone alignments
2023-11-27 07:20:11,749 - mfa - DEBUG - Skipping lda alignments
2023-11-27 07:20:11,808 - mfa - DEBUG - Skipping sat alignments
2023-11-27 07:20:11,949 - mfa - DEBUG - Skipping sat_2 alignments
2023-11-27 07:20:11,952 - mfa - INFO - Pronunciation probability estimation already done, loading saved probabilities...
2023-11-27 07:20:23,176 - mfa - DEBUG - Skipping pronunciation_probabilities alignments
2023-11-27 07:20:23,513 - mfa - DEBUG - Skipping sat_3 alignments
2023-11-27 07:20:23,516 - mfa - INFO - Pronunciation probability estimation already done, loading saved probabilities...
2023-11-27 07:20:34,475 - mfa - DEBUG - Skipping pronunciation_probabilities_2 alignments
2023-11-27 07:20:35,498 - mfa - DEBUG - Skipping sat_4 alignments
2023-11-27 07:20:35,498 - mfa - INFO - Accumulating transition stats...
2023-11-27 07:20:48,506 - mfa - DEBUG - Accumulating transition stats took 13.007 seconds
2023-11-27 07:20:48,506 - mfa - INFO - Finished accumulating transition stats!
2023-11-27 07:20:48,516 - mfa - INFO - Beginning phone LM training...
2023-11-27 07:20:48,517 - mfa - INFO - Collecting training data...
2023-11-27 07:20:58,353 - mfa - INFO - Training model...
2023-11-27 07:21:00,021 - mfa - INFO - Completed training in 48.3777596950531 seconds!
2023-11-27 07:21:05,473 - mfa - INFO - Saved model to /data/model/arabic_accoustic_model.zip
2023-11-27 07:21:05,482 - mfa - DEBUG - Skipping sat_4 alignments
2023-11-27 07:21:05,482 - mfa - WARNING - Alignment analysis not available without using postgresql
2023-11-27 07:21:05,486 - mfa - INFO - Exporting sat_4_ali TextGrids to /data/mgb2/aligned/train_mer20...
2023-11-27 07:22:05,859 - mfa - INFO - Finished exporting TextGrids to /data/mgb2/aligned/train_mer20!
2023-11-27 07:22:05,862 - mfa - DEBUG - Exported TextGrids in a total of 60.374 seconds
2023-11-27 07:22:05,865 - mfa - INFO - Done! Everything took 128.847 seconds

William-N-Havard commented 1 year ago

Same here. I got 14 TextGrids instead of 2176.

Paulmzr commented 9 months ago

same for me

jollyfish-cjy commented 2 months ago

Same here. I started with 63 files but got only 39 TextGrids. When I ran MFA again on the remaining 24 files, I got 5 TextGrids.

I aligned the remaining 19 files one by one, and changing the beam worked for me: --beam 60 worked for most of the files, and --beam 100 --retry_beam 400 worked for the rest.
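For anyone who wants to retry the unaligned files in one batch instead of one by one, here is a rough sketch. Both helper names (`stage_for_retry`, `build_align_command`) are hypothetical, and it assumes the `mfa align CORPUS DICTIONARY ACOUSTIC_MODEL OUTPUT` argument order and lab-file transcripts; only the `--beam` and `--retry_beam` values come from the comment above. It builds the command without running it, so check it against your installed version first.

```python
import shutil
from pathlib import Path


def stage_for_retry(corpus_dir, missing_stems, retry_dir):
    """Copy the .wav and .lab of each unaligned utterance into retry_dir.

    missing_stems holds paths relative to corpus_dir, without extension.
    Returns the number of files copied.
    """
    corpus_dir, retry_dir = Path(corpus_dir), Path(retry_dir)
    copied = 0
    for stem in missing_stems:
        for ext in (".wav", ".lab"):
            src = corpus_dir / (stem + ext)
            if src.exists():
                dest = retry_dir / (stem + ext)
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dest)
                copied += 1
    return copied


def build_align_command(corpus_dir, dictionary, acoustic_model, output_dir,
                        beam=100, retry_beam=400):
    """Build (but do not run) an align command with a wider beam.

    Assumption: positional order is corpus, dictionary, acoustic model, output.
    """
    return [
        "mfa", "align",
        str(corpus_dir), str(dictionary), str(acoustic_model), str(output_dir),
        "--beam", str(beam),
        "--retry_beam", str(retry_beam),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once it looks right. Starting with `beam=60` and only widening for the files that still fail keeps the run time down, since larger beams are slower.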