MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

[BUG] align command runs, no error, but no output #537

Closed jasonppy closed 1 year ago

jasonppy commented 1 year ago

Debugging checklist

[ ] Have you updated to latest MFA version? 2.0.6
[ ] Have you tried rerunning the command with the --clean flag? yes

Describe the issue

I ran:

mfa align -s 13 --clean --output_format csv /data/scratch/datasets/sym_wavs2 english_us_arpa english_us_arpa ~/shoulder/mfa_phone

The program appears to finish successfully with no error, but it writes no output to ~/shoulder/mfa_phone.

Since I am not familiar with the .lab or .TextGrid formats, I used .txt files as transcripts; each .txt file contains one line of space-separated text.

Thank you very much for helping me out!

For Reproducing your issue

Please fill out the following:

  1. Corpus structure
    • What language is the corpus in? English
    • How many files/speakers?
    • Are you using lab files or TextGrid files for input? I used .txt files; each contains one line of text, with words separated by spaces.
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? english_us_arpa
    • If it's a custom dictionary, what is the phoneset?
  3. Acoustic model
    • If you're using an acoustic model, is it one downloaded through MFA? If so, which one? english_us_arpa
    • If it's a model you've trained, what data was it trained on?

Log file

Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA).

In the generated ~/Documents/MFA/sym_wavs2_pretrained_aligner directory, pretrained_aligner.log reads:

2023-01-04 15:59:09,582 - sym_wavs2_pretrained_aligner - DEBUG - Beginning run for pretrained_aligner on sym_wavs2
2023-01-04 15:59:09,582 - sym_wavs2_pretrained_aligner - DEBUG - Using multiprocessing with 3
2023-01-04 15:59:09,582 - sym_wavs2_pretrained_aligner - DEBUG - Set up logger for MFA version: 2.0.6
2023-01-04 15:59:09,582 - sym_wavs2_pretrained_aligner - DEBUG - Cleaned previous run
2023-01-04 15:59:09,583 - sym_wavs2_pretrained_aligner - DEBUG - There were some differences in the current run compared to the last one. This may cause issues, run with --clean, if you hit an error.
2023-01-04 15:59:09,718 - sym_wavs2_pretrained_aligner - DEBUG - Using ARPA
2023-01-04 15:59:16,231 - sym_wavs2_pretrained_aligner - DEBUG - Loaded dictionary in 6.514206171035767
2023-01-04 15:59:21,184 - sym_wavs2_pretrained_aligner - DEBUG - Wrote lexicon information in 4.952595233917236
2023-01-04 15:59:21,195 - sym_wavs2_pretrained_aligner - INFO - Setting up corpus information...
2023-01-04 15:59:21,195 - sym_wavs2_pretrained_aligner - DEBUG - Could not load from temp
2023-01-04 15:59:21,195 - sym_wavs2_pretrained_aligner - INFO - Loading corpus from source files...
2023-01-04 15:59:22,239 - sym_wavs2_pretrained_aligner - DEBUG - Processing queue: 0.04249545699999935
2023-01-04 15:59:22,257 - sym_wavs2_pretrained_aligner - DEBUG - Parsed corpus directory with 3 jobs in 0.059739399999999776 seconds
2023-01-04 15:59:22,268 - sym_wavs2_pretrained_aligner - INFO - Found 11 speakers across 12 files, average number of utterances per speaker: 1.0909090909090908
2023-01-04 15:59:22,268 - sym_wavs2_pretrained_aligner - DEBUG - Loaded corpus in 1.0839776992797852
2023-01-04 15:59:22,268 - sym_wavs2_pretrained_aligner - INFO - Initializing multiprocessing jobs...
2023-01-04 15:59:22,277 - sym_wavs2_pretrained_aligner - DEBUG - Initialized jobs in 0.008713245391845703
2023-01-04 15:59:22,277 - sym_wavs2_pretrained_aligner - INFO - Creating corpus split for feature generation...
2023-01-04 15:59:22,283 - sym_wavs2_pretrained_aligner - DEBUG - Created corpus split directory in 0.006060600280761719
2023-01-04 15:59:22,283 - sym_wavs2_pretrained_aligner - INFO - Generating base features (mfcc)...
2023-01-04 15:59:22,283 - sym_wavs2_pretrained_aligner - INFO - Generating MFCCs...
2023-01-04 15:59:23,322 - sym_wavs2_pretrained_aligner - INFO - Calculating CMVN...
2023-01-04 15:59:23,371 - sym_wavs2_pretrained_aligner - INFO - Creating corpus split with features...
2023-01-04 15:59:23,382 - sym_wavs2_pretrained_aligner - DEBUG - Generated features in 1.0992753505706787
2023-01-04 15:59:23,388 - sym_wavs2_pretrained_aligner - DEBUG - Calculated oovs found in 0.00605320930480957
2023-01-04 15:59:23,388 - sym_wavs2_pretrained_aligner - DEBUG - Setting up corpus took 13.67208218574524 seconds
2023-01-04 15:59:23,389 - sym_wavs2_pretrained_aligner - DEBUG - 
2023-01-04 15:59:23,389 - sym_wavs2_pretrained_aligner - DEBUG - ====ACOUSTIC MODEL INFO====
2023-01-04 15:59:23,389 - sym_wavs2_pretrained_aligner - DEBUG - Acoustic model root directory: /home/pyp/Documents/MFA/extracted_models/acoustic
2023-01-04 15:59:23,389 - sym_wavs2_pretrained_aligner - DEBUG - Acoustic model dirname: /home/pyp/Documents/MFA/extracted_models/acoustic/english_us_arpa_acoustic
2023-01-04 15:59:23,389 - sym_wavs2_pretrained_aligner - DEBUG - Acoustic model meta path: /home/pyp/Documents/MFA/extracted_models/acoustic/english_us_arpa_acoustic/meta.json
2023-01-04 15:59:23,389 - sym_wavs2_pretrained_aligner - DEBUG - Acoustic model meta information:
2023-01-04 15:59:23,394 - sym_wavs2_pretrained_aligner - DEBUG - architecture: gmm-hmm
features:
  allow_downsample: true
  allow_upsample: true
  delta_pitch: 0.005
  feature_type: mfcc
  frame_length: 25
  frame_shift: 10
  high_frequency: 7800
  low_frequency: 20
  max_f0: 500
  min_f0: 50
  penalty_factor: 0.1
  sample_frequency: 16000
  snip_edges: true
  use_energy: false
  use_pitch: false
  uses_cmvn: true
  uses_deltas: false
  uses_speaker_adaptation: true
  uses_splices: true
  uses_voiced: false
final_non_silence_correction: 0.01
final_silence_correction: 2.52
initial_silence_probability: 0.17
oov_phone: spn
optional_silence_phone: sil
phone_set_type: ARPA
phone_type: triphone
phones: !!set
  AA: null
  AA0: null
  AA1: null
  AA2: null
  AE: null
  AE0: null
  AE1: null
  AE2: null
  AH: null
  AH0: null
  AH1: null
  AH2: null
  AO: null
  AO0: null
  AO1: null
  AO2: null
  AW: null
  AW0: null
  AW1: null
  AW2: null
  AY: null
  AY0: null
  AY1: null
  AY2: null
  B: null
  CH: null
  D: null
  DH: null
  EH: null
  EH0: null
  EH1: null
  EH2: null
  ER: null
  ER0: null
  ER1: null
  ER2: null
  EY: null
  EY0: null
  EY1: null
  EY2: null
  F: null
  G: null
  HH: null
  IH: null
  IH0: null
  IH1: null
  IH2: null
  IY: null
  IY0: null
  IY1: null
  IY2: null
  JH: null
  K: null
  L: null
  M: null
  N: null
  NG: null
  OW: null
  OW0: null
  OW1: null
  OW2: null
  OY: null
  OY0: null
  OY1: null
  OY2: null
  P: null
  R: null
  S: null
  SH: null
  T: null
  TH: null
  UH: null
  UH0: null
  UH1: null
  UH2: null
  UW: null
  UW0: null
  UW1: null
  UW2: null
  V: null
  W: null
  Y: null
  Z: null
  ZH: null
silence_probability: 0.17479931715231248
train_date: '2022-05-11 11:19:39.314871'
training:
  audio_duration: 3535789.7961249864
  average_log_likelihood: -0.005016351709420847
  num_oovs: 0
  num_speakers: 2484
  num_utterances: 292326
version: 2.0.0rc4.dev19+ged818cb.d20220404

2023-01-04 15:59:23,395 - sym_wavs2_pretrained_aligner - DEBUG - 
2023-01-04 15:59:23,395 - sym_wavs2_pretrained_aligner - DEBUG - Setup for alignment in 13.811795234680176 seconds
2023-01-04 15:59:23,395 - sym_wavs2_pretrained_aligner - INFO - Compiling training graphs...
2023-01-04 15:59:25,019 - sym_wavs2_pretrained_aligner - DEBUG - Compiling training graphs took 1.6239326000213623
2023-01-04 15:59:25,020 - sym_wavs2_pretrained_aligner - INFO - Performing first-pass alignment...
2023-01-04 15:59:25,020 - sym_wavs2_pretrained_aligner - INFO - Generating alignments...
2023-01-04 15:59:26,426 - sym_wavs2_pretrained_aligner - DEBUG - Alignment round took 1.4058709144592285
2023-01-04 15:59:28,476 - sym_wavs2_pretrained_aligner - DEBUG - Average per frame likelihood for alignment: -71.0255
2023-01-04 15:59:28,476 - sym_wavs2_pretrained_aligner - DEBUG - Compiling information took 2.0486886501312256
2023-01-04 15:59:28,476 - sym_wavs2_pretrained_aligner - INFO - Calculating fMLLR for speaker adaptation...
2023-01-04 15:59:29,753 - sym_wavs2_pretrained_aligner - DEBUG - Fmllr calculation took 1.276536226272583
2023-01-04 15:59:29,754 - sym_wavs2_pretrained_aligner - INFO - Performing second-pass alignment...
2023-01-04 15:59:29,755 - sym_wavs2_pretrained_aligner - INFO - Generating alignments...
2023-01-04 15:59:31,164 - sym_wavs2_pretrained_aligner - DEBUG - Alignment round took 1.408902645111084
2023-01-04 15:59:33,203 - sym_wavs2_pretrained_aligner - DEBUG - Average per frame likelihood for alignment: -71.0255
2023-01-04 15:59:33,203 - sym_wavs2_pretrained_aligner - DEBUG - Compiling information took 2.038318157196045
2023-01-04 15:59:33,203 - sym_wavs2_pretrained_aligner - INFO - Exporting TextGrids to /home/pyp/shoulder/mfa_phone...
2023-01-04 15:59:33,204 - sym_wavs2_pretrained_aligner - INFO - Collecting phone and word alignments from alignment lattices...
2023-01-04 15:59:38,137 - sym_wavs2_pretrained_aligner - INFO - Finished exporting TextGrids to /home/pyp/shoulder/mfa_phone!
2023-01-04 15:59:38,137 - sym_wavs2_pretrained_aligner - DEBUG - Exported TextGrids in a total of 2.060800075531006 seconds
2023-01-04 15:59:38,139 - sym_wavs2_pretrained_aligner - INFO - Done! Everything took 28.55681324005127 seconds

Desktop (please complete the following information):

mmcauliffe commented 1 year ago

Can you try rerunning with --overwrite?

mfa align -s 13 --clean --overwrite --output_format csv /data/scratch/datasets/sym_wavs2 english_us_arpa english_us_arpa ~/shoulder/mfa_phone

Alternatively, try a different output format. There might be an issue with csv, but omitting --output_format or using json should work.

jasonppy commented 1 year ago

Found the error! The .txt filenames were slightly different from the .wav filenames.

WangHelin1997 commented 1 year ago

> Found the error! The .txt filename is slightly different from .wav filename

Hi. I have the same problem. What does the .txt file represent?

jasonppy commented 1 year ago

> Hi. I have the same problem. What does the .txt file represent?

The .txt file is the transcript of the matching .wav file; the two must share the same base filename.
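
For anyone hitting the same symptom, a quick way to spot this kind of mismatch is to compare base filenames before running the aligner. The sketch below is not part of MFA: `find_unpaired` is a hypothetical helper, and it assumes that each transcript sits in the same directory as its audio file and differs only in extension. The extension lists are also assumptions and may need adjusting for your corpus.

```python
from pathlib import Path


def find_unpaired(corpus_dir, audio_exts=(".wav",), text_exts=(".txt", ".lab", ".textgrid")):
    """Return (audio files without a transcript, transcripts without audio).

    Hypothetical helper, not an MFA function. Files are compared by their
    path relative to corpus_dir with the extension removed, so
    spk1/utt1.wav pairs with spk1/utt1.txt but not with spk1/utt_1.txt.
    """
    root = Path(corpus_dir)
    audio, text = set(), set()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        # Key on the relative path without its extension.
        key = str(path.relative_to(root).with_suffix(""))
        suffix = path.suffix.lower()
        if suffix in audio_exts:
            audio.add(key)
        elif suffix in text_exts:
            text.add(key)
    return sorted(audio - text), sorted(text - audio)
```

Calling `find_unpaired("/data/scratch/datasets/sym_wavs2")` on the corpus from this issue would list any .wav file whose transcript has a slightly different name, and vice versa. MFA also provides an `mfa validate` command that may report such problems directly.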