MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

[BUG] No error but no TextGrids output #542

Open WangHelin1997 opened 1 year ago

WangHelin1997 commented 1 year ago

Debugging checklist

[ ] Have you updated to latest MFA version? Yes
[ ] Have you tried rerunning the command with the --clean flag? Yes

Describe the issue A clear and concise description of what the bug is.

For Reproducing your issue Please fill out the following:

  1. Corpus structure
    • What language is the corpus in? English
    • How many files/speakers? 28 speakers, 21379 files
    • Are you using lab files or TextGrid files for input? No
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? english_mfa
    • If it's a custom dictionary, what is the phoneset?
  3. Acoustic model
    • If you're using an acoustic model, is it one download through MFA? If so, which one? english_mfa
    • If it's a model you've trained, what data was it trained on?

Log file Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA).

2023-02-01 18:49:05,820 - processed_data_pretrained_aligner - DEBUG - Beginning run for pretrained_aligner on processed_data
2023-02-01 18:49:05,821 - processed_data_pretrained_aligner - DEBUG - Using multiprocessing with 10
2023-02-01 18:49:05,821 - processed_data_pretrained_aligner - DEBUG - Set up logger for MFA version: 2.0.6
2023-02-01 18:49:05,821 - processed_data_pretrained_aligner - DEBUG - Cleaned previous run
2023-02-01 18:49:05,821 - processed_data_pretrained_aligner - DEBUG - There were some differences in the current run compared to the last one. This may cause issues, run with --clean, if you hit an error.
2023-02-01 18:49:06,774 - processed_data_pretrained_aligner - DEBUG - Using ARPA
2023-02-01 18:49:06,880 - processed_data_pretrained_aligner - DEBUG - Loaded dictionary in 0.11072373390197754
2023-02-01 18:49:06,939 - processed_data_pretrained_aligner - DEBUG - Wrote lexicon information in 0.05958437919616699
2023-02-01 18:49:06,959 - processed_data_pretrained_aligner - INFO - Setting up corpus information...
2023-02-01 18:49:06,959 - processed_data_pretrained_aligner - DEBUG - Could not load from temp
2023-02-01 18:49:06,959 - processed_data_pretrained_aligner - INFO - Loading corpus from source files...
2023-02-01 18:49:09,689 - processed_data_pretrained_aligner - DEBUG - Processing queue: 1.235656907000001
2023-02-01 18:49:10,261 - processed_data_pretrained_aligner - DEBUG - Parsed corpus directory with 10 jobs in 1.7814550990000004 seconds
2023-02-01 18:49:10,318 - processed_data_pretrained_aligner - INFO - Found 28 speakers across 21379 files, average number of utterances per speaker: 763.5357142857143
2023-02-01 18:49:10,318 - processed_data_pretrained_aligner - DEBUG - Loaded corpus in 3.3784103393554688
2023-02-01 18:49:10,318 - processed_data_pretrained_aligner - INFO - Initializing multiprocessing jobs...
2023-02-01 18:49:10,349 - processed_data_pretrained_aligner - DEBUG - Initialized jobs in 0.03097057342529297
2023-02-01 18:49:10,350 - processed_data_pretrained_aligner - INFO - Creating corpus split for feature generation...
2023-02-01 18:49:10,632 - processed_data_pretrained_aligner - DEBUG - Created corpus split directory in 0.28189659118652344
2023-02-01 18:49:10,632 - processed_data_pretrained_aligner - INFO - Generating base features (mfcc)...
2023-02-01 18:49:10,632 - processed_data_pretrained_aligner - INFO - Generating MFCCs...
2023-02-01 18:49:57,000 - processed_data_pretrained_aligner - INFO - Calculating CMVN...
2023-02-01 18:49:57,603 - processed_data_pretrained_aligner - INFO - Creating corpus split with features...
2023-02-01 18:49:57,828 - processed_data_pretrained_aligner - DEBUG - Generated features in 47.195974349975586
2023-02-01 18:49:57,845 - processed_data_pretrained_aligner - DEBUG - Calculated oovs found in 0.017041683197021484
2023-02-01 18:49:57,845 - processed_data_pretrained_aligner - DEBUG - Setting up corpus took 51.076207876205444 seconds
2023-02-01 18:49:57,845 - processed_data_pretrained_aligner - WARNING - There were 463 pronunciations in the dictionary that were ignored for containing one of 32 phones not present in thetrained acoustic model.  Please run `mfa validate` to get more details.
2023-02-01 18:49:57,845 - processed_data_pretrained_aligner - DEBUG - 
2023-02-01 18:49:57,845 - processed_data_pretrained_aligner - DEBUG - ====ACOUSTIC MODEL INFO====
2023-02-01 18:49:57,845 - processed_data_pretrained_aligner - DEBUG - Acoustic model root directory: /home/dean/Documents/MFA/extracted_models/acoustic
2023-02-01 18:49:57,845 - processed_data_pretrained_aligner - DEBUG - Acoustic model dirname: /home/dean/Documents/MFA/extracted_models/acoustic/english_us_arpa_acoustic
2023-02-01 18:49:57,845 - processed_data_pretrained_aligner - DEBUG - Acoustic model meta path: /home/dean/Documents/MFA/extracted_models/acoustic/english_us_arpa_acoustic/meta.json
2023-02-01 18:49:57,845 - processed_data_pretrained_aligner - DEBUG - Acoustic model meta information:
2023-02-01 18:49:57,852 - processed_data_pretrained_aligner - DEBUG - architecture: gmm-hmm
features:
  allow_downsample: true
  allow_upsample: true
  delta_pitch: 0.005
  feature_type: mfcc
  frame_length: 25
  frame_shift: 10
  high_frequency: 7800
  low_frequency: 20
  max_f0: 500
  min_f0: 50
  penalty_factor: 0.1
  sample_frequency: 16000
  snip_edges: true
  use_energy: false
  use_pitch: false
  uses_cmvn: true
  uses_deltas: false
  uses_speaker_adaptation: true
  uses_splices: true
  uses_voiced: false
final_non_silence_correction: 0.01
final_silence_correction: 2.52
initial_silence_probability: 0.17
oov_phone: spn
optional_silence_phone: sil
phone_set_type: ARPA
phone_type: triphone
phones: !!set
  AA: null
  AA0: null
  AA1: null
  AA2: null
  AE: null
  AE0: null
  AE1: null
  AE2: null
  AH: null
  AH0: null
  AH1: null
  AH2: null
  AO: null
  AO0: null
  AO1: null
  AO2: null
  AW: null
  AW0: null
  AW1: null
  AW2: null
  AY: null
  AY0: null
  AY1: null
  AY2: null
  B: null
  CH: null
  D: null
  DH: null
  EH: null
  EH0: null
  EH1: null
  EH2: null
  ER: null
  ER0: null
  ER1: null
  ER2: null
  EY: null
  EY0: null
  EY1: null
  EY2: null
  F: null
  G: null
  HH: null
  IH: null
  IH0: null
  IH1: null
  IH2: null
  IY: null
  IY0: null
  IY1: null
  IY2: null
  JH: null
  K: null
  L: null
  M: null
  N: null
  NG: null
  OW: null
  OW0: null
  OW1: null
  OW2: null
  OY: null
  OY0: null
  OY1: null
  OY2: null
  P: null
  R: null
  S: null
  SH: null
  T: null
  TH: null
  UH: null
  UH0: null
  UH1: null
  UH2: null
  UW: null
  UW0: null
  UW1: null
  UW2: null
  V: null
  W: null
  Y: null
  Z: null
  ZH: null
silence_probability: 0.17479931715231248
train_date: '2022-05-11 11:19:39.314871'
training:
  audio_duration: 3535789.7961249864
  average_log_likelihood: -0.005016351709420847
  num_oovs: 0
  num_speakers: 2484
  num_utterances: 292326
version: 2.0.0rc4.dev19+ged818cb.d20220404

2023-02-01 18:49:57,852 - processed_data_pretrained_aligner - DEBUG - 
2023-02-01 18:49:57,852 - processed_data_pretrained_aligner - DEBUG - Setup for alignment in 52.03099608421326 seconds
2023-02-01 18:49:57,852 - processed_data_pretrained_aligner - INFO - Compiling training graphs...
2023-02-01 18:49:59,340 - processed_data_pretrained_aligner - DEBUG - Compiling training graphs took 1.487360954284668
2023-02-01 18:49:59,342 - processed_data_pretrained_aligner - INFO - Performing first-pass alignment...
2023-02-01 18:49:59,342 - processed_data_pretrained_aligner - INFO - Generating alignments...
2023-02-01 18:50:07,588 - processed_data_pretrained_aligner - DEBUG - Alignment round took 8.245379447937012
2023-02-01 18:50:08,805 - processed_data_pretrained_aligner - DEBUG - Average per frame likelihood for alignment: -55.223416627788154
2023-02-01 18:50:08,806 - processed_data_pretrained_aligner - DEBUG - Compiling information took 1.212735652923584
2023-02-01 18:50:08,806 - processed_data_pretrained_aligner - INFO - Calculating fMLLR for speaker adaptation...
2023-02-01 18:50:10,528 - processed_data_pretrained_aligner - DEBUG - Fmllr calculation took 1.7222552299499512
2023-02-01 18:50:10,530 - processed_data_pretrained_aligner - INFO - Performing second-pass alignment...
2023-02-01 18:50:10,531 - processed_data_pretrained_aligner - INFO - Generating alignments...
2023-02-01 18:50:18,570 - processed_data_pretrained_aligner - DEBUG - Alignment round took 8.03863525390625
2023-02-01 18:50:19,807 - processed_data_pretrained_aligner - DEBUG - Average per frame likelihood for alignment: -55.223416627788154
2023-02-01 18:50:19,808 - processed_data_pretrained_aligner - DEBUG - Compiling information took 1.2347967624664307
2023-02-01 18:50:19,808 - processed_data_pretrained_aligner - INFO - Exporting TextGrids to /data/dean/whl-2022/Speech-Backbones/Montreal-Forced-Aligner/aligned...
2023-02-01 18:50:19,808 - processed_data_pretrained_aligner - INFO - Collecting phone and word alignments from alignment lattices...
2023-02-01 18:50:23,529 - processed_data_pretrained_aligner - INFO - Finished exporting TextGrids to /data/dean/whl-2022/Speech-Backbones/Montreal-Forced-Aligner/aligned!
2023-02-01 18:50:23,530 - processed_data_pretrained_aligner - DEBUG - Exported TextGrids in a total of 2.077840805053711 seconds
2023-02-01 18:50:23,532 - processed_data_pretrained_aligner - INFO - Done! Everything took 77.71208310127258 seconds

Desktop (please complete the following information):

Additional context Add any other context about the problem here.

(mfa-dev) dean@inspur:/data/dean/whl-2022/Speech-Backbones/Montreal-Forced-Aligner$ mfa align -j 10 --clean --overwrite --output_format json /data/dean/whl-2022/Speech-Backbones/DiffVC/processed_data new_lexicon.txt english_us_arpa /data/dean/whl-2022/Speech-Backbones/Montreal-Forced-Aligner/aligned
INFO - Setting up corpus information...
INFO - Loading corpus from source files...
21379it [00:03, 6597.57it/s]
INFO - Found 28 speakers across 21379 files, average number of utterances per speaker: 763.5357142857143
INFO - Initializing multiprocessing jobs...
INFO - Creating corpus split for feature generation...
INFO - Generating base features (mfcc)...
INFO - Generating MFCCs...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊| 21350/21379 [00:45<00:00, 464.49it/s]
INFO - Calculating CMVN...
INFO - Creating corpus split with features...
WARNING - There were 463 pronunciations in the dictionary that were ignored for containing one of 32 phones not present
                   in thetrained acoustic model.  Please run `mfa validate` to get more details.
INFO - Compiling training graphs...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21379/21379 [00:01<00:00, 14388.38it/s]
INFO - Performing first-pass alignment...
INFO - Generating alignments...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21379/21379 [00:08<00:00, 2593.09it/s]
INFO - Calculating fMLLR for speaker adaptation...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:01<00:00, 16.28it/s]
INFO - Performing second-pass alignment...
INFO - Generating alignments...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21379/21379 [00:08<00:00, 2659.73it/s]
INFO - Exporting TextGrids to /data/dean/whl-2022/Speech-Backbones/Montreal-Forced-Aligner/aligned...
INFO - Collecting phone and word alignments from alignment lattices...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21379/21379 [00:01<00:00, 13032.04it/s]
  0%|                                                                                                                                                                                        | 0/21379 [00:02<?, ?it/s]
INFO - Finished exporting TextGrids to /data/dean/whl-2022/Speech-Backbones/Montreal-Forced-Aligner/aligned!
INFO - Done! Everything took 77.71208310127258 seconds
mmcauliffe commented 1 year ago

There is the warning

There were 463 pronunciations in the dictionary that were ignored for containing one of 32 phones not present
in thetrained acoustic model.  Please run `mfa validate` to get more details.

You say you're using the english_mfa dictionary, but the command uses english_us_arpa, so depending on what's in the "new_lexicon.txt" dictionary, there might not be anything to really output? It should still output something, even if it's not very useful (e.g. lots of OOVs), but I would get the dictionary/acoustic model mismatch fixed first.
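A rough way to see this kind of mismatch outside of `mfa validate` is to compare the phone symbols used in the dictionary against the phone set listed in the acoustic model's meta information (the `phones:` block in the log above). The sketch below is only an illustration: the helper names are hypothetical, and it assumes a plain-text dictionary with one entry per line (a word followed by whitespace-separated phones, with any numeric probability columns skipped). It is not how MFA itself performs the check.

```python
from pathlib import Path
import tempfile


def _is_number(token):
    """Return True if the token parses as a float (e.g. a probability column)."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def phones_in_dictionary(dict_path):
    """Collect phone symbols from a pronunciation dictionary.

    Assumption: each line is `word phone phone ...`; numeric columns are
    treated as pronunciation probabilities and ignored.
    """
    phones = set()
    for line in Path(dict_path).read_text(encoding="utf-8").splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or a word without a pronunciation
        phones.update(f for f in fields[1:] if not _is_number(f))
    return phones


def unsupported_phones(dict_path, model_phones):
    """Phones used by the dictionary that the acoustic model does not know."""
    return phones_in_dictionary(dict_path) - set(model_phones)


if __name__ == "__main__":
    # Tiny made-up dictionary: one ARPA entry and one IPA-style entry.
    with tempfile.TemporaryDirectory() as tmp:
        lexicon = Path(tmp) / "toy_lexicon.txt"
        lexicon.write_text("hello\tHH AH0 L OW1\nhello\th ə l ow\n", encoding="utf-8")
        arpa_subset = {"HH", "AH0", "L", "OW1"}  # subset of the ARPA set in the log
        print(sorted(unsupported_phones(lexicon, arpa_subset)))
```

A non-empty result means some pronunciations would be ignored, which matches the warning quoted above.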

WangHelin1997 commented 1 year ago

Hi! I have checked it and this is my second run output.

(mfa-dev) dean@inspur:/data/dean/whl-2022/Speech-Backbones/Montreal-Forced-Aligner$ mfa align -j 10 --clean /data/dean/whl-2022/Speech-Backbones/DiffVC/processed_data /home/dean/Documents/MFA/pretrained_models/dictionary/english_mfa.dict english_mfa /home/dean/Documents/MFA/aligned
INFO - Setting up corpus information...
INFO - Loading corpus from source files...
21379it [00:03, 6746.17it/s]
INFO - Found 28 speakers across 21379 files, average number of utterances per speaker: 763.5357142857143
INFO - Initializing multiprocessing jobs...
INFO - Creating corpus split for feature generation...
INFO - Generating base features (mfcc)...
INFO - Generating MFCCs...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊| 21350/21379 [00:48<00:00, 438.39it/s]
INFO - Calculating CMVN...
INFO - Creating corpus split with features...
INFO - Compiling training graphs...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21379/21379 [00:02<00:00, 10124.11it/s]
INFO - Performing first-pass alignment...
INFO - Generating alignments...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21379/21379 [00:08<00:00, 2400.72it/s]
INFO - Calculating fMLLR for speaker adaptation...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:01<00:00, 15.29it/s]
INFO - Performing second-pass alignment...
INFO - Generating alignments...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21379/21379 [00:08<00:00, 2386.32it/s]
INFO - Exporting TextGrids to /home/dean/Documents/MFA/aligned...
INFO - Collecting phone and word alignments from alignment lattices...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21379/21379 [00:03<00:00, 6936.33it/s]
  0%|                                                                                                                                                                                        | 0/21379 [00:02<?, ?it/s]
INFO - Finished exporting TextGrids to /home/dean/Documents/MFA/aligned!
INFO - Done! Everything took 97.89735674858093 seconds

It seems that everything works well except the exporting part. And here is the log file.

2023-02-02 01:08:05,524 - processed_data_pretrained_aligner - DEBUG - Beginning run for pretrained_aligner on processed_data
2023-02-02 01:08:05,524 - processed_data_pretrained_aligner - DEBUG - Using multiprocessing with 10
2023-02-02 01:08:05,524 - processed_data_pretrained_aligner - DEBUG - Set up logger for MFA version: 2.0.6
2023-02-02 01:08:05,524 - processed_data_pretrained_aligner - DEBUG - Cleaned previous run
2023-02-02 01:08:05,525 - processed_data_pretrained_aligner - DEBUG - There were some differences in the current run compared to the last one. This may cause issues, run with --clean, if you hit an error.
2023-02-02 01:08:06,482 - processed_data_pretrained_aligner - DEBUG - Using IPA
2023-02-02 01:08:12,283 - processed_data_pretrained_aligner - DEBUG - Loaded dictionary in 5.805937767028809
2023-02-02 01:08:18,235 - processed_data_pretrained_aligner - DEBUG - Wrote lexicon information in 5.951487064361572
2023-02-02 01:08:18,255 - processed_data_pretrained_aligner - INFO - Setting up corpus information...
2023-02-02 01:08:18,255 - processed_data_pretrained_aligner - DEBUG - Could not load from temp
2023-02-02 01:08:18,256 - processed_data_pretrained_aligner - INFO - Loading corpus from source files...
2023-02-02 01:08:20,997 - processed_data_pretrained_aligner - DEBUG - Processing queue: 1.1797338810000007
2023-02-02 01:08:21,550 - processed_data_pretrained_aligner - DEBUG - Parsed corpus directory with 10 jobs in 1.7061748439999995 seconds
2023-02-02 01:08:21,587 - processed_data_pretrained_aligner - INFO - Found 28 speakers across 21379 files, average number of utterances per speaker: 763.5357142857143
2023-02-02 01:08:21,588 - processed_data_pretrained_aligner - DEBUG - Loaded corpus in 3.352215528488159
2023-02-02 01:08:21,588 - processed_data_pretrained_aligner - INFO - Initializing multiprocessing jobs...
2023-02-02 01:08:21,618 - processed_data_pretrained_aligner - DEBUG - Initialized jobs in 0.03055095672607422
2023-02-02 01:08:21,618 - processed_data_pretrained_aligner - INFO - Creating corpus split for feature generation...
2023-02-02 01:08:21,788 - processed_data_pretrained_aligner - DEBUG - Created corpus split directory in 0.1694188117980957
2023-02-02 01:08:21,788 - processed_data_pretrained_aligner - INFO - Generating base features (mfcc)...
2023-02-02 01:08:21,788 - processed_data_pretrained_aligner - INFO - Generating MFCCs...
2023-02-02 01:09:10,991 - processed_data_pretrained_aligner - INFO - Calculating CMVN...
2023-02-02 01:09:11,589 - processed_data_pretrained_aligner - INFO - Creating corpus split with features...
2023-02-02 01:09:11,832 - processed_data_pretrained_aligner - DEBUG - Generated features in 50.044026374816895
2023-02-02 01:09:11,851 - processed_data_pretrained_aligner - DEBUG - Calculated oovs found in 0.018543004989624023
2023-02-02 01:09:11,851 - processed_data_pretrained_aligner - DEBUG - Setting up corpus took 65.37393045425415 seconds
2023-02-02 01:09:11,851 - processed_data_pretrained_aligner - DEBUG - 
2023-02-02 01:09:11,851 - processed_data_pretrained_aligner - DEBUG - ====ACOUSTIC MODEL INFO====
2023-02-02 01:09:11,851 - processed_data_pretrained_aligner - DEBUG - Acoustic model root directory: /home/dean/Documents/MFA/extracted_models/acoustic
2023-02-02 01:09:11,851 - processed_data_pretrained_aligner - DEBUG - Acoustic model dirname: /home/dean/Documents/MFA/extracted_models/acoustic/english_mfa_acoustic
2023-02-02 01:09:11,851 - processed_data_pretrained_aligner - DEBUG - Acoustic model meta path: /home/dean/Documents/MFA/extracted_models/acoustic/english_mfa_acoustic/meta.json
2023-02-02 01:09:11,851 - processed_data_pretrained_aligner - DEBUG - Acoustic model meta information:
2023-02-02 01:09:11,858 - processed_data_pretrained_aligner - DEBUG - architecture: gmm-hmm
features:
  allow_downsample: true
  allow_upsample: true
  delta_pitch: 0.005
  feature_type: mfcc
  frame_length: 25
  frame_shift: 10
  high_frequency: 7800
  low_frequency: 20
  max_f0: 500
  min_f0: 50
  penalty_factor: 0.1
  sample_frequency: 16000
  snip_edges: true
  use_energy: false
  use_pitch: true
  uses_cmvn: true
  uses_deltas: false
  uses_speaker_adaptation: true
  uses_splices: true
  uses_voiced: false
final_non_silence_correction: 0.19
final_silence_correction: 2.165
initial_silence_probability: 0.2
oov_phone: spn
optional_silence_phone: sil
phone_set_type: IPA
phone_type: triphone
phones: !!set
  a: null
  aj: null
  aw: null
  "a\u02D0": null
  b: null
  "b\u02B2": null
  c: null
  "c\u02B0": null
  d: null
  "d\u0292": null
  "d\u02B2": null
  "d\u032A": null
  e: null
  ej: null
  f: null
  "f\u02B2": null
  h: null
  i: null
  "i\u02D0": null
  j: null
  k: null
  "k\u02B0": null
  l: null
  m: null
  "m\u02B2": null
  "m\u0329": null
  n: null
  "n\u0329": null
  o: null
  ow: null
  p: null
  "p\u02B0": null
  "p\u02B2": null
  s: null
  t: null
  "t\u0283": null
  "t\u02B0": null
  "t\u02B2": null
  "t\u032A": null
  u: null
  "u\u02D0": null
  v: null
  "v\u02B2": null
  w: null
  z: null
  "\xE6": null
  "\xE7": null
  "\xF0": null
  "\u014B": null
  "\u0250": null
  "\u0251": null
  "\u0251\u02D0": null
  "\u0252": null
  "\u0252\u02D0": null
  "\u0254": null
  "\u0254j": null
  "\u0259": null
  "\u0259w": null
  "\u025A": null
  "\u025B": null
  "\u025B\u02D0": null
  "\u025C": null
  "\u025C\u02D0": null
  "\u025D": null
  "\u025F": null
  "\u0261": null
  "\u026A": null
  "\u026B": null
  "\u026B\u0329": null
  "\u0271": null
  "\u0272": null
  "\u0279": null
  "\u027E": null
  "\u027E\u02B2": null
  "\u0283": null
  "\u0289": null
  "\u0289\u02D0": null
  "\u028A": null
  "\u028E": null
  "\u0292": null
  "\u0294": null
  "\u03B8": null
silence_probability: 0.19976790028347458
train_date: '2022-05-14 11:48:51.797073'
training:
  audio_duration: 13146329.456449727
  average_log_likelihood: -0.01580790029624346
  num_oovs: 0
  num_speakers: 77861
  num_utterances: 2411758
version: 2.0.0rc4.dev19+ged818cb.d20220404

2023-02-02 01:09:11,858 - processed_data_pretrained_aligner - DEBUG - 
2023-02-02 01:09:11,858 - processed_data_pretrained_aligner - DEBUG - Setup for alignment in 66.33342623710632 seconds
2023-02-02 01:09:11,858 - processed_data_pretrained_aligner - INFO - Compiling training graphs...
2023-02-02 01:09:13,972 - processed_data_pretrained_aligner - DEBUG - Compiling training graphs took 2.1131227016448975
2023-02-02 01:09:13,974 - processed_data_pretrained_aligner - INFO - Performing first-pass alignment...
2023-02-02 01:09:13,974 - processed_data_pretrained_aligner - INFO - Generating alignments...
2023-02-02 01:09:22,880 - processed_data_pretrained_aligner - DEBUG - Alignment round took 8.905574083328247
2023-02-02 01:09:25,137 - processed_data_pretrained_aligner - DEBUG - Average per frame likelihood for alignment: -50.45720663988212
2023-02-02 01:09:25,137 - processed_data_pretrained_aligner - DEBUG - Compiling information took 2.249269723892212
2023-02-02 01:09:25,137 - processed_data_pretrained_aligner - INFO - Calculating fMLLR for speaker adaptation...
2023-02-02 01:09:26,971 - processed_data_pretrained_aligner - DEBUG - Fmllr calculation took 1.8337764739990234
2023-02-02 01:09:26,973 - processed_data_pretrained_aligner - INFO - Performing second-pass alignment...
2023-02-02 01:09:26,974 - processed_data_pretrained_aligner - INFO - Generating alignments...
2023-02-02 01:09:35,933 - processed_data_pretrained_aligner - DEBUG - Alignment round took 8.95933198928833
2023-02-02 01:09:38,203 - processed_data_pretrained_aligner - DEBUG - Average per frame likelihood for alignment: -50.68226060948395
2023-02-02 01:09:38,203 - processed_data_pretrained_aligner - DEBUG - Compiling information took 2.264148712158203
2023-02-02 01:09:38,203 - processed_data_pretrained_aligner - INFO - Exporting TextGrids to /home/dean/Documents/MFA/aligned...
2023-02-02 01:09:38,204 - processed_data_pretrained_aligner - INFO - Collecting phone and word alignments from alignment lattices...
2023-02-02 01:09:43,418 - processed_data_pretrained_aligner - INFO - Finished exporting TextGrids to /home/dean/Documents/MFA/aligned!
2023-02-02 01:09:43,419 - processed_data_pretrained_aligner - DEBUG - Exported TextGrids in a total of 2.1298766136169434 seconds
2023-02-02 01:09:43,421 - processed_data_pretrained_aligner - INFO - Done! Everything took 97.89735674858093 seconds
mmcauliffe commented 1 year ago

Is there anything in the "export_textgrids..log" files in ~/Documents/MFA/processed_data/alignment/log? I *think* that would be the directory structure, but I've been working with a new version that does things slightly differently, so I'm a bit fuzzy on the exact location of the export TextGrid logs.

WangHelin1997 commented 1 year ago

Hi. I couldn't find the folder ~/Documents/MFA/processed_data/alignment/

mmcauliffe commented 1 year ago

Oh under pretrained_aligner/log then, sorry!
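For a corpus with many jobs, opening each export log by hand gets tedious. The sketch below gathers the non-empty lines of every matching log and flags any file that contains something unexpected. The function names, the glob pattern `export_textgrids*.log`, the default "expected" line, and the example directory in the demo are all assumptions for illustration; the real file names and location depend on the MFA version.

```python
from pathlib import Path


def summarize_logs(log_dir, pattern="export_textgrids*.log"):
    """Map each matching log file name to its list of non-empty lines."""
    root = Path(log_dir).expanduser()
    summary = {}
    if not root.is_dir():
        return summary  # nothing to report if the directory is missing
    for path in sorted(root.glob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        summary[path.name] = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return summary


def suspicious_logs(summary, ok_lines=("Done!",)):
    """Keep only logs containing lines other than the expected ones."""
    return {
        name: lines
        for name, lines in summary.items()
        if any(ln not in ok_lines for ln in lines)
    }


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever the logs actually live.
    logs = summarize_logs("~/Documents/MFA/processed_data/pretrained_aligner/log")
    print(f"{len(logs)} export logs found")
    for name, lines in suspicious_logs(logs).items():
        print(name, lines[:5])
```

If no log is flagged, the export workers did not report any error, and the problem has to be looked for elsewhere.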

WangHelin1997 commented 1 year ago

The export logs all contain only:

Done!
mmcauliffe commented 1 year ago

Well, that logging is less than helpful. I'm very puzzled. And the files didn't get left in pretrained_aligner/textgrids to avoid overwriting an existing folder?

I'm hoping to get 2.1 out tonight, which is outputting JSONs correctly for me, so I'll let you know when that's live and you can try it out (and hopefully it doesn't introduce other issues for you).
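To check quickly whether any alignment files were written at all, and whether they ended up in the temporary folder rather than the requested output directory, one can count files by extension in both places. This is a minimal sketch: `count_outputs` is a hypothetical helper, and the temporary path in the demo is an assumption based on the folder names mentioned in this thread.

```python
from collections import Counter
from pathlib import Path


def count_outputs(directory, suffixes=(".textgrid", ".json")):
    """Count alignment output files under a directory, grouped by extension.

    Extensions are compared case-insensitively, so `.TextGrid` counts as
    `.textgrid`. A missing directory yields an empty Counter.
    """
    root = Path(directory).expanduser()
    counts = Counter()
    if not root.is_dir():
        return counts
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            counts[path.suffix.lower()] += 1
    return counts


if __name__ == "__main__":
    # Output directory from the second run, plus an assumed temporary location.
    for folder in (
        "~/Documents/MFA/aligned",
        "~/Documents/MFA/processed_data/pretrained_aligner/textgrids",
    ):
        print(folder, dict(count_outputs(folder)))
```

With 21379 input files, a count of zero in both places would confirm that the export step wrote nothing, consistent with the second progress bar staying at 0/21379.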

WangHelin1997 commented 1 year ago

Thanks so much. I will try the new version.

G1017 commented 1 year ago

Did you manage to solve this? Is it because the recordings are too long?