MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

Can't align using saved model #104

Open JoFrhwld opened 5 years ago

JoFrhwld commented 5 years ago

I've trained a model using mfa_train_and_align, and would like to reuse it. When I run the aligner like so

bin/mfa_align corpus/ dictionary model/model.zip output/

it successfully gets through set up, but then gives me the following error:

Setting up corpus information...
Number of speakers in corpus: 25, average number of utterances per speaker: 589.36
Creating dictionary information...
Setting up corpus_data directory...
Generating base features (mfcc)...
Calculating CMVN...
Done with setup.
Traceback (most recent call last):
  File "aligner/command_line/align.py", line 224, in <module>
  File "aligner/command_line/align.py", line 181, in validate_args
  File "aligner/command_line/align.py", line 141, in align_corpus
  File "aligner/aligner/pretrained.py", line 93, in align
  File "aligner/multiprocessing.py", line 322, in align
aligner.exceptions.AlignmentError: There were 3 job(s) with errors.  For more information, please see the following logs:

/Users/jofrhwld/Documents/MFA/edinburgh_to_align/align/log/align.final.0.log
/Users/jofrhwld/Documents/MFA/edinburgh_to_align/align/log/align.final.1.log
/Users/jofrhwld/Documents/MFA/edinburgh_to_align/align/log/align.final.2.log

Looking through those log files, I think the relevant error is this:

ERROR (gmm-align-compiled[5.4.251~1-094d2]:LogLikelihoodZeroBased():decodable-am-diag-gmm.cc:50) Dim mismatch: data dim = 39 vs. model dim = 40

I found a related discussion on the Kaldi page here: https://sourceforge.net/p/kaldi/discussion/1355348/thread/028ffd7f/

The answer that seemed to fix things was:

Actually, since the data dim is 39 and the model dim is 40, I think it's most likely that he is using delta+accel features but the model expected LDA+MLLT features. Lucian, have a look at the decoding command line that was used when you decoded your original data where you trained the model. Most likely it will have a splicing step (splice-feats) followed by a projection step (transform-feats)- you should try to replicate that when you decode. If you are using a decoding script, it will automatically pick up the LDA+MLLT features if you copy the final.mat to the directory where your final.mdl exists. Also make sure to copy splice_opts and cmvn_opts to that directory.
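For anyone following along, the two numbers in the error message can be reproduced with a little arithmetic. This is only a sketch: it assumes Kaldi's usual defaults of 13 MFCC coefficients per frame and a 40-dimensional LDA output, neither of which is stated in this thread. The splice context of 3 frames on each side matches the lda section of the training config posted in this comment.

```python
# Sketch of the feature dimensions behind "data dim = 39 vs. model dim = 40".
# ASSUMPTIONS (not stated in this thread): 13 MFCCs per frame and an LDA
# output dimension of 40, which are Kaldi's usual defaults.

NUM_MFCC = 13   # assumed base MFCC dimension
LDA_DIM = 40    # assumed LDA output dimension


def delta_dim(base_dim: int, delta_order: int = 2) -> int:
    """Dimension after appending delta and delta-delta (accel) features."""
    return base_dim * (delta_order + 1)


def spliced_dim(base_dim: int, left: int, right: int) -> int:
    """Dimension after stacking `left` and `right` context frames."""
    return base_dim * (left + 1 + right)


def lda_dim(spliced: int, out_dim: int = LDA_DIM) -> int:
    """LDA projects the spliced vector down to `out_dim` dimensions."""
    if out_dim > spliced:
        raise ValueError("LDA cannot project up to a larger dimension")
    return out_dim


deltas = delta_dim(NUM_MFCC)            # what the aligner extracted
spliced = spliced_dim(NUM_MFCC, 3, 3)   # splice_left/right_context: 3
projected = lda_dim(spliced)            # what the LDA+SAT model expects

print(f"mfcc+deltas: {deltas}, spliced: {spliced}, after LDA: {projected}")
```

Under those assumptions the delta features come out at 39 dimensions and the LDA features at 40, which lines up with the mismatch reported by gmm-align-compiled.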

Is there something that I can do either with the model I've trained, or in re-training to fix this problem? Here's my training config file.

beam: 10
retry_beam: 40

features:
  type: "mfcc"
  use_energy: false
  frame_shift: 10

training:
  - monophone:
      num_iterations: 40
      max_gaussians: 1000
      subset: 2000
      boost_silence: 1.25

  - triphone:
      num_iterations: 35
      num_leaves: 2000
      max_gaussians: 10000
      cluster_threshold: -1
      subset: 5000
      boost_silence: 1.25
      power: 0.25

  - lda:
      num_leaves: 2500
      max_gaussians: 15000
      subset: 10000
      num_iterations: 35
      features:
          splice_left_context: 3
          splice_right_context: 3

  - sat:
      num_leaves: 2500
      max_gaussians: 15000
      silence_weight: 0.0
      fmllr_update_type: "diag"
      subset: 30000
      features:
          lda: true

  - sat:
      calc_pron_probs: true
      num_leaves: 4200
      max_gaussians: 40000
      silence_weight: 0.0
      fmllr_update_type: "diag"
      subset: 30000
      features:
          lda: true
          fmllr: true

craigbaker commented 5 years ago

I'm hitting the same problem. I'm using the default configuration, rather than providing a .yaml at training time. I often use Kaldi for other tasks, but all I have to offer so far is the observation that the final alignment during training reads a different feature filename than a subsequent alignment from a saved model. The training log file sat2_ali/log/align.final.0.log has the command:

gmm-align-compiled --transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1 --beam=10 --retry-beam=40 --careful=false 'gmm-boost-silence --boost=1.0 6 "/sdd/tmp/train/sat2/final.mdl" - |' ark:/sdd/tmp/train/sat2/fsts.0 scp:/sdd/tmp/train/corpus_data/split2/features_mfcc_cmvn_lda_fmllr.0.scp ark:/sdd/tmp/train/sat2_ali/ali.0 ark,t:/sdd/tmp/train/sat2_ali/ali.0.scores

while the alignment log log/align.final.0.log has:

gmm-align-compiled --transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1 --beam=10 --retry-beam=40 --careful=false 'gmm-boost-silence --boost=1.0 6 "/sdd/tmp/dev/align/final.mdl" - |' ark:/sdd/tmp/dev/align/fsts.0 scp:/sdd/tmp/dev/corpus_data/split2/features_mfcc_cmvn_deltas.0.scp ark:/sdd/tmp/dev/align/ali.0 ark,t:/sdd/tmp/dev/align/ali.0.scores

Note the different input filenames features_mfcc_cmvn_lda_fmllr vs. features_mfcc_cmvn_deltas. The alignment run doesn't even generate the LDA or FMLLR files. I've confirmed with feat-to-dim that the two files differ in dimension 39 vs. 40 as described in the error message. Note that the two final.mdl files are identical according to diff. Tracking down why the aligner is failing to extract the correct features is now the challenge. At the moment it seems quite a bit easier to just train my own Kaldi models and use them for alignment.
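If it helps anyone check their own logs, the comparison above can be scripted. This is a rough sketch, not part of MFA: the function name `feature_pipeline` is made up for illustration, the two commands are shortened versions of the ones quoted above, and the regular expressions simply assume the `features_<pipeline>.<job>.scp` naming seen in those logs.

```python
import re

# Matches the "scp:<path>" feature argument in a gmm-align-compiled command.
SCP_ARG = re.compile(r"scp:(\S+)")
# Pulls the pipeline name out of a file name such as
# features_mfcc_cmvn_lda_fmllr.0.scp (naming assumed from the logs above).
FEATURE_NAME = re.compile(r"features_(\w+?)\.\d+\.scp$")


def feature_pipeline(command: str) -> str:
    """Return the feature pipeline named in a logged alignment command."""
    scp = SCP_ARG.search(command)
    if scp is None:
        raise ValueError("no scp: argument found in command")
    name = FEATURE_NAME.search(scp.group(1))
    if name is None:
        raise ValueError(f"unrecognised feature file name: {scp.group(1)}")
    return name.group(1)


# Shortened versions of the two logged commands (most options omitted).
train_cmd = (
    "gmm-align-compiled --beam=10 --retry-beam=40 "
    "ark:/sdd/tmp/train/sat2/fsts.0 "
    "scp:/sdd/tmp/train/corpus_data/split2/features_mfcc_cmvn_lda_fmllr.0.scp "
    "ark:/sdd/tmp/train/sat2_ali/ali.0"
)
align_cmd = (
    "gmm-align-compiled --beam=10 --retry-beam=40 "
    "ark:/sdd/tmp/dev/align/fsts.0 "
    "scp:/sdd/tmp/dev/corpus_data/split2/features_mfcc_cmvn_deltas.0.scp "
    "ark:/sdd/tmp/dev/align/ali.0"
)

train_feats = feature_pipeline(train_cmd)
align_feats = feature_pipeline(align_cmd)
if train_feats != align_feats:
    print(f"feature mismatch: trained on {train_feats}, aligning with {align_feats}")
```

A mismatch here only shows that the two runs read differently named feature files; feat-to-dim is still the way to confirm the actual dimensions.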

craigbaker commented 5 years ago

Comparing the unzipped pretrained model files with my own trained model file, I notice in meta.yaml that all of the pretrained models have features: mfcc+deltas, whereas my model has features: {deltas: false, fmllr: true, frame_shift: 10, ivectors: false, lda: true, pitch: false, splice_left_context: null, splice_right_context: null, type: mfcc, use_energy: false}. Reading the source code, I'm pretty sure that the only code that reads meta.yaml (in aligner/models.py) can have no effect on the feature extraction. I tried passing meta.yaml as the --config_path, but it seems to be looking for a very different structure. I tried passing the default config .yaml from the docs but it doesn't like the structure; I tried modifying aligner/command_line/align.py to load the --config_path with train_yaml_to_config() instead of align_yaml_to_config(), but that didn't work either.
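To make the difference between the two meta.yaml forms concrete, here is a small sketch of a check one could run on the parsed `features` value. It is not MFA code: `required_transforms` is a hypothetical helper, and it assumes the YAML has already been loaded so that the value is either a plain string or a mapping like the one quoted above.

```python
from typing import List, Union

# The parsed `features:` value from meta.yaml: a string such as
# "mfcc+deltas" in the pretrained models, or a mapping of options.
FeatureSpec = Union[str, dict]


def required_transforms(features: FeatureSpec) -> List[str]:
    """List the extra transforms (lda, fmllr) a feature spec asks for.

    Hypothetical helper for illustration; MFA itself does not provide it.
    """
    if isinstance(features, str):
        # String form seen in the pretrained models: no LDA or fMLLR.
        return []
    return [name for name in ("lda", "fmllr") if features.get(name)]


pretrained = "mfcc+deltas"
custom = {
    "deltas": False, "fmllr": True, "frame_shift": 10, "ivectors": False,
    "lda": True, "pitch": False, "splice_left_context": None,
    "splice_right_context": None, "type": "mfcc", "use_energy": False,
}

print(required_transforms(pretrained))  # → []
print(required_transforms(custom))      # → ['lda', 'fmllr']
```

A non-empty result flags a saved model that expects LDA or fMLLR features, which is the case that fails during alignment in this thread.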

For now it looks like the only option would be to train a new model without SAT or LDA, so that it matches up with the feature extraction in the supplied pretrained models. I'm trying that approach now with my data and will check back with the result.

craigbaker commented 5 years ago

I've confirmed that the saved model trained with no SAT or LDA works for alignment. I removed the - lda: and the two - sat: sections from the documentation's default training .yaml.
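For reference, the same edit can be expressed on the parsed config rather than by hand. This is a sketch only: `drop_stages` is a hypothetical helper, and the stage list below is an abbreviated stand-in that follows the config posted at the top of this thread (most options omitted), not the documentation's default file.

```python
from typing import List, Set


def drop_stages(training: List[dict], unwanted: Set[str]) -> List[dict]:
    """Return the training stages whose name is not in `unwanted`.

    Each stage is a single-key mapping such as {"monophone": {...}},
    which is how the `training:` list in the config parses.
    """
    return [stage for stage in training if not (set(stage) & unwanted)]


# Abbreviated stand-in for the parsed `training:` list (options trimmed).
training = [
    {"monophone": {"num_iterations": 40, "max_gaussians": 1000}},
    {"triphone": {"num_iterations": 35, "num_leaves": 2000}},
    {"lda": {"num_leaves": 2500, "max_gaussians": 15000}},
    {"sat": {"num_leaves": 2500, "max_gaussians": 15000}},
    {"sat": {"num_leaves": 4200, "max_gaussians": 40000}},
]

kept = drop_stages(training, {"lda", "sat"})
print([next(iter(stage)) for stage in kept])  # → ['monophone', 'triphone']
```

The result keeps only the monophone and triphone stages, so the saved model is trained on the same delta features that the aligner extracts.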

rosa5500 commented 5 years ago

> I've confirmed that the saved model trained with no SAT or LDA works for alignment. I removed the - lda: and two - sat: sections from the documentation's default training .yaml.

Thank you very much for your answer! I had the same problem before. I've just started learning Kaldi, and I'd like to ask a related question: so far, can MFA only use GMM-HMM acoustic models? Thank you.

horsti371 commented 5 years ago

I set up the project and did some debugging. I am not sure about SAT (or fMLLR), but the issue with LDA seems to be the one from the Kaldi discussion posted by @JoFrhwld. Without the .mat file, the aligner doesn't process the "transform-feats" step from Kaldi that adds the missing dimension (inside the "apply_lda_func" routine in "processing.py"). I added a few lines in the code, but the following steps should work without any changes in code:

What I am not sure about is SAT / fMLLR. In the "calc_fmllr_func" routine from "multiprocessing.py", a "transform-feats" step is done during SAT. Do I need to do this transformation (or some of the other steps in the routine) for aligning as well? I ran the aligner using the features after applying LDA, but with a model that used SAT in training, and it seemed to work somehow...

miracle2k commented 4 years ago

Anyone know what we are losing by disabling the sat section, in terms of accuracy?

miracle2k commented 4 years ago

I had a lot of trouble following here (total newbie) so I am going to expand on the notes generously written above.

The following changes seemed reasonable to me and worked:

https://github.com/miracle2k/Montreal-Forced-Aligner/commit/18876e2

craigbaker commented 4 years ago

Great to see you've developed a fix.

> Anyone know what we are losing by disabling the sat section, in terms of accuracy?

Looking at the RESULTS files for a few Kaldi recipes, probably not too much: the CommonVoice English recipe actually gets a slightly worse word error rate when just adding SAT, although it does better with the larger data set. Of course SAT won't get you anything if you're training on data from just one speaker, and any benefit probably depends strongly on your data size and number of speakers. Maybe the other Kaldi RESULTS files can help you estimate.

It's hard to guess how alignment accuracy will relate to WER. When it comes to really pushing the envelope on alignment accuracy, my understanding is that more specialized approaches are needed.

miracle2k commented 4 years ago

@craigbaker Thanks, that was helpful.