JoFrhwld opened 5 years ago
I'm hitting the same problem. I'm using the default configuration, rather than providing a .yaml at training time. I often use Kaldi for other tasks, but all I have to offer so far is the observation that the final alignment during training reads a different feature filename than a subsequent alignment from a saved model. The training log file `sat2_ali/log/align.final.0.log` has the command:

```
gmm-align-compiled --transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1 --beam=10 --retry-beam=40 --careful=false 'gmm-boost-silence --boost=1.0 6 "/sdd/tmp/train/sat2/final.mdl" - |' ark:/sdd/tmp/train/sat2/fsts.0 scp:/sdd/tmp/train/corpus_data/split2/features_mfcc_cmvn_lda_fmllr.0.scp ark:/sdd/tmp/train/sat2_ali/ali.0 ark,t:/sdd/tmp/train/sat2_ali/ali.0.scores
```

while the alignment log `log/align.final.0.log` has:

```
gmm-align-compiled --transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1 --beam=10 --retry-beam=40 --careful=false 'gmm-boost-silence --boost=1.0 6 "/sdd/tmp/dev/align/final.mdl" - |' ark:/sdd/tmp/dev/align/fsts.0 scp:/sdd/tmp/dev/corpus_data/split2/features_mfcc_cmvn_deltas.0.scp ark:/sdd/tmp/dev/align/ali.0 ark,t:/sdd/tmp/dev/align/ali.0.scores
```

Note the different input filenames: `features_mfcc_cmvn_lda_fmllr` vs. `features_mfcc_cmvn_deltas`. The alignment run doesn't even generate the LDA or fMLLR files. I've confirmed with `feat-to-dim` that the two files differ in dimension, 39 vs. 40, as described in the error message. Note that the two `final.mdl` files are identical according to `diff`. Tracking down why the aligner is failing to extract the correct features is now the challenge. At the moment it seems quite a bit easier to just train my own Kaldi models and use them for alignment.
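For anyone who wants to reproduce the check: `feat-to-dim` prints the dimension of the first feature matrix in an archive. Using the feature paths from the two logs above:

```
feat-to-dim scp:/sdd/tmp/train/corpus_data/split2/features_mfcc_cmvn_lda_fmllr.0.scp -
# 40  (the "model dim" in the error)
feat-to-dim scp:/sdd/tmp/dev/corpus_data/split2/features_mfcc_cmvn_deltas.0.scp -
# 39  (the "data dim" in the error)
```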
Comparing the unzipped pretrained model files with my own trained model file, I notice in `meta.yaml` that all of the pretrained models have `features: mfcc+deltas`, whereas my model has `features: {deltas: false, fmllr: true, frame_shift: 10, ivectors: false, lda: true, pitch: false, splice_left_context: null, splice_right_context: null, type: mfcc, use_energy: false}`. Reading the source code, I'm pretty sure that the only code that reads `meta.yaml` (in `aligner/models.py`) can have no effect on the feature extraction. I tried passing `meta.yaml` as the `--config_path`, but it seems to be looking for a very different structure. I tried passing the default config .yaml from the docs, but it doesn't like that structure either; I also tried modifying `aligner/command_line/align.py` to load the `--config_path` with `train_yaml_to_config()` instead of `align_yaml_to_config()`, but that didn't work either.
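For concreteness, the sort of invocation I was attempting looks roughly like this (a sketch with placeholder paths; the positional argument order follows the MFA 1.x docs):

```
mfa_align /data/corpus /data/lexicon.txt /data/my_model.zip /data/aligned \
    --config_path /data/my_model_unzipped/meta.yaml
```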
For now it looks like the only option would be to train a new model without SAT or LDA, so that it matches up with the feature extraction in the supplied pretrained models. I'm trying that approach now with my data and will check back with the result.
I've confirmed that the saved model trained with no SAT or LDA works for alignment. I removed the `- lda:` and two `- sat:` sections from the documentation's default training .yaml.
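For reference, a minimal sketch of what the reduced training .yaml looks like (the structure follows the default training config in the docs; the hyperparameter values here are illustrative rather than tuned):

```yaml
beam: 10
retry_beam: 40

features:
  type: "mfcc"
  use_energy: false
  frame_shift: 10

training:
  - monophone:          # kept
      num_iterations: 40
      max_gaussians: 1000
      subset: 2000

  - triphone:           # kept
      num_iterations: 35
      num_leaves: 2000
      max_gaussians: 10000
      subset: 10000

  # the - lda: block and both - sat: blocks are deleted
```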
Thank you very much for your answer! I had the same problem before. I've just started learning Kaldi, so I'd like to ask a related question: so far, can MFA only use GMM-HMM acoustic models? Thank you.
I set up the project and did some debugging. I am not sure about the SAT (or the fMLLR), but the issue with LDA seems to be the one from the Kaldi page posted by @JoFrhwld. Without the .mat file, the aligner doesn't process the `transform-feats` step from Kaldi that adds the missing dimension (inside the `apply_lda_func` routine in `processing.py`). I added a few lines in the code, but the following steps should work without any changes in code:

- Remove the `- sat:` sections from `basic_train.yaml`.
- Edit the `features:` section of `basic_align.yaml`, and make sure that this section contains `lda: True` (see the sketch below).
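A sketch of what that `basic_align.yaml` might look like after the edit (the other `features:` keys mirror the `meta.yaml` shown earlier in the thread; `lda: true` is the line that matters):

```yaml
beam: 10
retry_beam: 40

features:
  type: "mfcc"
  use_energy: false
  frame_shift: 10
  lda: true     # makes the aligner run Kaldi's transform-feats step
```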
What I am not sure about is the SAT / fMLLR. In the `calc_fmllr_func` routine from `multiprocessing.py`, a `transform-feats` step is done during SAT. Do I need to do this transformation (or some of the other steps in the routine) for aligning as well? I ran the aligner using the features after applying the LDA, but with a model that used SAT for training, and it seemed to work somehow...
Anyone know what we are losing by disabling the `sat` section, in terms of accuracy?
I had a lot of trouble following here (total newbie), so I am going to expand on the notes generously written above.

- The trained model is saved as a zip archive (`AcousticModel.export_model()`).
- Feature extraction runs via `FeatureConfig.generate_features()`. As part of that process, `apply_lda()` (in `multiprocessing.py`) is called. This is either two steps (running the `splice-feats` + `transform-feats` processes from Kaldi) or just one (`splice-feats` only).
- Whether `transform-feats` runs depends on whether `lda.mat` is part of the unzipped model folder: if the file does not exist, the step is skipped. The step is also skipped if `{"features": {"lda": true}}` is not set in the alignment config file (as opposed to the training config file), which it is not in the default `basic_align.yaml`.
- The cause of the `Dim mismatch: data dim = 39 vs. model dim = 40` error is that the `transform-feats` step was done during model training, but is now skipped.
- `lda.mat` is not included in the model. Adding `lda.mat` to the generated model archive manually did not work for me, because MFA explicitly lists the files to export, and does not export `lda.mat` from the model.

The following changes seemed reasonable to me and worked:
https://github.com/miracle2k/Montreal-Forced-Aligner/commit/18876e2
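In outline, the fix amounts to two changes. Below is a hypothetical sketch rather than the actual patch: the function and parameter names are assumptions for illustration, and the real code in the commit is organized differently.

```python
import os
import shutil

# Hypothetical sketch of the two changes; names here are illustrative,
# not MFA's actual API -- see the linked commit for the real code.

def export_model(source_directory, destination):
    """Copy the model files into the archive staging directory,
    now including lda.mat when the model was trained with LDA."""
    for name in ('final.mdl', 'tree', 'meta.yaml', 'lda.mat'):  # lda.mat added
        path = os.path.join(source_directory, name)
        if os.path.exists(path):  # lda.mat only exists for LDA-trained models
            shutil.copy(path, destination)

def should_apply_lda(model_directory):
    """At align time, run transform-feats whenever the unzipped model
    contains lda.mat, instead of relying on `lda: true` in the align config."""
    return os.path.exists(os.path.join(model_directory, 'lda.mat'))
```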
Great to see you've developed a fix.

> Anyone know what we are losing by disabling the `sat` section, in terms of accuracy?
Looking at the RESULTS files for a few Kaldi recipes, probably not too much: the CommonVoice English recipe actually gets a slightly worse word error rate when just adding SAT, although it does better with the larger data set. Of course SAT won't get you anything if you're training on data from just one speaker, and any benefit probably depends strongly on your data size and number of speakers. Maybe the other Kaldi RESULTS files can help you estimate.
It's hard to guess how alignment accuracy will relate to WER. When it comes to really pushing the envelope on alignment accuracy, my understanding is that more specialized approaches are needed.
@craigbaker Thanks, that was helpful.
I've trained a model using `mfa_train_and_align`, and would like to reuse it. When I run the aligner like so, it successfully gets through setup, but then gives me the following error:
When I look at the log files, I think the relevant error is this:
I found a related discussion on the Kaldi page here: https://sourceforge.net/p/kaldi/discussion/1355348/thread/028ffd7f/
The answer that seemed to fix things was:
Is there something I can do, either with the model I've trained or in re-training, to fix this problem? Here's my training config file.