MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

KaldiProcessingError: You provided the "cs" option but are not calling with keys in sorted order (v2.0.0a4) #224

Closed dlukes closed 3 years ago

dlukes commented 3 years ago

Thank you very much for #201, I can now get past the LDA stage! However, I'm getting stuck at the very next one instead (SAT1):

```
All required kaldi binaries were found!
/home/lukes/Documents/MFA/ortofon-v2-mfa-in/train_and_align.log
INFO - Cleaning old directory!
INFO - Setting up corpus information...
INFO - Number of speakers in corpus: 1431, average number of utterances per speaker: 178.93570929419985
INFO - Number of speakers in corpus: 1431, average number of utterances per speaker: 178.93570929419985
INFO - Parsing dictionary without pronunciation probabilties without silence probabilties
INFO - Creating dictionary information...
INFO - Setting up training data...
Generating base features (mfcc)...
Calculating CMVN...
INFO - Initializing training for mono...
/home/lukes/Documents/MFA/thirdparty/bin/feat-to-dim 'ark,s,cs:apply-cmvn --utt2spk=ark:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/corpus_data/split36/utt2spk.0 scp:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/corpus_data/split36/cmvn.0.scp scp:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/corpus_data/split36/feats.0.scp ark:- | add-deltas ark:- ark:- |' -
apply-cmvn --utt2spk=ark:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/corpus_data/split36/utt2spk.0 scp:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/corpus_data/split36/cmvn.0.scp scp:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/corpus_data/split36/feats.0.scp ark:-
add-deltas ark:- ark:-
WARNING (feat-to-dim[5.5]:Close():util/kaldi-io.cc:515) Pipe apply-cmvn --utt2spk=ark:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/corpus_data/split36/utt2spk.0 scp:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/corpus_data/split36/cmvn.0.scp scp:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/corpus_data/split36/feats.0.scp ark:- | add-deltas ark:- ark:- | had nonzero return status 36096
INFO - Initialization complete!
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [03:32<00:00, 5.45s/it]
INFO - Training complete!
INFO - Generating alignments using mono models using 5000 utterances...
INFO - Initializing training for tri...
INFO - Initialization complete!
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [02:37<00:00, 4.64s/it]
INFO - Training complete!
INFO - Generating alignments using tri models using 10000 utterances...
INFO - Initializing training for lda...
INFO - Initialization complete!
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 35/35 [20:55<00:00, 35.87s/it]
INFO - Training complete!
INFO - Generating alignments using lda models using 10000 utterances...
INFO - Initializing training for sat1...
INFO - Initializing speaker-adapted triphone training...
Traceback (most recent call last):
  File "/home/lukes/miniconda3/envs/mfa/bin/mfa", line 8, in <module>
    sys.exit(main())
  File "/home/lukes/miniconda3/envs/mfa/lib/python3.8/site-packages/montreal_forced_aligner/command_line/mfa.py", line 337, in main
    run_train_corpus(args)
  File "/home/lukes/miniconda3/envs/mfa/lib/python3.8/site-packages/montreal_forced_aligner/command_line/train_and_align.py", line 143, in run_train_corpus
    align_corpus(args, unknown_args)
  File "/home/lukes/miniconda3/envs/mfa/lib/python3.8/site-packages/montreal_forced_aligner/command_line/train_and_align.py", line 95, in align_corpus
    a.train()
  File "/home/lukes/miniconda3/envs/mfa/lib/python3.8/site-packages/montreal_forced_aligner/aligner/trainable.py", line 78, in train
    trainer.init_training(identifier, self.temp_directory, self.corpus, self.dictionary, previous)
  File "/home/lukes/miniconda3/envs/mfa/lib/python3.8/site-packages/montreal_forced_aligner/trainers/sat.py", line 278, in init_training
    calc_fmllr(self.train_directory, self.data_directory,
  File "/home/lukes/miniconda3/envs/mfa/lib/python3.8/site-packages/montreal_forced_aligner/multiprocessing/alignment.py", line 829, in calc_fmllr
    run_non_mp(calc_fmllr_func, jobs, config.log_directory)
  File "/home/lukes/miniconda3/envs/mfa/lib/python3.8/site-packages/montreal_forced_aligner/multiprocessing/helper.py", line 66, in run_non_mp
    parse_logs(log_directory)
  File "/home/lukes/miniconda3/envs/mfa/lib/python3.8/site-packages/montreal_forced_aligner/helper.py", line 202, in parse_logs
    raise KaldiProcessingError(error_logs)
montreal_forced_aligner.exceptions.KaldiProcessingError: There was one or more errors when running Kaldi binaries.
```

Digging into the corresponding logs, the problem seems to be fairly trivial -- just something to do with sorting the input files? Cf. the title of the issue, and here's a sample fmllr.1.*.log:

```
/home/lukes/Documents/MFA/thirdparty/bin/ali-to-post ark:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/sat1/ali.1 ark:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/sat1/post.1
LOG (ali-to-post[5.5]:main():bin/ali-to-post.cc:73) Converted 213 alignments.
/home/lukes/Documents/MFA/thirdparty/bin/weight-silence-post 0.0 1:2:3:4:5:6:7:8:9:10:11:12:13:14:15 /home/lukes/Documents/MFA/ortofon-v2-mfa-in/sat1/1.mdl ark:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/sat1/post.1 ark:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/sat1/weight.1
LOG (weight-silence-post[5.5]:main():bin/weight-silence-post.cc:95) Done 213 posteriors.
/home/lukes/Documents/MFA/thirdparty/bin/gmm-est-fmllr --verbose=4 --fmllr-update-type=full --spk2utt=ark:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/corpus_data/subset_10000/spk2utt.1 /home/lukes/Documents/MFA/ortofon-v2-mfa-in/sat1/1.mdl 'ark,s,cs:apply-cmvn --utt2spk=ark:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/corpus_data/subset_10000/utt2spk.1 scp:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/corpus_data/subset_10000/cmvn.1.scp scp:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/corpus_data/subset_10000/feats.1.scp ark:- | splice-feats --left-context=3 --right-context=3 ark:- ark:- | transform-feats /home/lukes/Documents/MFA/ortofon-v2-mfa-in/sat1/lda.mat ark:- ark:- |' ark,s,cs:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/sat1/weight.1 ark:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/sat1/trans.1
splice-feats --left-context=3 --right-context=3 ark:- ark:-
transform-feats /home/lukes/Documents/MFA/ortofon-v2-mfa-in/sat1/lda.mat ark:- ark:-
apply-cmvn --utt2spk=ark:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/corpus_data/subset_10000/utt2spk.1 scp:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/corpus_data/subset_10000/cmvn.1.scp scp:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/corpus_data/subset_10000/feats.1.scp ark:-
LOG (gmm-est-fmllr[5.5]:ComputeFmllrMatrixDiagGmmFull():transform/fmllr-diag-gmm.cc:262) fMLLR objf improvement is 4.4858 per frame over 924 frames.
LOG (gmm-est-fmllr[5.5]:main():gmmbin/gmm-est-fmllr.cc:143) For speaker 12A001N-40, auxf-impr from fMLLR is 4.4858, over 924 frames.
ERROR (gmm-est-fmllr[5.5]:FindKeyInternal():util/kaldi-table-inl.h:2106) You provided the "cs" option but are not calling with keys in sorted order: 14ee_12A023N-78_Uo < a0_12A001N-40_uo: rspecifier is ark,s,cs:apply-cmvn --utt2spk=ark:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/corpus_data/subset_10000/utt2spk.1 scp:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/corpus_data/subset_10000/cmvn.1.scp scp:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/corpus_data/subset_10000/feats.1.scp ark:- | splice-feats --left-context=3 --right-context=3 ark:- ark:- | transform-feats /home/lukes/Documents/MFA/ortofon-v2-mfa-in/sat1/lda.mat ark:- ark:- |
WARNING (gmm-est-fmllr[5.5]:Close():util/kaldi-io.cc:515) Pipe apply-cmvn --utt2spk=ark:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/corpus_data/subset_10000/utt2spk.1 scp:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/corpus_data/subset_10000/cmvn.1.scp scp:/home/lukes/Documents/MFA/ortofon-v2-mfa-in/corpus_data/subset_10000/feats.1.scp ark:- | splice-feats --left-context=3 --right-context=3 ark:- ark:- | transform-feats /home/lukes/Documents/MFA/ortofon-v2-mfa-in/sat1/lda.mat ark:- ark:- | had nonzero return status 36096
kaldi::KaldiFatalError
```

mmcauliffe commented 3 years ago

Oh interesting, I'll see if I can reproduce it; I agree it should be pretty trivial to fix. Just to get some more information: what sort of directory structure do you have set up for it, and are the files prefixed with the directory? Are 14ee and a0 speakers that have their own directory, or are the file names more random?

dlukes commented 3 years ago

Just one big flat directory, which I thought was required, but now I see the docs clearly state the default expected structure is with speaker-subdirectories... I don't know where I got the idea that it had to be flat, the nested structure is of course much more convenient :)

(Maybe MFA 1.x wanted it? Probably not.)

At any rate, I'm using -s prosodylab, so speaker IDs are the second _-delimited field, i.e. 12A023N-78 and 12A001N-40; 14ee and a0 are utterance IDs.
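
As a rough illustration of the naming convention described above (not MFA's actual parsing code), the speaker ID can be pulled out as the second underscore-delimited field of the file stem. The helper name `speaker_from_prosodylab_name` is hypothetical.

```python
from pathlib import Path


def speaker_from_prosodylab_name(file_name: str) -> str:
    """Return the second underscore-delimited field of the file stem.

    Hypothetical helper for illustration only; not part of MFA.
    """
    fields = Path(file_name).stem.split("_")
    if len(fields) < 2:
        raise ValueError(f"no speaker field in {file_name!r}")
    return fields[1]


print(speaker_from_prosodylab_name("14ee_12A023N-78_Uo.wav"))  # 12A023N-78
print(speaker_from_prosodylab_name("a0_12A001N-40_uo.wav"))    # 12A001N-40
```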

So maybe the problem simply is that a flat directory layout + -s prosodylab isn't supported?

mmcauliffe commented 3 years ago

Ok, thanks, that's helpful! It's mostly that I updated the feature parsing to be easier on disk space, and it just happened that the data I had been testing with was properly sorted. I'll test out the prosodylab-style IDs a bit more; I usually don't use them, since speaker directories are easier for me to comprehend for my own data. In general, though, MFA walks through the full corpus directory, so any nested data should get found as long as it has wav/lab pairs or wav/textgrid pairs.
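
To make the "walks through the full corpus directory" idea concrete, here is a minimal sketch of a recursive scan that pairs each wav file with a transcript of the same base name. It is an assumption-laden illustration, not MFA's corpus loader: the function name `find_pairs` and the choice to prefer `.lab` over `.textgrid` when both exist are made up for this example.

```python
import os
from typing import Dict, List, Tuple

# Transcript extensions to look for, in order of preference (an assumption).
TRANSCRIPT_EXTS = (".lab", ".textgrid")


def find_pairs(corpus_root: str) -> List[Tuple[str, str]]:
    """Recursively collect (wav_path, transcript_path) pairs.

    Illustrative sketch only; not MFA's implementation.
    """
    pairs: List[Tuple[str, str]] = []
    for dir_path, dir_names, file_names in os.walk(corpus_root):
        dir_names.sort()  # visit subdirectories in a stable order
        by_stem: Dict[str, Dict[str, str]] = {}
        for name in file_names:
            stem, ext = os.path.splitext(name)
            by_stem.setdefault(stem, {})[ext.lower()] = os.path.join(dir_path, name)
        for stem in sorted(by_stem):
            found = by_stem[stem]
            if ".wav" not in found:
                continue  # transcript without audio: skip
            for ext in TRANSCRIPT_EXTS:
                if ext in found:
                    pairs.append((found[".wav"], found[ext]))
                    break
    return pairs
```

Calling `find_pairs` on a corpus root returns the same pairs whether the files sit in one flat directory or in per-speaker subdirectories; a wav file with no matching transcript is simply left out.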

dlukes commented 3 years ago

> speaker directories are easier for me to comprehend for my own data

I prefer them too, I just didn't realize I could use them :) I'll switch over to the nested layout first thing tomorrow.

dlukes commented 3 years ago

And thank you very much for the quick replies and all the help!

mmcauliffe commented 3 years ago

Added a fix to make sure that utterances parsed with prosodylab-style IDs have the speaker prefixed to the utterance name, which should keep Kaldi happy; the prefix then gets stripped out when outputting textgrids.
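
The prefix-then-strip idea can be sketched roughly as follows. This is not the code from the fix; `build_kaldi_keys` and the `-` separator are assumptions for illustration. Keeping a mapping from prefixed key back to the original name avoids guessing which keys were prefixed when it is time to write output.

```python
from typing import Dict


def build_kaldi_keys(utt_to_speaker: Dict[str, str], sep: str = "-") -> Dict[str, str]:
    """Map each speaker-prefixed key back to the original utterance name.

    Hypothetical helper for illustration; not MFA's implementation.
    """
    key_to_original: Dict[str, str] = {}
    for utt, speaker in utt_to_speaker.items():
        prefix = speaker + sep
        # Only add the prefix if the name does not already start with it.
        key = utt if utt.startswith(prefix) else prefix + utt
        key_to_original[key] = utt
    return key_to_original


utts = {
    "14ee_12A023N-78_Uo": "12A023N-78",
    "a0_12A001N-40_uo": "12A001N-40",
}
keys = build_kaldi_keys(utts)
for key in sorted(keys):
    # The prefixed key is what the aligner backend would see; the original
    # name is recovered from the mapping when writing output files.
    print(key, "->", keys[key])
```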

dlukes commented 3 years ago

Just FYI: I restructured my data to take advantage of speaker subdirs, so that instead of e.g. 14ee_12A023N-78_Uo, I now have 12A023N-78/14ee-Uo, and I'm not passing -s prosodylab anymore. But I'm still hitting the same error: You provided the "cs" option but are not calling with keys in sorted order: 14ee-Uo < a0-uo.

So it looks like there might be another underlying issue here? I'm not quite sure how your setup differs from mine, but maybe the filenames need to have the same number of characters? (Which is a bit of a shot in the dark, but it's the type of thing that does affect lexicographical sorting -- 14ee < a0, but 00a0 < 14ee.)
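
The two comparisons mentioned above are easy to check. The snippet below assumes the keys are compared as plain ASCII strings, which is how Python orders `str` values of this kind.

```python
# Python compares str values by code point; for ASCII keys this matches
# byte-wise ordering.
print("14ee" < "a0")    # True: '1' sorts before 'a'
print("00a0" < "14ee")  # True: '0' sorts before '1'

unpadded = sorted(["a0", "14ee"])
padded = sorted(["00a0", "14ee"])
print(unpadded)  # ['14ee', 'a0']
print(padded)    # ['00a0', '14ee']
```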

mmcauliffe commented 3 years ago

Hmm, ok. It's possible that the difference between your setup and the one I've had working over the weekend is that all of my filenames started with the speaker ID, so they were all sorted by speaker. I'll take a closer look at it tonight, but you can try out the git repo version (cloning and installing with python setup.py install) with the -s prosodylab flag; that should do the prefixing internally to get it to work.
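
A small sketch of why speaker-first filenames matter, assuming (as the error log suggests, though the thread does not state it outright) that keys are requested speaker by speaker. The helper names are hypothetical, and the `-` separator in the prefixed keys is an assumption.

```python
from typing import Dict, List


def speaker_grouped_order(utt_to_speaker: Dict[str, str]) -> List[str]:
    """Order keys speaker by speaker, sorted within each speaker."""
    return sorted(utt_to_speaker, key=lambda utt: (utt_to_speaker[utt], utt))


def is_sorted(keys: List[str]) -> bool:
    """True if every key is >= the one before it."""
    return all(a <= b for a, b in zip(keys, keys[1:]))


raw = {
    "14ee_12A023N-78_Uo": "12A023N-78",
    "a0_12A001N-40_uo": "12A001N-40",
}
# Same utterances, with the speaker ID prefixed to each key.
prefixed = {f"{spk}-{utt}": spk for utt, spk in raw.items()}

raw_order = speaker_grouped_order(raw)
prefixed_order = speaker_grouped_order(prefixed)
print(raw_order, is_sorted(raw_order))            # grouped order is not sorted
print(prefixed_order, is_sorted(prefixed_order))  # grouped order is sorted
```

With the raw IDs, speaker 12A001N-40 comes first but its utterance `a0_...` sorts after `14ee_...`, so the grouped sequence is out of order. Once every key starts with its speaker ID, the two orderings agree for this data.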

dlukes commented 3 years ago

Padding the filenames to the same width with leading 0's seems to have done the trick, SAT1 is now running :) I'll try the git install + flat layout + -s prosodylab later if I have a bit of time to spare.
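
The thread does not say exactly how the padding was done; one way to plan such a rename, sketched under that caveat, is to left-pad every file stem with zeros to the width of the longest one. `plan_zero_padding` is a hypothetical name, and the function only builds an old-to-new mapping without touching any files.

```python
from typing import Dict, List


def plan_zero_padding(stems: List[str]) -> Dict[str, str]:
    """Map each stem to a copy left-padded with '0' to a common width.

    Illustrative only: returns a rename plan, performs no renaming.
    """
    width = max((len(stem) for stem in stems), default=0)
    return {stem: stem.rjust(width, "0") for stem in stems}


plan = plan_zero_padding(["14ee-Uo", "a0-uo"])
print(plan)  # {'14ee-Uo': '14ee-Uo', 'a0-uo': '00a0-uo'}
```

Padding changes how the names sort, but it does not by itself guarantee that the sorted order lines up with the speaker grouping; it happened to work for this corpus.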

mmcauliffe commented 3 years ago

Sounds like you have a solution, but MFA should now prefix all utterance IDs with the speaker ID if they aren't already prefixed, and that should make Kaldi happy with sorting.

dlukes commented 3 years ago

I re-installed with git and re-ran the aligner 1) with flat dir layout + -s prosodylab, and 2) with speaker subdirs, both without zero-padding the utterance IDs, and got past the SAT1 training in both cases, so I can confirm your fix works! :) :tada:

In the next episode of this never-ending story, now I'm getting stuck at the next stage, generating alignments using SAT1 models... Basically, three utterances are failing with ERROR (apply-cmvn[5.5]:ApplyCmvn():transform/cmvn.cc:70) Dim mismatch: cmvn 2x14, feats 0x0.

I'll open a new issue, but I'm guessing the advice at this point might be that this means these utterances are probably weird, so just leave them out. In which case, the solution would be to add a helpful error message to MFA suggesting to do so.