MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

No TextGrid files in output folder, no error message #130

Open sj-perry opened 4 years ago

sj-perry commented 4 years ago

I've encountered a problem running the MFA where no error is thrown and no TextGrids are written to the Output folder.

I have 9 speakers, with ~20 minutes of speech for each speaker. The transcriptions are TextGrids with one tier, containing relatively short, orthographically transcribed utterances. I ran the mfa_align command using the pretrained English model and the LibriSpeech dictionary. The aligner seems to run fine and produces no error, but there are no TextGrid files in the specified output folder.

I have this error with versions 1.1.0 and 1.0.1, and on both Linux and Windows.

Anybody have an idea of what is going on?

manazhao commented 4 years ago

I got a similar issue when trying the example: no TextGrid file was produced in the output directory. I checked the log file (default path is ~/Documents/MFA//logging/corpus.log) and saw something like "The following utterances were ignored due to lack of features". It seems the binary has trouble extracting MFCC features from the audio files.
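For anyone who wants to run the same check on their own corpus.log, here is a minimal sketch in Python. It assumes the message wording quoted above ("ignored due to lack of features"); the exact text may differ between MFA versions, and the function names are mine, not part of MFA.

```python
from __future__ import annotations

from pathlib import Path

# Wording taken from the comment above; treat it as an assumption,
# since the exact message may differ between MFA versions.
MARKER = "ignored due to lack of features"


def find_feature_warnings(log_text: str, marker: str = MARKER) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for log lines that contain the marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if marker.lower() in line.lower():
            hits.append((number, line.strip()))
    return hits


def scan_log_file(path: str) -> list[tuple[int, str]]:
    """Read a log file (with ~ expanded) and report the matching lines."""
    text = Path(path).expanduser().read_text(encoding="utf-8", errors="replace")
    return find_feature_warnings(text)
```

Call scan_log_file with the path to your own corpus.log. An empty result only means this particular message is absent, not that feature extraction succeeded.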

ABC0408 commented 4 years ago

I also got no output in ../Montreal-Forced-Aligner/examples/alignment

bin/mfa_align ../Montreal-Forced-Aligner/examples/ch data-mandarin/chinese.dict.txt pretrained_models/mandarin.zip ../Montreal-Forced-Aligner/examples/alignment
Setting up corpus information...
Number of speakers in corpus: 1, average number of utterances per speaker: 5.0
Creating dictionary information...
Setting up corpus_data directory...
Generating base features (mfcc)...
Calculating CMVN...
Done with setup.
Done! Everything took 1.4139506816864014 seconds

xf15 commented 4 years ago

Using 1.0.1 on Mac.

I got output (and it was accurate) when I tried my own English example, my own Spanish example, and the Mandarin example provided at https://montreal-forced-aligner.readthedocs.io/en/latest/example.html, but no output when I tried my own Mandarin wav. I got:

mandarin_wav sample_mandarin_dict.txt pretrained_models/mandarin.zip output
Setting up corpus information...
Number of speakers in corpus: 1, average number of utterances per speaker: 2.0
Creating dictionary information...
Setting up training data...
Calculating MFCCs...
Calculating CMVN...
Number of speakers in corpus: 1, average number of utterances per speaker: 2.0
Done with setup.
100%|█████████████████████████████████████████████| 2/2 [00:01<00:00,  1.07it/s]
Done! Everything took 4.492555856704712 seconds

People have pointed out that this is the result of failed alignment, with errors logged in ~/Documents/MFA/XXXXX/tri_ali/log/align.0.0.log (https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/issues/84). Indeed, compare the log for my successful Spanish run:

gmm-align-compiled --transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1 --beam=10 --retry-beam=40 --careful=false 'gmm-boost-silence --boost=1.0 6 "/Users/xzfang/Documents/MFA/sample_spanish_wav/tri_ali/0.mdl" - |' ark:/Users/xzfang/Documents/MFA/sample_spanish_wav/tri_ali/fsts.0 ark:/Users/xzfang/Documents/MFA/sample_spanish_wav/train/split1/cmvndeltafeats_fmllr.0 ark:- 
gmm-boost-silence --boost=1.0 6 /Users/xzfang/Documents/MFA/sample_spanish_wav/tri_ali/0.mdl - 
WARNING (gmm-boost-silence[5.4.251~1-094d2]:main():gmm-boost-silence.cc:82) The pdfs for the silence phones may be shared by other phones (note: this probably does not matter.)
LOG (gmm-boost-silence[5.4.251~1-094d2]:main():gmm-boost-silence.cc:93) Boosted weights for 5 pdfs, by factor of 1
LOG (gmm-boost-silence[5.4.251~1-094d2]:main():gmm-boost-silence.cc:103) Wrote model to -
LOG (gmm-align-compiled[5.4.251~1-094d2]:main():gmm-align-compiled.cc:127) Savannah_beso_a_Emilia
LOG (gmm-align-compiled[5.4.251~1-094d2]:main():gmm-align-compiled.cc:127) Savannah_pateo_a_Emilia
LOG (gmm-align-compiled[5.4.251~1-094d2]:main():gmm-align-compiled.cc:135) Overall log-likelihood per frame is -109.2 over 551 frames.
LOG (gmm-align-compiled[5.4.251~1-094d2]:main():gmm-align-compiled.cc:137) Retried 0 out of 2 utterances.
LOG (gmm-align-compiled[5.4.251~1-094d2]:main():gmm-align-compiled.cc:139) Done 2, errors on 0

and the log for my failed Mandarin run:

gmm-align-compiled --transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1 --beam=10 --retry-beam=40 --careful=false 'gmm-boost-silence --boost=1.0 6 "/Users/xzfang/Documents/MFA/sample_mandarin_wav_file_name_no_chinese_char/tri_ali/0.mdl" - |' ark:/Users/xzfang/Documents/MFA/sample_mandarin_wav_file_name_no_chinese_char/tri_ali/fsts.0 ark:/Users/xzfang/Documents/MFA/sample_mandarin_wav_file_name_no_chinese_char/train/split1/cmvndeltafeats_fmllr.0 ark:- 
gmm-boost-silence --boost=1.0 6 /Users/xzfang/Documents/MFA/sample_mandarin_wav_file_name_no_chinese_char/tri_ali/0.mdl - 
WARNING (gmm-boost-silence[5.4.251~1-094d2]:main():gmm-boost-silence.cc:82) The pdfs for the silence phones may be shared by other phones (note: this probably does not matter.)
LOG (gmm-boost-silence[5.4.251~1-094d2]:main():gmm-boost-silence.cc:93) Boosted weights for 5 pdfs, by factor of 1
LOG (gmm-boost-silence[5.4.251~1-094d2]:main():gmm-boost-silence.cc:103) Wrote model to -
LOG (gmm-align-compiled[5.4.251~1-094d2]:main():gmm-align-compiled.cc:127) 1
WARNING (gmm-align-compiled[5.4.251~1-094d2]:AlignUtteranceWrapper():decoder-wrappers.cc:466) Retrying utterance 1 with beam 40
WARNING (gmm-align-compiled[5.4.251~1-094d2]:AlignUtteranceWrapper():decoder-wrappers.cc:475) Did not successfully decode file 1, len = 666
LOG (gmm-align-compiled[5.4.251~1-094d2]:main():gmm-align-compiled.cc:127) 2
WARNING (gmm-align-compiled[5.4.251~1-094d2]:AlignUtteranceWrapper():decoder-wrappers.cc:466) Retrying utterance 2 with beam 40
WARNING (gmm-align-compiled[5.4.251~1-094d2]:AlignUtteranceWrapper():decoder-wrappers.cc:475) Did not successfully decode file 2, len = 666
LOG (gmm-align-compiled[5.4.251~1-094d2]:main():gmm-align-compiled.cc:135) Overall log-likelihood per frame is nan over 0 frames.
LOG (gmm-align-compiled[5.4.251~1-094d2]:main():gmm-align-compiled.cc:137) Retried 2 out of 2 utterances.
LOG (gmm-align-compiled[5.4.251~1-094d2]:main():gmm-align-compiled.cc:139) Done 0, errors on 2

I hope this is only a problem with Mandarin; I am using p2fa (https://web.sas.upenn.edu/phonetics-lab/facilities/) for both English and Mandarin without problems.

By the way, stereo is not the problem here (https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/issues/107): 1.0.1 can handle stereo, and my English wav was stereo.
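To compare logs like the two above without reading them line by line, a small Python sketch can pull out the summary counts and the utterances that failed to decode. It relies only on the message formats visible in the logs pasted in this thread ("Done N, errors on M" and "Did not successfully decode file X"); other Kaldi builds may word these differently, and the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Patterns copied from the log lines pasted in this thread; other Kaldi
# versions may format these messages differently (assumption).
DONE_RE = re.compile(r"Done (\d+), errors on (\d+)")
FAILED_RE = re.compile(r"Did not successfully decode file (\S+),")


@dataclass
class AlignSummary:
    done: int = 0
    errors: int = 0
    failed_utterances: list = field(default_factory=list)


def summarize_align_log(log_text: str) -> AlignSummary:
    """Collect the final done/error counts and the IDs of failed utterances."""
    summary = AlignSummary()
    for line in log_text.splitlines():
        failed = FAILED_RE.search(line)
        if failed:
            summary.failed_utterances.append(failed.group(1))
        done = DONE_RE.search(line)
        if done:
            summary.done = int(done.group(1))
            summary.errors = int(done.group(2))
    return summary
```

If done is 0, no utterance was aligned, so no TextGrids can be written even though the command itself exits without an error message.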

simlmx commented 3 years ago

I had the same issue (no TextGrids output files).

Making sure all the words are in the dictionary fixed it for me (i.e. no prompt to fix words not in the dictionary and an empty oovs_found.txt file).
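A quick way to check this before running the aligner is to compare the transcript words against the dictionary yourself. The sketch below is only an approximation: it assumes a plain-text dictionary with one entry per line and the word as the first whitespace-separated field, lowercases everything, and strips a few punctuation marks. MFA's own text normalization may differ, so treat the result as a hint rather than the definitive OOV list.

```python
def load_dictionary_words(dict_text: str) -> set:
    """Collect the first field of every non-empty dictionary line, lowercased."""
    words = set()
    for line in dict_text.splitlines():
        parts = line.split()
        if parts:
            words.add(parts[0].lower())
    return words


def find_oovs(transcript_text: str, known_words: set) -> list:
    """Return the sorted transcript words that are missing from the dictionary."""
    tokens = transcript_text.lower().split()
    # Simplified punctuation handling (assumption, not MFA's actual rules).
    stripped = (token.strip(".,!?;:\"()") for token in tokens)
    return sorted({token for token in stripped if token and token not in known_words})
```

Read the dictionary file and each transcript into strings, then pass them in. Any word that comes back needs a pronunciation entry (or a corrected spelling) before alignment.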

Davidelvis commented 3 years ago

I had the same issue on Windows 10: no TextGrid output files; instead I find an empty oovs_found.txt file.

This was the result of a failed alignment. The alignment log:

gmm-align-compiled --transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1 --beam=10 --retry-beam=40 --careful=false 'gmm-boost-silence --boost=1.0 6 "C:\Users\Brandon/Documents/MFA\data\tri_ali\0.mdl" - |' 'ark:C:\Users\Brandon/Documents/MFA\data\tri_ali\fsts.0' 'ark:C:\Users\Brandon/Documents/MFA\data\train\split1\cmvndeltafeats_fmllr.0' ark:-
gmm-boost-silence --boost=1.0 6 'C:\Users\Brandon/Documents/MFA\data\tri_ali\0.mdl' -
WARNING (gmm-boost-silence[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-boost-silence.cc:82) The pdfs for the silence phones may be shared by other phones (note: this probably does not matter.)
LOG (gmm-boost-silence[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-boost-silence.cc:93) Boosted weights for 5 pdfs, by factor of 1
LOG (gmm-boost-silence[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-boost-silence.cc:103) Wrote model to -
LOG (gmm-align-compiled[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-align-compiled.cc:135) Overall log-likelihood per frame is -nan(ind) over 0 frames.
LOG (gmm-align-compiled[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-align-compiled.cc:137) Retried 0 out of 0 utterances.
LOG (gmm-align-compiled[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-align-compiled.cc:139) Done 0, errors on 0

No errors, but nothing done either.

ambiSk commented 2 years ago

Try increasing beam value.

The default is 10. For a 30-second audio file I used beam=100. If you are using the CLI, add the argument, e.g. mfa align ... --beam 100. Apart from that, I found that TextGrids are also saved in the temporary directory: if you pass -t or --temp, you will find them in <folder_name>_pretrained_aligner/pretrained_aligner/textgrids.
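If you are not sure where under the temporary directory the files ended up, a recursive search is a simple check. This is a minimal Python sketch using only the standard library; the directory you pass in is whatever you gave to -t or --temp, and the function name is hypothetical.

```python
from pathlib import Path


def find_textgrids(root: str) -> list:
    """Recursively list files under root whose extension is .TextGrid (any case)."""
    base = Path(root).expanduser()
    return sorted(
        path
        for path in base.rglob("*")
        if path.is_file() and path.suffix.lower() == ".textgrid"
    )
```

An empty list means no TextGrids were written anywhere under that directory, which points back to a failed alignment rather than a misplaced output folder.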

zhaolibo1989 commented 1 year ago

Another project relies on this tool. I also encountered a similar problem when using that, and haven't found a solution yet. Who can help me?

wesley-js-leong commented 1 year ago

Just encountered this problem: log was not showing any problems ("Done XX, errors on 0") but no TextGrid files were appearing in output folder. Increasing beam didn't work.

Eventually fixed the issue by adding the --clean flag when running align. Might be good to point out in the intro that default behavior on validate and align is to not overwrite previous runs!

halannhile commented 10 months ago

adding --clean and --overwrite worked for me! https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/configuration/index.html

mfa align --clean --overwrite ...

got this idea from the tutorial: https://www.youtube.com/watch?v=phVZijLo9ro
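For scripted runs, the flags mentioned in this thread can be assembled in Python before calling the tool. This is a sketch under assumptions: the positional order (corpus, dictionary, acoustic model, output) follows the mfa_align invocation shown earlier in the thread, the paths in the usage note are placeholders, and the function name is hypothetical. Check mfa align --help for your installed version before relying on it.

```python
def build_align_command(corpus_dir, dictionary, acoustic_model, output_dir,
                        clean=True, overwrite=True, beam=None):
    """Build an argument list for `mfa align` using flags mentioned in this thread."""
    command = ["mfa", "align"]
    if clean:
        command.append("--clean")      # discard files left over from previous runs
    if overwrite:
        command.append("--overwrite")  # allow existing output to be replaced
    if beam is not None:
        command += ["--beam", str(beam)]
    # Positional order assumed from the mfa_align call earlier in the thread.
    command += [corpus_dir, dictionary, acoustic_model, output_dir]
    return command
```

The returned list can be passed to subprocess.run(command, check=True), for example with build_align_command("my_corpus", "my_dictionary.txt", "my_model.zip", "my_output"), where all four paths are placeholders.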