MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

How to obtain alignments? #84

Open bwang482 opened 5 years ago

bwang482 commented 5 years ago

I ran the command below with the existing English dictionary and pretrained model:

bin/mfa_align /Users/Tools/montreal-forced-aligner/data /Users/Tools/montreal-forced-aligner/english.dict english /Users/Tools/montreal-forced-aligner/output

The data folder contains test.wav and test.lab; the latter holds the transcript as raw text.

However, I only get oovs_found.txt and utterance_oovs.txt, and no alignments as output. Am I missing something obvious here?

nongiach commented 5 years ago

Hey, I was just asking myself the same question. I figured out that if an error occurs (bad audio, bad text, ...), TextGrids aren't created. Otherwise you would find them under the output directory:

light_out $ ls -R
.:
1089  oovs_found.txt  utterance_oovs.txt

./1089:
1089-134686-0000.TextGrid  1089-134686-0010.TextGrid  1089-134686-0020.TextGrid  1089-134686-0030.TextGrid
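
Since a failed utterance simply produces no TextGrid, one way to see which files failed is to compare the .wav files in the corpus folder with the .TextGrid files in the output folder. The following is a minimal sketch, not part of MFA; the function name `missing_textgrids` is hypothetical, and it assumes each TextGrid shares its base name with the audio file it was aligned from, as in the listing above.

```python
from pathlib import Path


def missing_textgrids(corpus_dir, output_dir):
    """Return base names of .wav files in corpus_dir that have no
    matching .TextGrid anywhere under output_dir."""
    corpus = Path(corpus_dir)
    output = Path(output_dir)
    # Search recursively because the output is grouped into subfolders (e.g. ./1089).
    wav_stems = {p.stem for p in corpus.rglob("*.wav")}
    aligned_stems = {p.stem for p in output.rglob("*.TextGrid")}
    return sorted(wav_stems - aligned_stems)
```

Calling `missing_textgrids("data", "output")` returns a sorted list of utterance names to look for in the logs.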

You are right, errors should be raised. In the meantime you can find logs such as ~/Documents/MFA/XXXXX/tri_ali/log/align.0.0.log; there are several log files.

Thx for this project!!

bwang482 commented 5 years ago

Thanks @nongiach ! Below is the align.0.0.log file, showing the decoding error:

gmm-align-compiled --transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1 --beam=10 --retry-beam=40 --careful=false 'gmm-boost-silence --boost=1.0 6 "/Users/bowang/Documents/MFA/data/tri_ali/0.mdl" - |' ark:/Users/bowang/Documents/MFA/data/tri_ali/fsts.0 ark:/Users/bowang/Documents/MFA/data/train/split1/cmvndeltafeats_fmllr.0 ark:- 
gmm-boost-silence --boost=1.0 6 /Users/bowang/Documents/MFA/data/tri_ali/0.mdl - 
WARNING (gmm-boost-silence[5.1.81~1-1cd6d]:main():gmm-boost-silence.cc:82) The pdfs for the silence phones may be shared by other phones (note: this probably does not matter.)
LOG (gmm-boost-silence[5.1.81~1-1cd6d]:main():gmm-boost-silence.cc:93) Boosted weights for 5 pdfs, by factor of 1
LOG (gmm-boost-silence[5.1.81~1-1cd6d]:main():gmm-boost-silence.cc:103) Wrote model to -
WARNING (gmm-align-compiled[5.1.81~1-1cd6d]:AlignUtteranceWrapper():decoder-wrappers.cc:466) Retrying utterance test with beam 40
WARNING (gmm-align-compiled[5.1.81~1-1cd6d]:AlignUtteranceWrapper():decoder-wrappers.cc:475) Did not successfully decode file test, len = 147655
LOG (gmm-align-compiled[5.1.81~1-1cd6d]:main():gmm-align-compiled.cc:129) Overall log-likelihood per frame is nan over 0 frames.
LOG (gmm-align-compiled[5.1.81~1-1cd6d]:main():gmm-align-compiled.cc:131) Retried 1 out of 1 utterances.
LOG (gmm-align-compiled[5.1.81~1-1cd6d]:main():gmm-align-compiled.cc:133) Done 0, errors on 1
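
The useful parts of a log like this are the "Did not successfully decode file" warnings and the final "Done N, errors on M" line. As a rough sketch (the function name `summarize_align_log` is hypothetical, and the patterns are based only on the wording of the log lines shown in this thread, which may differ between Kaldi versions), they can be pulled out like this:

```python
import re

# Patterns copied from the wording of the log lines quoted in this thread.
SUMMARY_RE = re.compile(r"Done (\d+), errors on (\d+)")
FAILED_RE = re.compile(r"Did not successfully decode file (\S+),")


def summarize_align_log(text):
    """Extract the done/error counts and the names of failed utterances
    from the text of a gmm-align-compiled log."""
    done = errors = None
    match = SUMMARY_RE.search(text)
    if match:
        done, errors = int(match.group(1)), int(match.group(2))
    return {"done": done, "errors": errors, "failed": FAILED_RE.findall(text)}
```

Running this over every file in the log folder gives a quick list of the utterances that were not aligned.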

Do you think it's caused by too many OOV words?

bwang482 commented 5 years ago

Actually I don't have that many OOV words. Does the model not tolerate any OOV words at all?
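
To count OOV words before running the aligner, a transcript can be checked against the dictionary directly. This is only a sketch: it assumes each dictionary line starts with the word followed by its phones, and it uses a simple lowercase letters-and-apostrophes tokenizer that probably does not match MFA's own text normalization, so treat oovs_found.txt as the authoritative list. The function names are hypothetical.

```python
import re


def load_dictionary_words(lines):
    """Collect the headword (first whitespace-separated field) of each dictionary line."""
    words = set()
    for line in lines:
        fields = line.split()
        if fields:
            words.add(fields[0].lower())
    return words


def find_oovs(transcript, dictionary_words):
    """Return transcript words, in order of first appearance, that are not in the dictionary."""
    # Assumed tokenization: runs of letters and apostrophes, lowercased.
    tokens = re.findall(r"[a-z']+", transcript.lower())
    oovs = []
    for token in tokens:
        if token not in dictionary_words and token not in oovs:
            oovs.append(token)
    return oovs
```

For example, `load_dictionary_words(open("english.dict", encoding="utf-8"))` builds the word set, and `find_oovs(open("data/test.lab", encoding="utf-8").read(), words)` lists the words missing from it.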

nongiach commented 5 years ago

Hey, I want you to know that I'm also using https://www.readbeyond.it/aeneas/, which is really easy and efficient. Montreal Forced Aligner is also very good, but I would have to fix all those errors, and I have about 500 audio files (15 minutes each). It looks like MFA is designed more for very short audio files? MFA is very good at detecting silence. For my use case I need something that can handle the fact that my audio might contain more sentences than expected.

bwang482 commented 5 years ago

Thanks @nongiach! It seems a lot of these HMM/DNN-based alignment models have very low tolerance for OOV words. Have you tried Gentle? Its prebuilt Mac application works fine, but strangely I couldn't get its Python source code to work.

Thanks for the suggestion of Aeneas. I have used it before. It is indeed a lot friendlier and more efficient to use (which is a huge upside). However, due to its use of DTW, it cannot detect pauses or silences, and the resulting alignments are not very good. I have to say I am a bit frustrated that there has to be a tradeoff between accuracy and ease of use.

Have you tried any other tools/models @orianakc ?

nongiach commented 5 years ago

Hey @bluemonk482, I didn't try Gentle because it supports only English, if I remember correctly. But I just saw that pocketsphinx supports alignment with silence detection: https://cmusphinx.github.io/wiki/pocketsphinx_pronunciation_evaluation/ I will try it out. Ping me on Twitter @chaignc. We don't want to flood here :)

lcukerd commented 5 years ago

I am getting the same error in the log. Any idea how to solve it?

Davidelvis commented 3 years ago

On Windows 10 I had the same issue: no TextGrid output files; instead I only find an empty oovs_found.txt file.

This was the result of failed alignment, errors logged in:

gmm-align-compiled --transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1 --beam=10 --retry-beam=40 --careful=false 'gmm-boost-silence --boost=1.0 6 "C:\Users\Brandon/Documents/MFA\data\tri_ali\0.mdl" - |' 'ark:C:\Users\Brandon/Documents/MFA\data\tri_ali\fsts.0' 'ark:C:\Users\Brandon/Documents/MFA\data\train\split1\cmvndeltafeats_fmllr.0' ark:-
gmm-boost-silence --boost=1.0 6 'C:\Users\Brandon/Documents/MFA\data\tri_ali\0.mdl' -
WARNING (gmm-boost-silence[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-boost-silence.cc:82) The pdfs for the silence phones may be shared by other phones (note: this probably does not matter.)
LOG (gmm-boost-silence[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-boost-silence.cc:93) Boosted weights for 5 pdfs, by factor of 1
LOG (gmm-boost-silence[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-boost-silence.cc:103) Wrote model to -
LOG (gmm-align-compiled[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-align-compiled.cc:135) Overall log-likelihood per frame is -nan(ind) over 0 frames.
LOG (gmm-align-compiled[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-align-compiled.cc:137) Retried 0 out of 0 utterances.
LOG (gmm-align-compiled[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-align-compiled.cc:139) Done 0, errors on 0

So the log reports no errors, but also nothing done.

Any idea how to go about this?

orianakc commented 3 years ago

Hi @Davidelvis , which version are you using? I have had better success with 1.00 and 1.01 on macOS. Another thing to watch out for if you are using TextGrids is to make sure they're saved in the simple TextGrid format (i.e., not "chronological TextGrid"). You can convert easily just by resaving with "Save as text file..." in Praat. Dr. Eleanor Chodroff has a useful tutorial that details how you need to prep your data before running MFA: http://eleanorchodroff.com/tutorial/montreal-forced-aligner.html
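
A quick way to spot TextGrids saved in an unexpected format is to look at the first two non-blank lines of each file. The sketch below rests on an assumption about Praat's output that is not stated in this thread: that a TextGrid saved with "Save as text file..." begins with `File type = "ooTextFile"` followed by `Object class = "TextGrid"`, and that other variants begin differently. The function name is hypothetical, and the text is expected to be already decoded (Praat may write UTF-8 or UTF-16).

```python
def looks_like_standard_textgrid(text):
    """Return True if the decoded file text starts with the usual
    ooTextFile / TextGrid header lines (assumed Praat text format)."""
    # Drop a leading byte-order mark and blank lines before comparing.
    lines = [ln.strip() for ln in text.lstrip("\ufeff").splitlines() if ln.strip()]
    return (
        len(lines) >= 2
        and lines[0] == 'File type = "ooTextFile"'
        and lines[1] == 'Object class = "TextGrid"'
    )
```

Files for which this returns False are candidates for resaving in Praat.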

Also, once you have cleaned up your data or made other changes, remember to clear the cache folder in ~/MFA/ before running again.
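
Clearing the cache can be scripted. The following is a minimal sketch with a hypothetical helper name; the cache location is an assumption you need to verify on your own machine, since this thread mentions both ~/MFA/ and ~/Documents/MFA/. It deletes the folder permanently, so double-check the path first.

```python
import shutil
from pathlib import Path


def clear_mfa_cache(cache_dir):
    """Delete cache_dir and everything under it.
    Return True if a directory was removed, False if it did not exist."""
    path = Path(cache_dir).expanduser()  # allows paths such as "~/MFA"
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True
```

For example, `clear_mfa_cache("~/Documents/MFA")` before rerunning the aligner.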

ganeshkrishnan1 commented 3 months ago

FYI, if people are facing the same issue, it's due to a caching issue in MFA. Either pass --clean to the alignment command or run once:

mfa configure --always_clean --always_verbose