Open bwang482 opened 5 years ago
Hey, I was just asking myself the same question. I figured out that if an error occurs (bad audio, bad text, ...), TextGrids aren't created; otherwise you would find them under the output directory:
light_out $ ls -R
.:
1089 oovs_found.txt utterance_oovs.txt
./1089:
1089-134686-0000.TextGrid 1089-134686-0010.TextGrid 1089-134686-0020.TextGrid 1089-134686-0030.TextGrid
You are right, errors should be raised. But you can find logs under ~/Documents/MFA/XXXXX/tri_ali/log/align.0.0.log. There are several log files.
Thx for this project!!
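Since there are several log files, here is a minimal sketch of how one might scan them all for failed utterances instead of opening each by hand. The two patterns are copied from the Kaldi log lines quoted in this thread; the helper names (`summarize_log`, `scan_log_dir`) and the `*.log` layout are my own assumptions, not part of MFA.

```python
import re
from pathlib import Path

# Patterns taken from the Kaldi log lines quoted in this thread.
FAILED = re.compile(r"Did not successfully decode file (\S+),")
SUMMARY = re.compile(r"Done (\d+), errors on (\d+)")


def summarize_log(text):
    """Return (failed_utterances, done, errors) for the text of one align log."""
    failed = FAILED.findall(text)
    match = SUMMARY.search(text)
    if match:
        done, errors = int(match.group(1)), int(match.group(2))
    else:
        # No summary line found; report zeros rather than guessing.
        done, errors = 0, 0
    return failed, done, errors


def scan_log_dir(log_dir):
    """Summarize every *.log file directly under log_dir (hypothetical layout)."""
    results = {}
    for path in sorted(Path(log_dir).expanduser().glob("*.log")):
        results[path.name] = summarize_log(path.read_text(errors="replace"))
    return results
```

Calling `scan_log_dir` on the `tri_ali/log` directory mentioned above should give one entry per log file. A log with "Done 0, errors on 0" means no utterances reached the aligner at all, which is a different situation from a decode failure.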
Thanks @nongiach! Below is the align.0.0.log file, showing the decoding error:
gmm-align-compiled --transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1 --beam=10 --retry-beam=40 --careful=false 'gmm-boost-silence --boost=1.0 6 "/Users/bowang/Documents/MFA/data/tri_ali/0.mdl" - |' ark:/Users/bowang/Documents/MFA/data/tri_ali/fsts.0 ark:/Users/bowang/Documents/MFA/data/train/split1/cmvndeltafeats_fmllr.0 ark:-
gmm-boost-silence --boost=1.0 6 /Users/bowang/Documents/MFA/data/tri_ali/0.mdl -
WARNING (gmm-boost-silence[5.1.81~1-1cd6d]:main():gmm-boost-silence.cc:82) The pdfs for the silence phones may be shared by other phones (note: this probably does not matter.)
LOG (gmm-boost-silence[5.1.81~1-1cd6d]:main():gmm-boost-silence.cc:93) Boosted weights for 5 pdfs, by factor of 1
LOG (gmm-boost-silence[5.1.81~1-1cd6d]:main():gmm-boost-silence.cc:103) Wrote model to -
WARNING (gmm-align-compiled[5.1.81~1-1cd6d]:AlignUtteranceWrapper():decoder-wrappers.cc:466) Retrying utterance test with beam 40
WARNING (gmm-align-compiled[5.1.81~1-1cd6d]:AlignUtteranceWrapper():decoder-wrappers.cc:475) Did not successfully decode file test, len = 147655
LOG (gmm-align-compiled[5.1.81~1-1cd6d]:main():gmm-align-compiled.cc:129) Overall log-likelihood per frame is nan over 0 frames.
LOG (gmm-align-compiled[5.1.81~1-1cd6d]:main():gmm-align-compiled.cc:131) Retried 1 out of 1 utterances.
LOG (gmm-align-compiled[5.1.81~1-1cd6d]:main():gmm-align-compiled.cc:133) Done 0, errors on 1
Do you think it's caused by too many OOV words?
Actually, I don't have that many OOV words. Does the model not tolerate any OOV words at all?
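To check how many OOV words a transcript really has before running the aligner, something like the sketch below could help. It assumes the dictionary has one entry per line with the word as the first whitespace-separated token, and it lowercases and strips punctuation in a simple way that may not match what MFA does internally. The function names are hypothetical.

```python
import re


def load_dictionary_words(dict_text):
    """Collect the first whitespace-separated token of each non-empty line."""
    words = set()
    for line in dict_text.splitlines():
        parts = line.split()
        if parts:
            words.add(parts[0].lower())
    return words


def find_oovs(transcript, known_words):
    """Return transcript words missing from known_words, in order, without repeats."""
    # Assumed normalization: lowercase, keep only letters and apostrophes.
    tokens = re.findall(r"[a-z']+", transcript.lower())
    seen = set()
    oovs = []
    for token in tokens:
        if token not in known_words and token not in seen:
            seen.add(token)
            oovs.append(token)
    return oovs
```

Reading english.dict and test.lab into strings and passing them through these two functions gives a list you can compare against oovs_found.txt.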
Hey, I want you to know that I'm also using https://www.readbeyond.it/aeneas/, which is really easy and efficient. Montreal Forced Aligner is also very good, but I would have to fix all those errors, and I have about 500 audio files (15 minutes each). It looks like MFA is designed more for very short audio files? MFA is very good at detecting silence. For my use case I need something that can handle the fact that my audio might contain more sentences than expected.
Thanks @nongiach! It seems a lot of these HMM/DNN-based alignment models have very low tolerance for OOV words. Have you tried Gentle? Its prebuilt Mac application works fine, but strangely I couldn't get its Python source code to work.
Thanks for the suggestion of Aeneas. I have used it before. It is indeed a lot friendlier and more efficient to use (which is a huge upside). However, due to its use of DTW, it cannot detect pauses or silences, and the resulting alignments are not very good. I have to say I am a bit frustrated that there has to be a tradeoff between accuracy and ease of use.
Have you tried any other tools/models, @orianakc?
Hey @bluemonk482, I didn't try Gentle because it only supports English, if I remember correctly. But I just saw that pocketsphinx supports alignment with silence detection: https://cmusphinx.github.io/wiki/pocketsphinx_pronunciation_evaluation/ I will try it out. Ping me on Twitter @chaignc. We don't want to flood this thread :)
I am getting the same error in the log. Any idea how to solve it?
Using Windows 10, I had the same issue: no TextGrid output files; instead I find an empty oovs_found.txt file.
This was the result of a failed alignment, as logged:
gmm-align-compiled --transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1 --beam=10 --retry-beam=40 --careful=false 'gmm-boost-silence --boost=1.0 6 "C:\Users\Brandon/Documents/MFA\data\tri_ali\0.mdl" - |' 'ark:C:\Users\Brandon/Documents/MFA\data\tri_ali\fsts.0' 'ark:C:\Users\Brandon/Documents/MFA\data\train\split1\cmvndeltafeats_fmllr.0' ark:-
gmm-boost-silence --boost=1.0 6 'C:\Users\Brandon/Documents/MFA\data\tri_ali\0.mdl' -
WARNING (gmm-boost-silence[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-boost-silence.cc:82) The pdfs for the silence phones may be shared by other phones (note: this probably does not matter.)
LOG (gmm-boost-silence[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-boost-silence.cc:93) Boosted weights for 5 pdfs, by factor of 1
LOG (gmm-boost-silence[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-boost-silence.cc:103) Wrote model to -
LOG (gmm-align-compiled[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-align-compiled.cc:135) Overall log-likelihood per frame is -nan(ind) over 0 frames.
LOG (gmm-align-compiled[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-align-compiled.cc:137) Retried 0 out of 0 utterances.
LOG (gmm-align-compiled[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-align-compiled.cc:139) Done 0, errors on 0
No errors and nothing done.
Any idea how to go about this?
Hi @Davidelvis, which version are you using? I have had better success with 1.00 and 1.01 on macOS. Another thing to watch out for if you are using TextGrids is to make sure they're saved in the simple TextGrid format (i.e., not "chronological TextGrid"). You can convert easily just by resaving with "Save as text file..." in Praat. Dr. Eleanor Chodroff has a useful tutorial that details how you need to prep your data before running MFA: http://eleanorchodroff.com/tutorial/montreal-forced-aligner.html
Also, once you have cleaned up your data or made other changes, remember to clear the cache folder in ~/MFA/ before running again.
FYI, if people are facing the same issue, it's due to a caching issue in MFA. Either pass --clean to the alignment command or run once:
mfa configure --always_clean --always_verbose
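If your version doesn't have those options, clearing the cache folder by hand does roughly the same thing. Below is a small sketch that does it from Python; `clear_cache` is a hypothetical helper, and the folder to pass in is whatever your install uses (this thread mentions both ~/MFA/ and ~/Documents/MFA, so check which one exists on your machine first).

```python
import shutil
from pathlib import Path


def clear_cache(cache_dir):
    """Delete cache_dir and everything in it; return True if something was removed."""
    path = Path(cache_dir).expanduser()
    if not path.is_dir():
        # Nothing to clear (wrong path or already clean).
        return False
    shutil.rmtree(path)
    return True
```

For example, `clear_cache("~/Documents/MFA")` before re-running the alignment. This deletes the whole folder, so don't point it at anything you want to keep.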
I have used the command below with the existing English dictionary and pretrained model:
bin/mfa_align /Users/Tools/montreal-forced-aligner/data /Users/Tools/montreal-forced-aligner/english.dict english /Users/Tools/montreal-forced-aligner/output
In the data folder, there is test.wav and test.lab, which contains the transcript (as raw text). However, I only get oovs_found.txt and utterance_oovs.txt, and no alignments as output. Am I missing something obvious here?