Open oleksii-a opened 5 years ago
Gentle uses an nnet3 acoustic model which is probably more robust (they also do a second pass on the audio file). MFA uses a GMM model. My guess is that you may need a very large beam size for the latter to succeed on a long file...
For version 1.0.1, you can try setting the `-b` flag to something large (e.g., `-b 1000` or `-b 10000`); that will increase the beam size for the GMM models. I haven't done a ton of testing on very long files, so I'm not sure yet what the best beam size is.
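Since the best beam size isn't known, one way to try the suggestion is to rerun the aligner with progressively larger `-b` values until a TextGrid shows up. The following is only a rough sketch: the executable name `mfa_align`, the order of its positional arguments, and the helper names are assumptions for illustration and should be checked against your own 1.0.1 install. Only the `-b` flag and the values 1000 and 10000 come from the suggestion above.

```python
from pathlib import Path
import subprocess

# Assumption: the 1.0.x aligner is invoked as "mfa_align" with the corpus,
# dictionary, acoustic model, and output directory as positional arguments.
ALIGNER = "mfa_align"


def build_align_command(corpus_dir, dictionary, acoustic_model, output_dir, beam):
    """Build the command line for one alignment attempt with a given beam size."""
    return [
        ALIGNER,
        str(corpus_dir),
        str(dictionary),
        str(acoustic_model),
        str(output_dir),
        "-b",
        str(beam),
    ]


def align_with_growing_beam(corpus_dir, dictionary, acoustic_model, output_dir,
                            beams=(1000, 10000), run=subprocess.run):
    """Try each beam size in turn; return the first one that yields a TextGrid.

    Returns None if no attempt produced any .TextGrid file in output_dir.
    """
    for beam in beams:
        cmd = build_align_command(corpus_dir, dictionary, acoustic_model,
                                  output_dir, beam)
        # check=False: a failed attempt should not stop the next, larger beam.
        run(cmd, check=False)
        if any(Path(output_dir).rglob("*.TextGrid")):
            return beam
    return None
```

Calling `align_with_growing_beam("corpus", "dictionary.txt", "english", "aligned")` (all four paths are placeholders) would try `-b 1000` first and fall back to `-b 10000` only if no TextGrid was written.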
Hello,
I am trying to align a relatively long file (about 1 hour) with its transcription (in .txt format). After 10 minutes of processing, no TextGrid alignment is generated, but out-of-vocabulary files are produced. I am using the pre-trained acoustic model for English and the pre-trained pronunciation dictionary (downloaded from the website), with the latest version, 1.0.1. I should mention that I haven't had this problem when aligning short files (3-5 minutes long).
I also tried Gentle on the same audio and got a good result (around 5-10% of the text is left unaligned, but it is handled properly and can be filtered out afterwards).
Is there any workaround or fix for this issue?