MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

[BUG] Align fails on long files #400

Open eeishaan opened 2 years ago

eeishaan commented 2 years ago

Debugging checklist

[x] Have you updated to latest MFA version?
[x] Have you tried rerunning the command with the --clean flag?

Describe the issue

The aligner is not able to process a 23 minute file. The outcome is the same with both mfa validate and mfa align. I suspect that it has to do with the length of the file because I am able to align the LibriSpeech dataset, whose files are quite short.

What's the maximum file length that MFA is able to process?

For Reproducing your issue

Please fill out the following:

  1. Corpus structure
    • What language is the corpus in? English
    • How many files/speakers? 1
    • Are you using lab files or TextGrid files for input? lab files
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? Yes, English
    • If it's a custom dictionary, what is the phoneset?
  3. Acoustic model
    • If you're using an acoustic model, is it one downloaded through MFA? If so, which one? Yes, English
    • If it's a model you've trained, what data was it trained on?

Log file

INFO - Compiling training graphs...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00, 2.13s/it]
DEBUG - Compiling training graphs took 2.127983331680298
DEBUG - Performing first-pass alignment...
INFO - Generating alignments...
0%| | 0/1 [00:19<?, ?it/s]
DEBUG - Analyzing alignment information
DEBUG - No utterances were aligned, this likely indicates serious problems with the aligner. 0 of 1 utterances were aligned
DEBUG - Compiling information took 0.08590435981750488
DEBUG - Alignment round took 19.166051149368286
DEBUG - Analyzing alignment information
DEBUG - No utterances were aligned, this likely indicates serious problems with the aligner. 0 of 1 utterances were aligned
DEBUG - Compiling information took 0.08168458938598633
INFO - Done! Everything took 33.02434706687927 seconds

ahmetgunduz commented 2 years ago

@mmcauliffe I have observed the same issue. Are there any updates on this bug?

mmcauliffe commented 2 years ago

You can try increasing the beam (https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/configuration/global.html#global-options) to something like 100, which should at least help it output something.
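
For anyone who wants to try the larger beam through a config file, here is a minimal sketch in Python that writes one. The keys `beam` and `retry_beam` are taken from the linked global options page. The `retry_beam` value of 400, the file name `align_config.yaml`, and the helper `write_beam_config` are assumptions made for illustration, not settings recommended in this thread.

```python
from pathlib import Path


def write_beam_config(path, beam=100, retry_beam=400):
    """Write a small YAML config that widens the alignment search beam.

    `beam=100` follows the suggestion above. `retry_beam=400` is an assumed
    value, chosen only so that the retry beam stays larger than the first beam.
    """
    text = f"beam: {beam}\nretry_beam: {retry_beam}\n"
    Path(path).write_text(text, encoding="utf-8")
    return text


if __name__ == "__main__":
    # Hypothetical output file name; pick any path you like.
    print(write_beam_config("align_config.yaml"))
```

The resulting file would then be handed to `mfa align` through its config option (`--config_path` in the versions I am aware of; check `mfa align --help` for your install).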

Alignment isn't well defined for something like 23 minutes (there are just a lot of potential paths through the alignment lattice for something that long, so it's easy to lose the best path with smaller beams), so if you have the ability to split it into smaller chunks, that would be ideal. I'll think about other functionality that would enable breaking up and aligning longer utterances.
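
As a rough illustration of the "split it into smaller chunks" idea, here is a sketch that cuts a PCM WAV file into fixed-length pieces using only Python's standard `wave` module. The helper name `split_wav`, the output naming scheme, and the 30 second default are assumptions for the example; the thread does not state a recommended chunk length. Fixed cuts can land in the middle of a word, and each chunk still needs its own matching .lab transcript, which this sketch does not produce.

```python
import wave
from pathlib import Path


def split_wav(src, out_dir, chunk_seconds=30.0):
    """Split a PCM WAV file into consecutive chunks of at most chunk_seconds.

    Returns the list of chunk paths in order. The 30 s default is an
    arbitrary illustrative value, not an MFA recommendation.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = int(chunk_seconds * params.framerate)
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break  # end of file reached
            chunk_path = out_dir / f"{src.stem}_{index:04d}.wav"
            with wave.open(str(chunk_path), "wb") as writer:
                # Copy channels, sample width and rate from the source;
                # the frame count in the header is corrected on close.
                writer.setparams(params)
                writer.writeframes(frames)
            written.append(chunk_path)
            index += 1
    return written


if __name__ == "__main__":
    # Hypothetical file names for illustration.
    for path in split_wav("long_recording.wav", "chunks"):
        print(path)
```

Cutting at pauses instead of fixed offsets would likely give cleaner chunks, but that needs silence detection and is left out to keep the sketch short.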

ahmetgunduz commented 2 years ago

@mmcauliffe Can you please define the length of the smaller chunks? Ideally, what is the maximum length an audio file should be?

mcgunay commented 1 month ago

Can anyone help us with this issue? What would be the maximum length of the audio?