MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

[BUG] tiers without words cause errors #831

Open jeffmielke opened 3 months ago

jeffmielke commented 3 months ago

Debugging checklist

[x] Have you read the troubleshooting page (https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/troubleshooting.html) and searched the documentation to ensure that your issue is not addressed there?
[x] Have you updated to latest MFA version (check https://montreal-forced-aligner.readthedocs.io/en/latest/changelog/changelog_3.0.html)? What is the output of mfa version?
[x] Have you tried rerunning the command with the --clean flag?

Describe the issue

When an input TextGrid contains a tier with no words, alignment does not complete. This comes up when a speaker's transcript tier has only a few intervals and all of them are blank, labeled with {spn}, or labeled with something that is not in the dictionary. The errors always include a KeyError for the speaker with the small tier in the data dictionary, followed by "local variable 'output_path' referenced before assignment", and the alignment process then appears to continue without making progress. As a workaround, I searched through the TextGrids for the short tiers and edited them: for example, if the only non-empty interval is labeled "{spn}", changing it to "ok {spk}" lets alignment succeed. But I think this situation used to not cause errors in the first place.
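The search for problem tiers described above could be sketched roughly as follows. This is only an illustration: it assumes the tiers have already been loaded into a plain dict of `(start, end, label)` tuples (a hypothetical layout, not MFA's), and `tiers_without_words` is a made-up helper name. Reading the actual TextGrid files is left to whichever parser you use.

```python
# Hypothetical in-memory layout: {tier_name: [(start, end, label), ...]}.
# Loading a real TextGrid into this shape is left to your TextGrid parser.

def tiers_without_words(tiers, dictionary):
    """Return the names of tiers where no interval contains a dictionary word."""
    flagged = []
    for name, intervals in tiers.items():
        has_word = any(
            token.lower() in dictionary
            for _, _, label in intervals
            for token in label.split()
        )
        if not has_word:
            flagged.append(name)
    return flagged


# Toy data mirroring the report: speaker MJF only has blank and {spn} intervals.
tiers = {
    "MJF": [(0.0, 1.2, ""), (1.2, 1.9, "{spn}"), (1.9, 3.0, "")],
    "ABC": [(0.0, 1.5, "hello there"), (1.5, 3.0, "")],
}
dictionary = {"hello", "there", "ok"}

print(tiers_without_words(tiers, dictionary))  # ['MJF']
```

Tiers flagged this way are the ones that needed hand-editing before alignment would finish.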

WARNING Alignment analysis not available without using postgresql
INFO Exporting alignment TextGrids to hickory_ads_output...
Process ExportTextGridProcessWorker-118:
Traceback (most recent call last):
  File "/home/jimielke/.conda/envs/aligner/lib/python3.9/site-packages/montreal_forced_aligner/alignment/multiprocessing.py", line 1627, in run
    for output_path in construct_textgrid_output(
  File "/home/jimielke/.conda/envs/aligner/lib/python3.9/site-packages/montreal_forced_aligner/textgrid.py", line 393, in construct_textgrid_output
    process_word_data()
  File "/home/jimielke/.conda/envs/aligner/lib/python3.9/site-packages/montreal_forced_aligner/textgrid.py", line 353, in process_word_data
    and data[speaker_name]["words"]
KeyError: 'MJF'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jimielke/.conda/envs/aligner/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/jimielke/.conda/envs/aligner/lib/python3.9/site-packages/montreal_forced_aligner/alignment/multiprocessing.py", line 1643, in run
    output_path,
UnboundLocalError: local variable 'output_path' referenced before assignment
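The two tracebacks look consistent with a common Python pattern: a generator raises before its first yield, so the loop variable is never bound, and an error handler that then refers to that variable raises UnboundLocalError. The sketch below reproduces that pattern with simplified stand-ins. None of the function names or data structures are MFA's actual code, and the guarded variants are just one possible way to tolerate a speaker with no words, not a proposed patch.

```python
# Simplified stand-ins; none of these names or structures are MFA's actual code.
data = {"ABC": {"words": ["hello"]}}  # speaker "MJF" has no entry at all


def construct_outputs(speakers):
    """Yield one output path per speaker; fails like the first traceback."""
    for speaker_name in speakers:
        if data[speaker_name]["words"]:  # KeyError for a speaker with no entry
            yield f"{speaker_name}.TextGrid"


def run(speakers):
    try:
        for output_path in construct_outputs(speakers):
            pass
    except Exception:
        # If the generator raised before its first yield, output_path was
        # never bound, so referencing it here raises UnboundLocalError.
        return ("failed", output_path)
    return ("ok", output_path)


def construct_outputs_guarded(speakers):
    for speaker_name in speakers:
        # dict.get tolerates speakers that have no entry
        if data.get(speaker_name, {}).get("words"):
            yield f"{speaker_name}.TextGrid"


def run_guarded(speakers):
    output_path = None  # bind the name before the loop starts
    try:
        for output_path in construct_outputs_guarded(speakers):
            pass
    except Exception as exc:
        return ("failed", output_path, repr(exc))
    return ("ok", output_path)


try:
    run(["MJF"])
except UnboundLocalError as exc:
    print(type(exc).__name__)  # UnboundLocalError

print(run_guarded(["MJF", "ABC"]))  # ('ok', 'ABC.TextGrid')
```

In the unguarded version the original KeyError is hidden behind the second exception, which matches the "During handling of the above exception" chain in the output above.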

For Reproducing your issue

Please fill out the following:

  1. Corpus structure
    • What language is the corpus in? English
    • How many files/speakers? 1/3 (after I isolated the issue)
    • Are you using lab files or TextGrid files for input? TextGrid
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? no, using one from the SPADE project
    • If it's a custom dictionary, what is the phoneset? Arpabet
  3. Acoustic model
    • If you're using an acoustic model, is it one downloaded through MFA? If so, which one? english_us_arpa
    • If it's a model you've trained, what data was it trained on? n/a

Log file

Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA).

I don't see error messages in any of the log files.

vivian556123 commented 1 month ago

same problem