MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

[BUG] --no_textgrid_cleanup has no apparent effect #781

Closed jeffmielke closed 3 months ago

jeffmielke commented 3 months ago

Debugging checklist

[x] Have you read the troubleshooting page (https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/troubleshooting.html) and searched the documentation to ensure that your issue is not addressed there?
[x] Have you updated to latest MFA version (check https://montreal-forced-aligner.readthedocs.io/en/latest/changelog/changelog_3.0.html)? What is the output of mfa version?
[x] Have you tried rerunning the command with the --clean flag?

Describe the issue

Words starting or ending with apostrophes get merged with adjacent words in the output in version 3.0.2, even with --no_cleanup_textgrids or --no_textgrid_cleanup or specifying a config file with clitic_markers: in it.
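One way to confirm the symptom independently of MFA is to compare the word labels that went in with the word labels that came out. The following is only a minimal sketch, assuming the labels have already been read from the input and output TextGrids into plain Python lists (reading the TextGrids is not shown) and are already normalized the same way. The function name `find_merged_words` and the example words are hypothetical and are not part of MFA.

```python
def find_merged_words(input_words, output_words):
    """Return (merged_label, parts) pairs where an output label equals
    two or more adjacent input words joined together.

    Hypothetical helper for illustration; not part of MFA.
    """
    merged = []
    i = 0  # position in input_words
    for label in output_words:
        if i >= len(input_words):
            break
        if input_words[i] == label:
            # Word survived unchanged.
            i += 1
            continue
        # Try to build the output label from consecutive input words.
        joined = ""
        j = i
        while j < len(input_words) and len(joined) < len(label):
            joined += input_words[j]
            j += 1
        if joined == label and j - i >= 2:
            merged.append((label, input_words[i:j]))
            i = j
        else:
            # Not a merge; skip one input word to stay roughly aligned.
            i += 1
    return merged


# Made-up example: an apostrophe-initial word fused with its neighbor.
input_words = ["rock", "'n", "roll", "is", "fun"]
output_words = ["rock'n", "roll", "is", "fun"]
print(find_merged_words(input_words, output_words))
```

An empty result means no output label could be explained as a concatenation of adjacent input words; any entries point to the specific words that were fused.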

For Reproducing your issue

Please fill out the following:

  1. Corpus structure
    • What language is the corpus in? English
    • How many files/speakers? 6/7
    • Are you using lab files or TextGrid files for input? TextGrid
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? my own based on CMU dictionary
    • If it's a custom dictionary, what is the phoneset? arpabet
  3. Acoustic model
    • If you're using an acoustic model, is it one downloaded through MFA? If so, which one? english_us_arpa
    • If it's a model you've trained, what data was it trained on?

Log file

Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA).

no errors

Desktop (please complete the following information):

Additional context

Add any other context about the problem here.