Just to confirm this: I am getting the same error for my data set (macOS 13.4, conda install, very recently updated to the latest version). The orthography contains a lot of final glottal stops written as <'>, and a lot of words differing from those only in the absence of the glottal stop. The one output TextGrid that is produced happens to contain only one word with a final glottal stop (ngo'), with no corresponding glottal-free word anywhere in the transcript. All other files seem to encounter the issue described here. Unfortunately, I cannot remove the lexical items with final <'>, since this is a phoneme; I will probably have to edit all my transcripts to change the phone set entirely.
Worth noting that in the one output file I got, any word following the apostrophe-containing word is not treated as a separate word: transcribed <ngo' abɨ> comes out as the "word" <ngo'abɨ>, for example.
Ah, sorry, I haven't had a chance to look into this, but you should be able to specify --no_textgrid_cleanup to disable the behavior, or specify a config file with a
clitic_markers:
setting to prevent the apostrophes from being analyzed as clitic markers.
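For illustration, here is a minimal hypothetical sketch (not MFA's actual code) of how a word-final clitic marker can cause the following word to be attached during cleanup; the assumption that <'> is among the default clitic markers is mine:

# Hypothetical illustration, not MFA's implementation: if "'" is treated
# as a clitic marker, a word ending in "'" looks like a clitic fragment,
# and cleanup can reattach the following word to it.
CLITIC_MARKERS = {"'"}  # assumed default; configurable via clitic_markers

def cleanup_words(words):
    merged = []
    for w in words:
        if merged and merged[-1][-1] in CLITIC_MARKERS:
            merged[-1] += w  # "ngo'" + "abɨ" -> "ngo'abɨ"
        else:
            merged.append(w)
    return merged

print(cleanup_words(["ngo'", "abɨ"]))  # ["ngo'abɨ"], matching the merged word Matt saw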
Thanks, Michael. I have a vague memory of --no_textgrid_cleanup, so sorry if this is my second time asking this question. It does seem like the default behavior may be problematic and unexpected for a lot of users. Thanks, Matt, for figuring out that the words were being combined; I had noticed words like that in the TextGrids but hadn't figured out what was causing them.
Following up to confirm that --no_textgrid_cleanup does have the desired effect: all words separated and all files output.
Debugging checklist
[x] Have you updated to latest MFA version?
[x] Have you tried rerunning the command with the --clean flag?

Describe the issue
When a transcript includes a word with an apostrophe such as TRAVELIN', validation and alignment seem to go just fine, but there is an error when exporting the TextGrid:
WARNING There were 1 errors encountered in generating TextGrids. Check raleigh_23_05_ok_output/output_errors.txt for more details
output_errors.txt says this:
The following exceptions were encountered during the output of the alignments to TextGrids:
AlignmentExportError:
Error was encountered in exporting raleigh_23_05_ok_output/ral2060d.TextGrid:
Traceback (most recent call last):
  File "/home/jimielke/.conda/envs/aligner/lib/python3.10/site-packages/montreal_forced_aligner/alignment/multiprocessing.py", line 2498, in run
    export_textgrid(
  File "/home/jimielke/.conda/envs/aligner/lib/python3.10/site-packages/montreal_forced_aligner/textgrid.py", line 377, in export_textgrid
    tier.insertEntry(a.to_tg_interval(duration))
  File "/home/jimielke/.conda/envs/aligner/lib/python3.10/site-packages/montreal_forced_aligner/data.py", line 1757, in to_tg_interval
    assert begin < end
AssertionError
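The assertion that fails here is just a sanity check that every interval written to the TextGrid has a strictly positive duration. A simplified sketch of the invariant (hypothetical times and labels, not the real call site):

# Simplified sketch of the failing check in data.py: a TextGrid interval
# must have strictly positive duration. If cleanup merges two word
# intervals and leaves one with begin == end, this assertion fires.
def to_tg_interval(begin, end, label):
    assert begin < end, f"empty interval for {label!r}: begin={begin}, end={end}"
    return (begin, end, label)

to_tg_interval(1.25, 1.40, "TRAVELIN'")  # fine
to_tg_interval(1.40, 1.40, "TRAVELIN'")  # AssertionError, as in the traceback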
The error occurs when there is a similar dictionary entry without the apostrophe (such as TRAVELIN), even if TRAVELIN' (with the apostrophe) is also in the dictionary. The problem goes away when I remove the apostrophe-less dictionary entry. It was easy to fix once I figured out what the problem was, but the error message didn't provide a lot of clues that helped me find the problem.
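To spot other entries that could trigger this before aligning, here is a quick check of the pronunciation dictionary for word pairs that differ only by a final apostrophe; this script is mine, not part of MFA, and assumes a plain-text dictionary whose first whitespace-separated column is the word:

# Hypothetical helper: list dictionary words ending in "'" whose
# apostrophe-less counterpart is also an entry, i.e. the combination
# that triggered the export error here.
import sys

def apostrophe_clashes(dict_path):
    with open(dict_path, encoding="utf-8") as f:
        words = {line.split()[0] for line in f if line.strip()}
    return sorted(w for w in words if w.endswith("'") and w.rstrip("'") in words)

if __name__ == "__main__":
    for w in apostrophe_clashes(sys.argv[1]):
        print(w)  # e.g. TRAVELIN' when TRAVELIN is also present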
For reproducing your issue
Log file: The log file is from a run with a subset of the files (the ones with problems).
Additional context: pg_log_global.txt