Debugging checklist
[X] Have you updated to latest MFA version?
[X] Have you tried rerunning the command with the --clean flag?
Describe the issue
I'm not an expert on anything here, but I was trying to parse the TextGrid output with the TextGrid Python library when I ran into an issue. That library uses the intervals: size key in the TextGrid format to tell how large an intervals list is, and it has no contingency for when that size is wrong. In the MFA output, that size is wrong. This is from 27-123349-0041.TextGrid from the LibriSpeech example:
Near as I can tell, this behavior stems from the interval_index in textgrid.py being able to grow past len(tier._entries) when includeBlankSpaces is enabled. Relevant code (comments mine):
fd.write(tab * 2 + f"intervals: size = {len(tier._entries)} \n")  # <-- This is where the intervals: size key is set
interval_index = 1
if includeBlankSpaces and tier._entries:
    if tier._entries[0][0] > 0.001:
        fd.write(
            f"{tab * 2}intervals [{interval_index}]:\n"
            f"{tab * 3}xmin = 0.0 \n"
            f"{tab * 3}xmax = {tier._entries[0][0]} \n"
            f'{tab * 3}text = "" \n'
        )
        interval_index += 1  # <-- interval_index can be incremented here
for i, entry in enumerate(tier._entries):  # <-- a for loop, which iterates only until i = len(tier._entries) - 1
    start, end, label = entry
    if (
        includeBlankSpaces
        and i > 0
        and start - tier._entries[i - 1][1] > 0.001
    ):
        fd.write(
            f"{tab * 2}intervals [{interval_index}]:\n"
            f"{tab * 3}xmin = {tier._entries[i-1][1]} \n"
            f"{tab * 3}xmax = {start} \n"
            f'{tab * 3}text = "" \n'
        )
        interval_index += 1  # <-- interval_index is incremented if there's a blank space
    fd.write(
        f"{tab * 2}intervals [{interval_index}]:\n"
        f"{tab * 3}xmin = {start} \n"
        f"{tab * 3}xmax = {end} \n"
        f'{tab * 3}text = "{tgio_utils.escapeQuotes(label)}" \n'
    )
    interval_index += 1  # <-- interval_index is incremented again
if includeBlankSpaces and tier._entries:
    if self.maxTimestamp - tier._entries[-1][1] > 0.001:
        fd.write(
            f"{tab * 2}intervals [{interval_index}]:\n"
            f"{tab * 3}xmin = {tier._entries[-1][1]} \n"
            f"{tab * 3}xmax = {self.maxTimestamp} \n"
            f'{tab * 3}text = "" \n'
        )
        interval_index += 1  # <-- and possibly incremented again near the end
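A minimal sketch of how the size could be computed before the header is written, assuming the same 0.001 gap threshold as the code above. The helper name count_written_intervals and its parameters are hypothetical and not part of MFA; the only point is that the header would need to count the blank intervals as well as the entries.

```python
from typing import List, Tuple

Entry = Tuple[float, float, str]  # (start, end, label), as in tier._entries


def count_written_intervals(
    entries: List[Entry],
    max_timestamp: float,
    include_blank_spaces: bool = True,
    min_gap: float = 0.001,
) -> int:
    """Count the intervals the writer above would emit (hypothetical helper)."""
    count = len(entries)
    if not (include_blank_spaces and entries):
        return count
    # Blank interval before the first entry.
    if entries[0][0] > min_gap:
        count += 1
    # Blank interval between each pair of consecutive entries.
    for prev, cur in zip(entries, entries[1:]):
        if cur[0] - prev[1] > min_gap:
            count += 1
    # Blank interval after the last entry.
    if max_timestamp - entries[-1][1] > min_gap:
        count += 1
    return count


if __name__ == "__main__":
    entries = [(0.5, 1.0, "a"), (1.5, 2.0, "b")]
    # Two entries plus leading, middle, and trailing blanks.
    print(len(entries), count_written_intervals(entries, 2.5))
```

With something like this, the header line could use the computed count instead of len(tier._entries), so the size would match the highest interval index actually written.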
For Reproducing your issue
Please fill out the following:
Run the LibriSpeech example and check the output
This is using the exact instructions on the website, so I won't fill out the following
Corpus structure
What language is the corpus in?
How many files/speakers?
Are you using lab files or TextGrid files for input?
Dictionary
Are you using a dictionary from MFA? If so, which one?
If it's a custom dictionary, what is the phoneset?
Acoustic model
If you're using an acoustic model, is it one downloaded through MFA? If so, which one?
If it's a model you've trained, what data was it trained on?
Log file
Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA).
N/A
Desktop (please complete the following information):
OS: Linux
Version PopOS 22.04 LTS
Any other details about the setup N/A
Additional context
I'll try to work around this today, but I may be able to put in a PR later if I can figure out how to test it and the fix turns out to be fairly simple.
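One possible workaround, sketched under the assumption that the files are long-format TextGrids with header lines of the form intervals: size = N, as written by the code above. The function name fix_interval_sizes is hypothetical; it rewrites each size header to match the number of intervals [k]: blocks that follow it, so a strict parser can read the file.

```python
import re

# Header line, e.g. "        intervals: size = 2 "
SIZE_RE = re.compile(r"^(\s*intervals: size = )\d+(\s*)$")
# Start of one interval block, e.g. "        intervals [3]:"
ITEM_RE = re.compile(r"^\s*intervals \[\d+\]:?\s*$")


def fix_interval_sizes(text: str) -> str:
    """Rewrite each 'intervals: size' header to the observed interval count."""
    lines = text.split("\n")
    counts = {}     # header line index -> number of interval blocks after it
    current = None  # index of the most recent size header, if any
    for idx, line in enumerate(lines):
        if SIZE_RE.match(line):
            current = idx
            counts[current] = 0
        elif current is not None and ITEM_RE.match(line):
            counts[current] += 1
    for idx, n in counts.items():
        lines[idx] = SIZE_RE.sub(
            lambda m, n=n: f"{m.group(1)}{n}{m.group(2)}", lines[idx]
        )
    return "\n".join(lines)


if __name__ == "__main__":
    sample = (
        "    intervals: size = 1 \n"
        "    intervals [1]:\n"
        '        text = "" \n'
        "    intervals [2]:\n"
        '        text = "a" \n'
    )
    print(fix_interval_sizes(sample))
```

This only patches the header after the fact and does not touch point tiers or short-format files; the real fix would still belong in the writer.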