Ignore audio that doesn't match transcript

Is your feature request related to a problem? Please describe. In my dataset, each utterance audio file has about 0.5 s of irrelevant spoken words at the beginning and at the end. These "noise" words occur only at the edges, never in the middle of the utterance. When I run align, it always tries to align the first transcript word with the first non-silent part of the audio, so that word ends up aligned to the irrelevant speech.

Describe the solution you'd like. It would be great if the irrelevant words were ignored and a silence token were put in their place in the TextGrid.
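To make the desired output concrete, here is a minimal sketch of the tier I have in mind. It is plain Python and not part of the aligner: `pad_with_silence`, the `(start, end, label)` tuple layout, and the empty string as the silence label are all assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical interval layout: (start in seconds, end in seconds, label).
Interval = Tuple[float, float, str]


def pad_with_silence(
    words: List[Interval], duration: float, silence_label: str = ""
) -> List[Interval]:
    """Return a tier covering [0, duration], filling every gap
    (including the leading and trailing ones) with silence_label."""
    tier: List[Interval] = []
    cursor = 0.0
    for start, end, label in sorted(words):
        if start > cursor:
            # Audio not covered by any transcript word becomes silence.
            tier.append((cursor, start, silence_label))
        tier.append((start, end, label))
        cursor = max(cursor, end)
    if cursor < duration:
        # Trailing audio after the last transcript word.
        tier.append((cursor, duration, silence_label))
    return tier


# A 2.0 s file whose transcript words occupy 0.4 s to 1.5 s.
example = pad_with_silence(
    [(0.4, 0.9, "hello"), (0.9, 1.5, "world")], duration=2.0
)
```

In this example the first 0.4 s and the last 0.5 s would come out as silence intervals rather than being absorbed into "hello" or "world".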

Describe alternatives you've considered. I considered trimming the files automatically, but the start of the relevant segment varies between 0 s and 0.5 s, so it is hard to determine where it begins.

Please let me know if this feature already exists. Thank you.

MontrealCorpusTools / Montreal-Forced-Aligner
