Closed by romulomello 1 month ago
It does align the full text to the audio, with varying scores that indicate the probability that each word exists, so you need to think of a different approach to the problem. For example, pre/post-process the ASR output to assign low scores (ideally -inf) to the missing words, align the rest of the ASR output, and merge the two together.
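A minimal sketch of that pre/post-processing idea, assuming word-level matching with Python's standard `difflib` is good enough for your data. The function names (`match_reference_to_asr`, `merge_scores`) and the `asr_scores` list are hypothetical and not part of the aligner's API; `asr_scores` stands in for whatever per-word scores you get from running the forced alignment on the ASR words only.

```python
import difflib
import math
from typing import List, Optional, Sequence, Tuple


def _normalize(word: str) -> str:
    # Compare words case-insensitively and without surrounding punctuation.
    return word.lower().strip(".,!?;:\"'")


def match_reference_to_asr(
    ref_words: Sequence[str], asr_words: Sequence[str]
) -> List[Optional[int]]:
    """For each reference word, return the index of the matching ASR word,
    or None when the ASR output has no match for it."""
    a = [_normalize(w) for w in ref_words]
    b = [_normalize(w) for w in asr_words]
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    mapping: List[Optional[int]] = [None] * len(ref_words)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k] = block.b + k
    return mapping


def merge_scores(
    ref_words: Sequence[str],
    mapping: Sequence[Optional[int]],
    asr_scores: Sequence[float],
    missing_score: float = -math.inf,
) -> List[Tuple[str, float]]:
    """Attach a score to every reference word: the aligner score for matched
    words, and missing_score for words absent from the ASR output."""
    merged = []
    for word, j in zip(ref_words, mapping):
        merged.append((word, missing_score if j is None else asr_scores[j]))
    return merged


# Illustrative data only; the scores are placeholders, not real aligner output.
ref = "the quick brown fox jumps over the lazy dog".split()
asr = "quick brown fox over lazy".split()
asr_scores = [-0.1, -0.2, -0.3, -0.4, -0.5]

mapping = match_reference_to_asr(ref, asr)
merged = merge_scores(ref, mapping, asr_scores)
```

Exact word matching is the simplest choice here. If the ASR output is often close to the reference but not identical, you may want a fuzzier comparison in `_normalize` or at the matching step.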
I liked the forced alignment. I'm running into an issue and would like to know whether your code could help. I have the output of an ASR model and the text I expected, but in most cases the ASR output doesn't cover even half of the expected text, and sometimes it's quite far from it. I would like to align as many of the ASR words as possible against the reference, and assign a very low score to the reference words that aren't in the ASR output. From the initial tests I've done, it seems to try to align the entire text to the audio.