MahmoudAshraf97 / ctc-forced-aligner

Text to speech alignment using CTC forced alignment

Word Alignment Between ASR Output and Expected Text with Support for Discrepancies in Matching #12

Closed: romulomello closed this issue 1 month ago

romulomello commented 3 months ago

I like the forced alignment, and I'd like to know whether your code can help with a problem I'm having. I have the output of an ASR model and the text I expected. In most cases the ASR output doesn't cover even half of the expected text, and sometimes it's quite far from it. I would like to align as many of the ASR words as possible against the reference, and assign a very low score to the reference words that don't appear in the ASR output. From the initial tests I've done, it seems to try to align the entire text to the audio.

MahmoudAshraf97 commented 3 months ago

It does align the full text to the audio, with varying scores that indicate the probability that each word is present. So you need a different approach to the problem: for example, pre/post-process the ASR output to assign low scores (ideally -inf) to the missing words, align the rest of the ASR output, and merge the two together.
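A minimal sketch of the merge step, assuming forced alignment has already been run on the ASR transcript alone and its words are available as a list of dicts with `word`, `start`, `end` and `score` keys (a hypothetical shape, not necessarily what ctc-forced-aligner returns). It uses Python's standard `difflib.SequenceMatcher` to find order-preserving matches between reference and ASR words; `normalize` and `merge_with_reference` are illustrative names, not part of the library. Reference words with no match get no timestamps and a score of `-inf`.

```python
import difflib
import math
from typing import Optional, TypedDict


class AlignedWord(TypedDict):
    # Assumed (hypothetical) shape of one aligned word; adapt to your aligner's output.
    word: str
    start: Optional[float]
    end: Optional[float]
    score: float


def normalize(word: str) -> str:
    # Lowercase and strip surrounding punctuation so "Fox," matches "fox".
    return word.lower().strip(".,!?;:\"'")


def merge_with_reference(reference: str, asr_alignment: list[AlignedWord]) -> list[AlignedWord]:
    ref_words = reference.split()
    ref_norm = [normalize(w) for w in ref_words]
    asr_norm = [normalize(w["word"]) for w in asr_alignment]

    # Start with every reference word marked as missing: no timestamps, -inf score.
    merged: list[AlignedWord] = [
        {"word": w, "start": None, "end": None, "score": -math.inf} for w in ref_words
    ]

    # Find order-preserving runs of identical words in both sequences.
    matcher = difflib.SequenceMatcher(a=ref_norm, b=asr_norm, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            asr_word = asr_alignment[block.b + k]
            # Keep the reference spelling, but take timing and score from the ASR alignment.
            merged[block.a + k] = {
                "word": ref_words[block.a + k],
                "start": asr_word["start"],
                "end": asr_word["end"],
                "score": asr_word["score"],
            }
    return merged


# Usage with made-up alignment values for illustration only.
reference = "the quick brown fox jumps over the lazy dog"
asr: list[AlignedWord] = [
    {"word": "quick", "start": 0.40, "end": 0.70, "score": -0.2},
    {"word": "fox", "start": 1.10, "end": 1.40, "score": -0.5},
    {"word": "lazy", "start": 2.30, "end": 2.60, "score": -0.3},
    {"word": "dog", "start": 2.60, "end": 2.90, "score": -0.1},
]
result = merge_with_reference(reference, asr)
for w in result:
    print(w["word"], w["start"], w["end"], w["score"])
```

Because `SequenceMatcher` only matches words in order, a repeated reference word is filled in only when its ASR counterpart falls in the same relative position, and ASR words with no counterpart in the reference are dropped.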