MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License
1.34k stars, 246 forks

Getting more accurate phone boundaries by passing in word boundaries #596

Open agupta54 opened 1 year ago

agupta54 commented 1 year ago

I was testing MFA and CTC-segmentation-based forced alignment for English. Most open-source English CTC models are trained on characters, so using CTC segmentation for forced alignment gives character and word boundaries. In some cases I found these word boundaries to be more accurate than MFA's word boundaries. However, I need phoneme durations for a downstream TTS task, so I couldn't make use of them. If accurate word boundaries are known from another method, is there a way to get more accurate phoneme boundaries with the current toolkit? If this is not already possible, what would need to change in the toolkit to support it?
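To make the question concrete, here is a minimal sketch of one naive workaround done outside the toolkit: take the phone intervals MFA produced for a word and linearly rescale them to fit the word boundaries from the other method. This is not an MFA feature. `Interval` and `rescale_phones` are hypothetical names, and the timings are made-up values, not real aligner output. Linear rescaling only shifts boundaries proportionally; it does not re-estimate them from the audio, which is what I am really asking about.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    label: str
    start: float  # seconds
    end: float    # seconds


def rescale_phones(phones, old_word, new_word):
    """Linearly map phone intervals from old_word's span onto new_word's span."""
    old_dur = old_word.end - old_word.start
    new_dur = new_word.end - new_word.start
    if old_dur <= 0 or new_dur <= 0:
        raise ValueError("word intervals must have positive duration")
    scale = new_dur / old_dur
    return [
        Interval(
            p.label,
            new_word.start + (p.start - old_word.start) * scale,
            new_word.start + (p.end - old_word.start) * scale,
        )
        for p in phones
    ]


# Hypothetical values for illustration only.
mfa_word = Interval("cat", 1.00, 1.40)   # word boundary from MFA
ctc_word = Interval("cat", 1.10, 1.40)   # word boundary from CTC segmentation
mfa_phones = [
    Interval("K", 1.00, 1.10),
    Interval("AE", 1.10, 1.30),
    Interval("T", 1.30, 1.40),
]

adjusted = rescale_phones(mfa_phones, mfa_word, ctc_word)
for p in adjusted:
    print(f"{p.label}\t{p.start:.3f}\t{p.end:.3f}")
```

The phones stay contiguous and keep their relative durations, but they now start and end exactly at the externally supplied word boundaries.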