MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

[BUG] Quotes treated as clitics #492

Closed by iamanigeeit 2 years ago

iamanigeeit commented 2 years ago

To reproduce: run mfa align on an English sentence in which a quoted phrase begins with a quote mark followed by s, e.g. Hello 'smart guy'. The 's is treated as a clitic, and the resulting TextGrid contains a 's label.

I recommend detecting quotes with the regex (\W|^)<QUOTE MARK>(\w[\w <PUNCTUATION>]*)<QUOTE MARK>(\W|$), replacing <QUOTE MARK> with the list of quote marks and <PUNCTUATION> with the list of punctuation defined in the global options.
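As a rough illustration of the suggestion above, here is a minimal Python sketch of that regex. The QUOTE_MARKS and PUNCTUATION values and both function names are hypothetical stand-ins I made up for the example; they are not the aligner's actual option values or API, so substitute whatever the global options define.

```python
import re

# Hypothetical stand-ins for the configured lists (assumption: the real
# values come from the aligner's global options and may differ).
QUOTE_MARKS = "'\"\u2018\u2019\u201c\u201d"
PUNCTUATION = ",.!?;:-"


def build_quote_pattern(quote_marks: str, punctuation: str) -> re.Pattern:
    """Build (\\W|^)<QUOTE MARK>(\\w[\\w <PUNCTUATION>]*)<QUOTE MARK>(\\W|$)."""
    # re.escape keeps characters such as '-' literal inside the classes.
    quote_class = "[" + re.escape(quote_marks) + "]"
    inner = r"\w[\w " + re.escape(punctuation) + "]*"
    return re.compile(
        r"(\W|^)" + quote_class + "(" + inner + ")" + quote_class + r"(\W|$)"
    )


def find_quoted_spans(text: str) -> list[str]:
    """Return the text found between a pair of quote marks."""
    pattern = build_quote_pattern(QUOTE_MARKS, PUNCTUATION)
    return [match.group(2) for match in pattern.finditer(text)]


if __name__ == "__main__":
    print(find_quoted_spans("Hello 'smart guy'."))
    print(find_quoted_spans("the dog's bone"))
```

With these assumed lists, the apostrophe in dog's is not matched because it follows a word character and has no closing quote, so a genuine clitic would still be left for the clitic handling. Like the regex as written, the sketch does not require the opening and closing quote marks to be the same character.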

LJ018-0208.zip