MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License
1.27k stars 243 forks

How can I do segmentation on syllable level? #158

Open fatcatZF opened 4 years ago

fatcatZF commented 4 years ago

Is it possible to do segmentation at the syllable level? Thanks in advance.

sheylagarci commented 3 years ago

Hi, I am also trying to figure out whether segmentation at the syllable level is possible. Did you manage to find out whether it can be done?

danmaevsky commented 1 month ago

Etiquette be damned, I am reviving this thread. Have either of you discovered a way to do this, or perhaps any other systems that can achieve this syllable-level segmentation?

mmcauliffe commented 1 month ago

You can look at https://babel.ling.upenn.edu/phonetics/old_website_2015/p2tk/index.html or https://americanphonetics.ruhosting.nl/course/8-consonants-and-syllables/8-7-1the-mop/ for implementing the Maximum Onset Principle (MOP) heuristic. Basically, it's a matter of deciding which phone symbols can be nuclei, and then using the beginnings and endings of words to work out which segments can belong to onsets vs. codas. The Weak Syllable Principle (WSP) will likely not be doable or robust, since suprasegmental features like stress aren't encoded.
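To make the MOP idea concrete, here is a minimal sketch, not anything MFA ships. It assumes ARPABET-style phone labels (vowels as nuclei, optional stress digits), a small hand-written set of legal English onsets (in practice you would harvest these from word-initial clusters in your pronunciation dictionary), and that you have already pulled each word's phone intervals out of the alignment as `(label, start, end)` tuples. The names `syllabify` and `syllable_intervals` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple

# Assumption: ARPABET vowels (stress digits stripped) are the only possible nuclei.
VOWELS = set("AA AE AH AO AW AY EH ER EY IH IY OW OY UH UW".split())

# Assumption: an illustrative, incomplete inventory of legal English onsets.
SINGLE = "B CH D DH F G HH JH K L M N P R S SH T TH V W Y Z ZH".split()
CLUSTERS = [
    "P L", "P R", "B L", "B R", "T R", "D R", "K L", "K R", "G L", "G R",
    "F L", "F R", "TH R", "SH R", "S P", "S T", "S K", "S M", "S N", "S L",
    "S W", "T W", "K W", "S P R", "S T R", "S K R", "S P L", "S K W",
]
LEGAL_ONSETS = {()} | {(c,) for c in SINGLE} | {tuple(c.split()) for c in CLUSTERS}

Interval = Tuple[str, float, float]  # (phone label, start time, end time)


def is_nucleus(phone: str) -> bool:
    """True if the phone is a vowel once any stress digit is removed."""
    return phone.rstrip("012") in VOWELS


def syllabify(phones: Sequence[str]) -> List[List[str]]:
    """Split one word's phones into syllables using maximal legal onsets."""
    if not phones:
        return []
    nuclei = [i for i, p in enumerate(phones) if is_nucleus(p)]
    if not nuclei:
        return [list(phones)]  # no vowel: treat the whole word as one unit
    starts = [0]
    for prev, nxt in zip(nuclei, nuclei[1:]):
        # Try the longest consonant run before the next nucleus first; the
        # empty onset is always legal, so this loop always finds a split.
        for k in range(prev + 1, nxt + 1):
            if tuple(phones[k:nxt]) in LEGAL_ONSETS:
                starts.append(k)
                break
    ends = starts[1:] + [len(phones)]
    return [list(phones[s:e]) for s, e in zip(starts, ends)]


def syllable_intervals(word_phones: Sequence[Interval]) -> List[Interval]:
    """Merge a word's aligned phone intervals into syllable intervals."""
    out: List[Interval] = []
    i = 0
    for syl in syllabify([p[0] for p in word_phones]):
        chunk = word_phones[i:i + len(syl)]
        out.append((" ".join(syl), chunk[0][1], chunk[-1][2]))
        i += len(syl)
    return out


# "lemon": the single intervocalic consonant goes to the second syllable.
print(syllabify("L EH1 M AH0 N".split()))  # [['L', 'EH1'], ['M', 'AH0', 'N']]
```

Because every intervocalic cluster is pushed as far as possible into the following onset, this always yields "le-mon" rather than "lem-on"; handling that kind of ambiguity would need extra rules on top.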

I'm assuming here that you're interested in English, but MOP does work for other languages (WSP less so). What's the use case you have for it? It's typically easy for listeners to tell how many syllables a word has, but it's harder for them to tell exactly where the syllable boundaries are. English words like "happy" or "lemon" are variably interpreted as "ha-ppy" vs. "happ-y" and "le-mon" vs. "lem-on". There's also the caveat that most work on syllabification has been done on citation/dictionary forms; I'm not sure how applicable it is to spontaneous speech with reductions and deletions of segments and syllables.