MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

Handling of dictionary with multiple entries for same word #161

Open rafaelvalle opened 4 years ago

rafaelvalle commented 4 years ago

How does the binary 'lib/align' handle dictionary entries that have multiple values for the same word? For example:

A  EY1
A(1)  AH1

shakhrayv commented 3 years ago

I’m also trying to find the same information. After running the alignment a couple of times, the choice does not seem to be random. Is there some model for this, or does MFA somehow choose the right pronunciation based on the acoustic model?

mmcauliffe commented 3 years ago

Do you mean the case where the word A has multiple transcriptions? In the above example, it should always pick A, because A(1) has a different orthography and so is treated as a separate word, not as a pronunciation variant of A.

For the case of

A EY1
A AH1

The pronunciation weights are equal, so over the course of training, the acoustic model will start carrying the bulk for deciding between them. You can also specify probabilities for the different pronunciations of a word, e.g.:

A 0.25 EY1
A 1 AH1

Note that they don't sum to one. The convention is to give the most probable pronunciation a probability of 1, so that a word isn't penalized just for having many variants, which would reduce accuracy.
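To make the effect of these weights concrete, here is a minimal Python sketch. It assumes the aligner adds the log of the pronunciation weight to the acoustic log-likelihood of each variant and keeps the best total, which is how weighted lexicons are commonly used in Kaldi-style decoders. It is not MFA's actual code: `parse_entry`, `pick_variant`, and the acoustic scores are hypothetical, made up for illustration.

```python
import math

def parse_entry(line):
    """Split a well-formed 'WORD [PROB] PHONE...' line into (word, prob, phones).

    Hypothetical helper: if the second field is not a number, prob defaults to 1.0.
    """
    fields = line.split()
    word, rest = fields[0], fields[1:]
    try:
        prob, phones = float(rest[0]), rest[1:]
    except ValueError:
        prob, phones = 1.0, rest
    return word, prob, tuple(phones)

def pick_variant(entries, acoustic_loglik):
    """Return the phones maximizing log(prob) + acoustic log-likelihood."""
    def score(entry):
        _, prob, phones = entry
        # A weight of 1 contributes log(1) = 0, i.e. no penalty at all.
        return math.log(prob) + acoustic_loglik[phones]
    return max(entries, key=score)[2]

entries = [parse_entry("A 0.25 EY1"), parse_entry("A 1 AH1")]

# Made-up acoustic log-likelihoods for two different audio segments.
close_call = {("EY1",): -10.0, ("AH1",): -10.5}  # audio fits both about equally
clear_ey = {("EY1",): -10.0, ("AH1",): -14.0}    # audio clearly favors EY1

print(pick_variant(entries, close_call))  # the 0.25 weight tips this toward AH1
print(pick_variant(entries, clear_ey))    # strong acoustic evidence still wins: EY1
```

Under this assumption the weight acts as a prior: it settles close calls, but a clear acoustic match can still override it. With equal weights, the log terms cancel and only the acoustic scores matter.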

You can also estimate them from a speech corpus with the mfa train_dictionary command (https://montreal-forced-aligner.readthedocs.io/en/latest/training_dictionary.html#training-dictionary). It will align the corpus and estimate the probabilities from the counts of pronunciations that the aligner picked when aligning.
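As a rough illustration of that counting idea (a sketch only, not the implementation behind mfa train_dictionary), the following Python snippet takes a list of (word, pronunciation) pairs that an aligner is assumed to have picked and scales the counts so that the most frequent variant of each word gets 1. The function name `estimate_probabilities` and the input data are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_probabilities(chosen):
    """Map (word, pron) -> count / count of that word's most frequent pron."""
    counts = Counter(chosen)
    # Highest count seen among each word's variants.
    best = defaultdict(int)
    for (word, _), n in counts.items():
        best[word] = max(best[word], n)
    return {(word, pron): n / best[word] for (word, pron), n in counts.items()}

# Hypothetical alignment output: AH1 was picked four times, EY1 once.
chosen = [("A", "AH1")] * 4 + [("A", "EY1")]
probs = estimate_probabilities(chosen)
print(probs[("A", "AH1")], probs[("A", "EY1")])  # 1.0 0.25
```

Dividing by the largest count rather than the total follows the convention described above, so the estimates do not sum to one.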

binbinxue commented 2 years ago

Can you elaborate on the bit 'acoustic model will start carrying the bulk for deciding between them'? The acoustic models are GMM-HMM models, right? So a Gaussian mixture model for the audio part, and a hidden Markov model for picking the most likely sequence of phonemes, which decides which pronunciation variant of the word to use? Thanks.