When to use G2P models? When to use acoustic models?

fishfree commented 5 months ago

I can't tell the two types of models apart very well. I hope somebody can give a clear explanation. Many thanks!

mmcauliffe commented 5 months ago

G2P models are used for generating pronunciations for a word, so using the english_arpa G2P model for "cat" would give you "K AE1 T". This model does not use audio; it is simply a model of how to get a phonological representation of an unknown word, trained on the general mapping from orthography to phones in a training dictionary.

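These pronunciations are used during alignment of audio files to convert the orthographic text to a sequence of phones that maps onto the audio. The acoustic model is the part that models how each phone sounds, so it is what matches that phone sequence to the audio.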
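To make the text-to-phones step concrete, here is a minimal toy sketch in Python. It is not MFA code: the tiny dictionary, the letter table, and the function names are all hypothetical, and a real G2P model is a trained model rather than a per-letter lookup. It only shows where a G2P model fits: words found in the pronunciation dictionary are looked up, and unknown words fall back to G2P.

```python
# Toy illustration only. MFA's real G2P models are trained on a pronunciation
# dictionary; this letter table is a hypothetical stand-in.

# A tiny pronunciation dictionary (word -> ARPAbet phones).
PRONUNCIATION_DICT = {
    "the": ["DH", "AH0"],
    "cat": ["K", "AE1", "T"],
}

# Hypothetical one-letter-to-one-phone table playing the role of a G2P model.
TOY_LETTER_TO_PHONE = {"c": "K", "a": "AE1", "t": "T", "s": "S"}


def toy_g2p(word):
    """Guess a pronunciation from spelling alone (no audio involved)."""
    return [TOY_LETTER_TO_PHONE[ch] for ch in word.lower() if ch in TOY_LETTER_TO_PHONE]


def transcript_to_phones(transcript, dictionary, g2p=toy_g2p):
    """Turn orthographic text into the phone sequence an aligner would match to audio."""
    phones = []
    for word in transcript.lower().split():
        pronunciation = dictionary.get(word)
        if pronunciation is None:
            # Unknown word: fall back to the G2P model.
            pronunciation = g2p(word)
        phones.extend(pronunciation)
    return phones


# "sat" is not in the dictionary, so its phones come from toy_g2p.
print(transcript_to_phones("the cat sat", PRONUNCIATION_DICT))
```

The acoustic model has no counterpart in this sketch. It is the component that would take the resulting phone sequence and find where each phone starts and ends in the audio.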

More practically, you use G2P models with the mfa g2p command to generate new pronunciations for pronunciation dictionaries, and acoustic models and pronunciation dictionaries with mfa align to generate speech-text alignment. Note that you can pass a G2P model to the mfa align command to generate pronunciations for any unknown words during alignment.
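As a rough sketch of the two workflows, the Python below only builds and prints the command lines; it does not run MFA. The file paths are hypothetical placeholders, english_us_arpa is assumed to be the name of a downloaded pretrained model, and the argument order and the --g2p_model_path option are assumptions based on the MFA 3.x command-line help. The order has differed between MFA versions, so check mfa g2p --help and mfa align --help for your install.

```python
import shlex


def g2p_command(word_list, g2p_model, output_dictionary):
    """Build an `mfa g2p` call: word list in, pronunciation dictionary out.

    Assumed positional order (MFA 3.x): INPUT_PATH G2P_MODEL_PATH OUTPUT_PATH.
    """
    return ["mfa", "g2p", word_list, g2p_model, output_dictionary]


def align_command(corpus_dir, dictionary, acoustic_model, output_dir, g2p_model=None):
    """Build an `mfa align` call: audio + transcripts in, alignments out.

    Assumed positional order (MFA 3.x):
    CORPUS_DIRECTORY DICTIONARY_PATH ACOUSTIC_MODEL_PATH OUTPUT_DIRECTORY.
    """
    cmd = ["mfa", "align", corpus_dir, dictionary, acoustic_model, output_dir]
    if g2p_model is not None:
        # Optional: let the aligner generate pronunciations for unknown words.
        cmd += ["--g2p_model_path", g2p_model]
    return cmd


# Hypothetical paths; the model names are placeholders for whatever you downloaded.
print(shlex.join(g2p_command("new_words.txt", "english_us_arpa", "new_words.dict")))
print(shlex.join(align_command("my_corpus", "english_us_arpa", "english_us_arpa",
                               "aligned_output", g2p_model="english_us_arpa")))
```

Note which model goes where: the G2P model appears only in the g2p call (and optionally as a fallback in align), while the acoustic model appears only in align.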

See the first steps page for basic use cases of the commands and models, and the docs on mfa g2p and mfa align for details on those commands.

fishfree commented 5 months ago

@mmcauliffe Thank you so much, Michael!

MontrealCorpusTools / Montreal-Forced-Aligner

When to use G2P models? When to use acoustic models? #813