Closed fishfree closed 5 months ago
G2P models generate pronunciations for a word, so running the english_arpa G2P model on "cat" would give you "K AE1 T". This model does not use audio; it is simply a model of how to get a phonological representation of an unknown word, trained on the general mapping from orthography to phones in a training dictionary.
These pronunciations are used during alignment of audio files to convert the orthographic text to a sequence of phones that maps onto the audio.
More practically, you use G2P models with the mfa g2p command to generate new pronunciations for pronunciation dictionaries, and you use acoustic models and pronunciation dictionaries with the mfa align command to generate speech-text alignments. Note that you can also pass a G2P model to the mfa align command to generate pronunciations for any unknown words during alignment.
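If it helps to see the division of labor, here is a toy sketch of the "dictionary first, G2P as fallback for unknown words" idea described above. None of this is MFA's actual API or implementation: the letter-to-phone table, the one-word dictionary, and the function names `toy_g2p` and `pronounce` are all made up for illustration, and a real G2P model learns context-dependent mappings rather than one phone per letter.

```python
# Toy illustration only: none of this is MFA's API.
# A hypothetical letter-to-phone table stands in for a trained G2P model.
TOY_G2P_RULES = {"c": "K", "a": "AE1", "t": "T", "b": "B", "s": "S"}

# A hypothetical pronunciation dictionary containing a single known word.
PRONUNCIATION_DICT = {"cat": ["K", "AE1", "T"]}


def toy_g2p(word):
    """Map each letter to one phone (an assumption; real models use context)."""
    return [TOY_G2P_RULES[letter] for letter in word.lower()]


def pronounce(word, dictionary, g2p=None):
    """Look the word up; fall back to G2P for unknown words if a model is given."""
    key = word.lower()
    if key in dictionary:
        return dictionary[key]
    if g2p is not None:
        return g2p(key)
    return None  # unknown word and no G2P model available


print(pronounce("cat", PRONUNCIATION_DICT))           # ['K', 'AE1', 'T']
print(pronounce("bat", PRONUNCIATION_DICT))           # None
print(pronounce("bat", PRONUNCIATION_DICT, toy_g2p))  # ['B', 'AE1', 'T']
```

The point is only that the G2P step works from spelling alone and never touches audio; matching the resulting phones to the audio is a separate step handled by the acoustic model.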
See the first steps page for basic use cases of the commands and models, and the docs on mfa g2p and mfa align for details on those commands.
@mmcauliffe Thank you so much, Michael!
I cannot tell the two types of models apart very well. I hope somebody can give a clear explanation. Many thanks!