MontrealCorpusTools / mfa-models

Collection of pretrained models for the Montreal Forced Aligner
Creative Commons Attribution 4.0 International

English mfa dictionary and corresponding G2P model #24

Open vivian556123 opened 10 months ago

vivian556123 commented 10 months ago

Hi, I want to use the MFA pretrained English_mfa acoustic model and dictionary for alignment. I also want to use the same dictionary for G2P (text to phonemes). Which G2P model corresponds to it, so that I can convert text into phonemes? I want to use it for TTS inference.

Thanks a lot!

mmcauliffe commented 10 months ago

For US English, the G2P model here: https://mfa-models.readthedocs.io/en/latest/g2p/English/English%20%28US%29%20MFA%20G2P%20model%20v2_0_0a.html was trained on the US pronunciation dictionary here: https://mfa-models.readthedocs.io/en/latest/dictionary/English/English%20%28US%29%20MFA%20dictionary%20v2_0_0a.html. However, do note that the dictionaries and G2P models are optimized for recognizing variation within and across dialects, so I don't know how applicable they would be to a TTS system, which I would imagine benefits from less variation.

iamanigeeit commented 5 months ago

@vivian556123 If you want to generate pronunciations from Python instead of running in batches, you can use the following:

from montreal_forced_aligner.g2p.generator import PyniniGenerator
from montreal_forced_aligner.models import G2PModel, ModelManager

# Name of the pretrained G2P model
language = "english_us_mfa"

# If you haven't downloaded the model
# manager = ModelManager()
# manager.download_model("g2p", language)

# Locate the downloaded model and load it into a generator
model_path = G2PModel.get_pretrained_path(language)
g2p = PyniniGenerator(g2p_model_path=model_path, num_pronunciations=1)
g2p.setup()

Then call g2p.rewriter on the text:

>>> g2p.rewriter('my time')
['m aj tʰ aj m', 'm ɑ tʰ aj m', 'm ə tʰ aj m', 'mʲ i tʰ aj m']

However, I think there is no point in using the MFA G2P for this, as the results are not sorted in order of likelihood. In fact, it seems that simply mapping every word to its most common pronunciation is both more accurate and faster. I would recommend using a different G2P library, as long as its phoneme set is compatible (e.g. ARPAbet). For example: https://pypi.org/project/g2p-en/
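
The "map every word to its most common pronunciation" idea can be sketched as a plain lookup table. This is only a minimal sketch, not MFA's API: it assumes a simplified, hypothetical lexicon format of word, probability and phones separated by tabs, and the probabilities below are made up for illustration. Real MFA dictionary files may carry extra columns, so the parsing would need adjusting. The names build_top_pronunciations and phonemize are hypothetical helpers.

```python
# Hypothetical, simplified lexicon lines: word<TAB>probability<TAB>phones.
# The probabilities are illustrative only, not taken from any MFA dictionary.
SAMPLE_LINES = [
    "my\t0.90\tm aj",
    "my\t0.10\tm ə",
    "time\t0.99\ttʰ aj m",
]


def build_top_pronunciations(lines):
    """Keep only the highest-probability pronunciation for each word."""
    best = {}  # word -> (probability, phones)
    for line in lines:
        word, prob, phones = line.rstrip("\n").split("\t")
        prob = float(prob)
        if word not in best or prob > best[word][0]:
            best[word] = (prob, phones)
    return {word: phones for word, (_, phones) in best.items()}


def phonemize(text, lexicon, unknown="<unk>"):
    """Look up each whitespace-separated word; mark out-of-vocabulary words."""
    return [lexicon.get(word, unknown) for word in text.lower().split()]


lexicon = build_top_pronunciations(SAMPLE_LINES)
print(phonemize("my time", lexicon))
```

Words missing from the lexicon come back as the placeholder, so a G2P model would still be needed as a fallback for out-of-vocabulary words.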