Kyubyong / g2p

g2p: English Grapheme To Phoneme Conversion
Apache License 2.0

add load_custom_phonemes() #33

Open veelion opened 1 year ago

veelion commented 1 year ago

For many technical terms, e.g. "AI" and "GitHub", the phonemes predicted by the seq2seq model are not the correct pronunciation, so a custom dictionary is necessary.

This PR adds a method load_custom_phonemes(self, file_path) to G2p. It reads a cmudict-like file into self.cmu. Usage:

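from g2p_en import G2p

texts = [
    "AI is popular on GitHub.",
]
g2p = G2p()

# Before loading the custom dict: "AI" and "GitHub" are not pronounced as intended.
for text in texts:
    out = g2p(text)
    print(out)

# After loading the custom dict: entries from the file are added to self.cmu.
g2p.load_custom_phonemes('./z-custom')
for text in texts:
    out = g2p(text)
    print(out)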

Output before loading the custom dict:
['AY1', ' ', 'IH1', 'Z', ' ', 'P', 'AA1', 'P', 'Y', 'AH0', 'L', 'ER0', ' ', 'AA1', 'N', ' ', 'G', 'IH1', 'TH', 'AH0', 'B', ' ', '.']

Output after loading the custom dict:
['EY1', 'AY1', ' ', 'IH1', 'Z', ' ', 'P', 'AA1', 'P', 'Y', 'AH0', 'L', 'ER0', ' ', 'AA1', 'N', ' ', 'G', 'IH0', 'T', 'HH', 'AH1', 'B', ' ', '.']
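
The contents of ./z-custom are not shown in this PR, so the following is only a minimal sketch of how a cmudict-like file could be parsed and merged into a word-to-pronunciations dict. The helper name parse_cmudict_like, the sample lines, and the ";;;" comment prefix are assumptions for illustration, not the code in the PR. It assumes self.cmu has the shape returned by NLTK's cmudict.dict(): lowercase words mapped to a list of phoneme lists.

```python
from typing import Dict, Iterable, List


def parse_cmudict_like(lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Parse 'WORD  PH1 PH2 ...' lines into {lowercase_word: [[phonemes], ...]}.

    Hypothetical helper: the real load_custom_phonemes() may differ.
    """
    entries: Dict[str, List[List[str]]] = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and cmudict-style comment lines (assumed ';;;' prefix).
        if not line or line.startswith(";;;"):
            continue
        word, *phonemes = line.split()
        if not phonemes:
            continue  # a word with no phonemes is ignored
        # Keys are lowercased, assuming lookups are done on lowercased text.
        entries.setdefault(word.lower(), []).append(phonemes)
    return entries


# Assumed file contents, chosen to match the "after" output above.
sample = [
    ";;; hypothetical contents of ./z-custom",
    "AI  EY1 AY1",
    "GITHUB  G IH0 T HH AH1 B",
]
custom = parse_cmudict_like(sample)

# Stand-in for self.cmu; custom entries are added to (or override) existing ones.
cmu = {"popular": [["P", "AA1", "P", "Y", "AH0", "L", "ER0"]]}
cmu.update(custom)
print(cmu["ai"][0])  # ['EY1', 'AY1']
```

Merging with update() means a custom entry replaces any existing pronunciation for the same word, which is the behavior one would want for terms like "AI" that the default lookup gets wrong.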