Kyubyong / g2p

g2p: English Grapheme To Phoneme Conversion
Apache License 2.0

Should UW be included in the phoneme set? #10

Open jaeseongyou opened 4 years ago

jaeseongyou commented 4 years ago

Should UW be included in the phoneme set? It seems g2p.phonemes follows a general rule of excluding the 'parent' category when its variants exist. For example, AA is not included because its variants AA0, AA1, and AA2 are in the set. The same goes for AE, AH, AW, AY, etc. UW seems to be the only exception. Furthermore, when I ran simple frequency analyses on sizable corpora (not very rigorously, admittedly), UW never occurred while its variants did. I wonder if the phoneme set can safely drop UW.
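For anyone who wants to reproduce the two checks described above, here is a minimal sketch. It makes several assumptions: `SAMPLE_PHONEMES` is a small illustrative list, not the actual contents of g2p.phonemes, `SAMPLE_OUTPUT` is made-up data standing in for converted corpus text, and the helper names `bare_parents_with_variants` and `count_symbols` are hypothetical. To run it against the real set, you would presumably substitute the library's own phoneme list and its output on your corpus.

```python
import re
from collections import Counter

# Illustrative subset only; NOT the actual list shipped by the library.
SAMPLE_PHONEMES = [
    "AA0", "AA1", "AA2", "B", "CH",
    "UH0", "UH1", "UH2", "UW", "UW0", "UW1", "UW2",
]

# Matches a symbol followed by a single stress digit, e.g. "UW1" -> ("UW", "1").
STRESS_VARIANT = re.compile(r"^([A-Z]+)([012])$")


def bare_parents_with_variants(phonemes):
    """Return bare symbols (e.g. 'UW') that coexist with numbered variants."""
    symbols = set(phonemes)
    parents = set()
    for symbol in symbols:
        match = STRESS_VARIANT.match(symbol)
        if match and match.group(1) in symbols:
            parents.add(match.group(1))
    return sorted(parents)


def count_symbols(sequences, targets):
    """Count how often each target symbol occurs across phoneme sequences."""
    counts = Counter(p for seq in sequences for p in seq)
    return {t: counts[t] for t in targets}


# Hypothetical phoneme sequences, standing in for real corpus output.
SAMPLE_OUTPUT = [["B", "UW1"], ["CH", "UW1", "UW0"]]

if __name__ == "__main__":
    print(bare_parents_with_variants(SAMPLE_PHONEMES))
    print(count_symbols(SAMPLE_OUTPUT, ["UW", "UW0", "UW1", "UW2"]))
```

On the sample data, the first check flags only the bare symbol that sits alongside its numbered variants, and the second shows whether that bare symbol ever appears in the sequences. A zero count on a large corpus would be suggestive, though not proof, that the symbol is unused.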

iclementine commented 3 years ago

I have the same question.