Kyubyong / g2pC

g2pC: A Context-aware Grapheme-to-Phoneme Conversion module for Chinese
Apache License 2.0

Some Chinese words are not included in the module #3

Open LLauryn opened 5 years ago

LLauryn commented 5 years ago

The Chinese grapheme-to-phoneme conversion library is incomplete. I have found that the following Chinese characters are missing: 邓,吴,鄂,皖,蔡,萨,廖,宋,秦,刘,滧,闫,陕,郑,郝,犇,鹏,陇,祾,渭,邹,濮,梵,佟,韩,龚,洛,湘,婍,沂,隋,洣,潘,蒋,禹,喲,闽,湳,綪,睍,孻,汶,杭,吶,黔,渝,辽,銶,滇,灞,溁,浙,渤,邵,赣,淮,郸,彭,傣,蜀,沪,癍,郦,滕,滦,榣,姈,亳,漳,邢,涪,尧,昝,羲,媃,粤,鞑

    from g2pc import G2pC
    g2p = G2pC()
    print(g2p("吴"))

For example, when I input the text "邓小平", the result for "邓" is ('邓', 'nr', '邓', '邓', '', '邓'). When I input "吴", the result is ('吴', 'nr', '吴', '吴', '', '吴'). Every character listed above has the same problem as these examples: the pronunciation fields repeat the input character instead of giving its pinyin.
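A quick way to find such cases in bulk is to check whether the pronunciation field of each result still contains Chinese characters. The sketch below assumes the tuple layout seen in the outputs above (word first, pronunciation third); `find_unconverted` is a hypothetical helper, not part of g2pC, and the third sample tuple is made up for contrast.

```python
import re

# Han characters in CJK Unified Ideographs and Extension A.
_HAN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")

def find_unconverted(tokens):
    """Return the words whose pronunciation field still contains Han characters.

    Assumes each token is a tuple with the word at index 0 and the
    pronunciation at index 2, as in the outputs quoted in this issue.
    """
    missing = []
    for token in tokens:
        word, pinyin = token[0], token[2]
        if _HAN.search(pinyin):
            missing.append(word)
    return missing

sample = [
    ("邓", "nr", "邓", "邓", "", "邓"),   # output quoted in this issue
    ("吴", "nr", "吴", "吴", "", "吴"),   # output quoted in this issue
    ("你好", "l", "ni3 hao3", "ni2 hao3", "hello", "你好"),  # hypothetical converted token
]
missing = find_unconverted(sample)
print(missing)
```

With a real installation, `sample` could be replaced by the list returned from `g2p(text)`, assuming it has the same tuple layout.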

Kyubyong commented 5 years ago

Thanks. Most of them are used for names. I fixed the bug, so please update the library and check the new results. Some of them are still missing because they are not in cedict. I will look for a solution to this in the near future.
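Until the dictionary gap is closed, one possible user-side workaround is to patch the output with a small supplementary table. This is only a sketch and not the fix made in the library: `EXTRA_READINGS` and `fill_missing` are hypothetical names, the table holds just the common surname readings of two characters from this issue, and the six-field tuple layout is assumed from the outputs quoted above.

```python
# Hypothetical supplementary table: common surname readings, tone as a digit.
EXTRA_READINGS = {"邓": "deng4", "吴": "wu2"}

def fill_missing(tokens, extra=EXTRA_READINGS):
    """Replace echoed pronunciation fields with readings from `extra`.

    A token is treated as unconverted when its pronunciation field
    equals the word itself. Tokens without an entry in `extra` are
    returned unchanged.
    """
    patched = []
    for word, pos, pinyin, desc, meaning, trad in tokens:
        if pinyin == word and word in extra:
            pinyin = desc = extra[word]
        patched.append((word, pos, pinyin, desc, meaning, trad))
    return patched

result = fill_missing([("邓", "nr", "邓", "邓", "", "邓")])
print(result)
```

The table only covers single characters with one intended reading, so it is not context-aware and would need care for polyphonic characters.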

melspectrum007 commented 4 years ago

@Kyubyong Thanks for your impressive work. I also found that some Chinese characters are not included in the module, such as "琊". Could you update it to include these missing characters?

Thanks

melspectrum007 commented 4 years ago

Another question: how many Chinese words are included in the model? Could you include the full Chinese dictionary? Thanks