Kyubyong / g2pC

g2pC: A Context-aware Grapheme-to-Phoneme Conversion module for Chinese
Apache License 2.0

The tone change rules are not implemented well #4

Open LLauryn opened 5 years ago

LLauryn commented 5 years ago

https://resources.allsetlearning.com/chinese/pronunciation/Tone_change_rules#Why_Tone_Changes_Are_Not_Written

The page above describes the tone change rules in detail. When I ran the code below,

    from g2pc import G2pC
    g2p = G2pC()
    print(g2p("卡尔普"))

the result was

    [('卡', 'nr', 'qia3', 'qia2', '/to block/to be stuck/to be wedged/customs station/a clip/a fastener/a checkpost/Taiwan pr. [ka3]/', '卡'), ('尔', 'nr', 'er3', 'er3', '/variant of 爾|尔[er3]/', '尒'), ('普', 'nr', 'pu3', 'pu3', '/general/popular/everywhere/universal/', '普')]

The correct conversion for "尔" should be 'er2', because the next character, "普", is pronounced 'pu3'.

In addition, when the input was the sentence "老虎幼崽与宠物犬玩耍", the result was

    [('老虎', 'n', 'lao3 hu3', 'lao2 hu3', '/tiger/CL:隻|只[zhi1]/', '老虎'), ('幼崽', 'n', 'you4 zai3', 'you4 zai2', '/young (of an animal)/', '幼崽'), ('与', 'p', 'yu3', 'yu3', '/and/to give/together with/', '與'), ('宠', 'n', 'chong3', 'chong3', '/to love/to pamper/to spoil/to favor/', '寵'), ('物', 'n', 'wu4', 'wu4', '/thing/object/matter/abbr. for physics 物理/', '物'), ('犬', 'n', 'quan3', 'quan3', '/dog/', '犬'), ('玩耍', 'v', 'wan2 shua3', 'wan2 shua3', '/to play (as children do)/to amuse oneself/', '玩耍')]

The conversion for "与" is wrong for the same reason: it should be 'yu2', because the next character, "宠", is pronounced 'chong3'.

In my opinion, the tone change for multiple third tones should probably always depend on the pronunciation of the next word.
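To make the suggestion concrete, here is a minimal sketch of that rule in plain Python. It does not use or reflect g2pC's internals. The function name `apply_third_tone_sandhi` is hypothetical, and it assumes the input is a flat list of numbered-pinyin syllables for the whole sentence, so the lookup crosses word boundaries. It is a deliberate simplification: a third tone becomes a second tone whenever the following syllable's dictionary tone is 3, and prosodic grouping, which real speech also depends on, is ignored.

```python
def apply_third_tone_sandhi(syllables):
    """Simplified third-tone sandhi (hypothetical helper, not part of g2pC).

    A syllable ending in tone 3 is changed to tone 2 when the syllable
    that follows it has dictionary tone 3. The check uses the original
    (unchanged) tone of the next syllable and ignores word boundaries.
    """
    result = []
    for i, syl in enumerate(syllables):
        nxt = syllables[i + 1] if i + 1 < len(syllables) else None
        if syl.endswith("3") and nxt is not None and nxt.endswith("3"):
            result.append(syl[:-1] + "2")
        else:
            result.append(syl)
    return result


print(apply_third_tone_sandhi(["qia3", "er3", "pu3"]))
# ['qia2', 'er2', 'pu3']
```

Applied to the second example as the flat sequence lao3 hu3 you4 zai3 yu3 chong3 wu4 quan3 wan2 shua3, the same function yields 'yu2' for "与", since "宠" is 'chong3'.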

Kyubyong commented 5 years ago

Thanks! I found it tricky to apply the tone change rules properly. Give me some time to dig into it.