cburgmer / cjklib

Han character library for CJKV languages

Pinyin to MandarinIPA bugs #9

Open trevorld opened 7 years ago

trevorld commented 7 years ago

Thanks for your wonderful cjklib and the cjknife command-line tool. While making system calls to cjknife to produce IPA for some Pinyin (I'm writing a command-line pinyin drilling program in R), I noticed some bugs in the production of MandarinIPA using the following system call:

cjknife -s Pinyin -t MandarinIPA -m pinyin_to_convert_to_ipa
  1. cjknife throws an error when asked to convert the legitimate pinyin yo, m, n, ng, hng, and hm. I've seen yo (final io without an initial) rendered in IPA as [jo] or [jɔ]; some sources write a non-syllabic i (an i with a small diacritic underneath) instead of a j. According to Wikipedia's syllabic consonant page, you should be able to use [m̩], [n̩], [ŋ̍], [xŋ̍], and [xm̩] for those Mandarin syllabic consonant interjections (IPA adds a small vertical line above or below the letter to mark a syllabic consonant).

  2. cjknife gives 'o' IPA for Pinyin (u)o after b, p, m, f, where it should have a 'wo' sound, e.g. po = [pʰwo] not [p‘o]. Although written with an 'o', bo, po, mo, fo (and wo) in fact all have "uo" finals. The only examples of pure "o" finals are the interjection "o" and the rather rare particle "lo" (yo being the only example of the "io" final).

  3. cjknife gives incorrect IPA for erhua, e.g. dianr3 = tjɐɚ̯ not tiɛn.ər. Even if we restrict erhua to what one is expected to know in order to pass the 普通话水平测试 (Putonghua Proficiency Test) exam (i.e. what a speaker with standard Mandarin pronunciation uses), there are still a lot of erhua syllables. For comparison I've compiled my own Mandarin syllable to IPA mapping:

    https://u14129277.dl.dropboxusercontent.com/u/14129277/pinyin_ipa.csv

    which I built from the following tables. I compiled the initial and final tables mainly from the Pinyin and Erhua pages on Wikipedia, but also from other sources, and I decomposed the pinyin-to-initial-and-final table by hand from all the pinyin examples I could find:

    https://u14129277.dl.dropboxusercontent.com/u/14129277/initial.csv

    https://u14129277.dl.dropboxusercontent.com/u/14129277/final.csv

    https://u14129277.dl.dropboxusercontent.com/u/14129277/pinyin_initial_final.csv
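To make items 1 and 2 concrete, here is a rough Python sketch of a possible caller-side workaround. It is not cjklib code and not a proposed patch: the function names (`underlying_final`, `syllabic_consonant_ipa`, `cjknife_command`, `to_ipa`) are hypothetical, the override table only contains the toneless forms quoted in item 1, and I am assuming cjknife prints its result on standard output.

```python
import re
import subprocess

# IPA for the syllabic-consonant interjections from item 1 (toneless forms).
# Written with escapes so the combining marks are unambiguous:
# U+0329 = vertical line below, U+030D = vertical line above, U+014B = eng.
SYLLABIC_CONSONANTS = {
    "m": "m\u0329",
    "n": "n\u0329",
    "ng": "\u014b\u030d",
    "hng": "x\u014b\u030d",
    "hm": "xm\u0329",
}

# Initials after which a written "o" stands for the final "uo" (item 2).
LABIAL_INITIALS = ("b", "p", "m", "f")


def underlying_final(initial, final):
    """Return the final that is pronounced, given the written initial and final."""
    if final == "o" and initial in LABIAL_INITIALS:
        return "uo"
    return final


def syllabic_consonant_ipa(syllable):
    """Return IPA for a syllabic-consonant interjection, or None if it is not one."""
    base = re.sub(r"[1-5]$", "", syllable.lower())  # drop a trailing tone digit
    return SYLLABIC_CONSONANTS.get(base)


def cjknife_command(pinyin):
    """Build the argument list for the system call quoted in this issue."""
    return ["cjknife", "-s", "Pinyin", "-t", "MandarinIPA", "-m", pinyin]


def to_ipa(pinyin):
    """Use the override table first; otherwise ask cjknife (assumed to be on PATH)."""
    override = syllabic_consonant_ipa(pinyin)
    if override is not None:
        return override
    result = subprocess.run(cjknife_command(pinyin), capture_output=True, text=True)
    return result.stdout.strip()
```

For example, `underlying_final("p", "o")` gives `"uo"` while `underlying_final("l", "o")` stays `"o"`, and `to_ipa("hng")` never reaches cjknife. Tones are ignored for the interjections here, so this only covers the base forms.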

Thanks!

cburgmer commented 7 years ago

Hey, thanks for the detailed drill down.

Sadly I currently have neither the time nor the focus to take care of that. I've made you a collaborator on this project and invite you to fix this directly. Happy to try answering anything that comes up wrt the code. :)

trevorld commented 7 years ago

Okay, I have a couple conferences coming up so it may take me a couple months before I have the free time to fix it.