cmu-llab / wikihan

Creative Commons Zero v1.0 Universal

Convert all IPA characters with diacritics to NFD #4

Open kalvinchang opened 1 year ago

kalvinchang commented 1 year ago

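All IPA characters with diacritics should be stored in NFD, i.e. as a base character followed by a separate combining diacritic (at least 2 Unicode code points instead of one precomposed character), for ease of downstream parsing (e.g. in panphon).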
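
A minimal sketch of the conversion, using only Python's standard `unicodedata` module. The helper names `to_nfd` and `find_non_nfd` and the sample strings are hypothetical, chosen for illustration; they are not taken from the repository or its data files.

```python
import unicodedata


def to_nfd(text: str) -> str:
    """Return text in Unicode Normalization Form D (canonical decomposition)."""
    return unicodedata.normalize("NFD", text)


def find_non_nfd(entries):
    """Return the entries that change under NFD, i.e. that still
    contain at least one precomposed character."""
    return [entry for entry in entries if to_nfd(entry) != entry]


# Hypothetical sample transcriptions (assumption: not real dataset rows).
samples = [
    "t\u0250\u0303",  # ɐ + combining tilde, already decomposed
    "p\u00e3",        # precomposed ã (a single code point)
    "k\u02b0a",       # ʰ is a modifier letter; NFD leaves it unchanged
]

for entry in find_non_nfd(samples):
    fixed = to_nfd(entry)
    # Show the code points before and after to make the change visible.
    before = " ".join(f"U+{ord(ch):04X}" for ch in entry)
    after = " ".join(f"U+{ord(ch):04X}" for ch in fixed)
    print(f"{before} -> {after}")
```

Note that NFD only splits characters that have a canonical decomposition. Modifier letters such as ʰ, and letters that Unicode encodes only as a single code point, are left as they are.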

3

kalvinchang commented 1 year ago

For Middle Chinese, the remaining /ɡ/ entries are in the heteronym files; rerun baxterdizer.py on an updated version of mc-pron-heteronyms.csv.