dmort27 / epitran

A tool for transcribing orthographic text as IPA (International Phonetic Alphabet)
MIT License
625 stars, 120 forks

(Updated) Add Japanese (Katakana, Hiragana) #143

Closed. lart-rt closed this 1 year ago

lart-rt commented 1 year ago

(Updated: In the previous pull request, some commits were behind the master branch, so I am submitting a new pull request.)

Added “map” and “post” files for Japanese Katakana and Hiragana.

We first created the files for Katakana, then produced the Hiragana files by replacing each Katakana character in those files with its Hiragana counterpart, since the two writing systems are in one-to-one correspondence.
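As an illustration of that replacement step, here is a minimal sketch, not necessarily the script used for this pull request. It assumes the conversion can rely on the Unicode layout, where the Katakana letters U+30A1 to U+30F6 sit exactly 0x60 code points above their Hiragana counterparts. The function name `katakana_to_hiragana` and the sample line `カ,ka` are hypothetical and only serve the example.

```python
# Sketch only: convert Katakana to Hiragana via the fixed Unicode offset.
# Assumption: only the letters U+30A1 (ァ) .. U+30F6 (ヶ) are converted;
# everything else (Latin text, commas, the long-vowel mark ー) is kept as-is.

KATAKANA_START = 0x30A1  # ァ
KATAKANA_END = 0x30F6    # ヶ
OFFSET = 0x60            # distance between the Katakana and Hiragana blocks


def katakana_to_hiragana(text: str) -> str:
    """Return `text` with each Katakana letter replaced by its Hiragana twin."""
    converted = []
    for ch in text:
        code_point = ord(ch)
        if KATAKANA_START <= code_point <= KATAKANA_END:
            converted.append(chr(code_point - OFFSET))
        else:
            converted.append(ch)
    return "".join(converted)


if __name__ == "__main__":
    # Hypothetical line in an "orthography,phoneme" style; not taken from the PR.
    print(katakana_to_hiragana("カ,ka"))
    print(katakana_to_hiragana("ラーメン"))
```

Characters outside that range, such as the long-vowel mark ー (U+30FC), have no Hiragana counterpart at the same offset and pass through unchanged, so any rule involving them would still need a manual check.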

The files without the suffix “-red” are based on the descriptions in the following literature on the linguistics and phonetics of Japanese:

The files with the suffix “-red” are a more “reduced” version, based on the descriptions in the following Wikipedia articles:

This commit DOESN’T support Kanji (Chinese characters), because the reading of a Kanji is highly context-dependent and there are far more Kanji than Katakana or Hiragana characters.

dmort27 commented 1 year ago

Thanks for this! Looks good!