atilika / kuromoji

Kuromoji is a self-contained and very easy to use Japanese morphological analyzer designed for search
Apache License 2.0

Obtain furigana? #121

Closed (0x6C38 closed this issue 6 years ago)

0x6C38 commented 6 years ago

Hi, the documentation says kuromoji can extract the readings for kanji, and shows an example in which the reading for each token is extracted. However, is it possible to extract which part of the reading corresponds to each kanji?

For example, given a token with the contents "寿司", the desired output would be:

"寿司" -> 寿 = ス, 司 = シ

cmoen commented 6 years ago

Kuromoji provides reading information only at the token/morpheme level. The reading for each kanji can be inferred by matching the token reading against the per-kanji readings found in a kanji dictionary such as kanjidic2 (http://www.edrdg.org/kanjidic/kanjd2index.html), which is freely available.
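
One way to do that inference is a small backtracking alignment: walk the token's surface form character by character and, for each kanji, try every candidate reading that is a prefix of the remaining token reading. The sketch below is illustrative only and is not part of Kuromoji. The function names and the `SAMPLE_READINGS` table are hypothetical; the table is hand-typed for the example rather than loaded from kanjidic2. It assumes the token reading is in katakana (as in the example above) and that kun readings have already been reduced to their stems. Sound changes such as rendaku or gemination are not handled, so some tokens will simply fail to align.

```python
from typing import Dict, List, Optional, Tuple


def to_katakana(text: str) -> str:
    """Convert hiragana to katakana; all other characters pass through."""
    return "".join(
        chr(ord(ch) + 0x60) if "ぁ" <= ch <= "ゖ" else ch
        for ch in text
    )


def align_readings(
    surface: str,
    reading: str,
    kanji_readings: Dict[str, List[str]],
) -> Optional[List[Tuple[str, str]]]:
    """Split a token-level reading across the characters of the surface form.

    Returns a list of (character, reading) pairs, or None if no combination
    of candidate readings reproduces the whole token reading.
    """
    reading = to_katakana(reading)

    def solve(i: int, j: int) -> Optional[List[Tuple[str, str]]]:
        # All surface characters consumed: succeed only if the reading is too.
        if i == len(surface):
            return [] if j == len(reading) else None
        ch = surface[i]
        # Characters missing from the table (e.g. kana) read as themselves.
        candidates = kanji_readings.get(ch, [ch])
        for cand in candidates:
            cand = to_katakana(cand)
            if cand and reading.startswith(cand, j):
                rest = solve(i + 1, j + len(cand))
                if rest is not None:
                    return [(ch, cand)] + rest
        return None

    return solve(0, 0)


# Hypothetical hand-typed sample; a real table would be built from kanjidic2.
SAMPLE_READINGS = {
    "寿": ["ジュ", "ス", "シュウ", "コトブキ"],
    "司": ["シ"],
    "食": ["ショク", "タ"],
}

print(align_readings("寿司", "スシ", SAMPLE_READINGS))
```

With the sample table, 寿司 with the token reading スシ aligns as 寿 = ス and 司 = シ. A `None` result means the dictionary readings alone cannot explain the token reading, which is the case to expect for voiced or otherwise altered readings.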