ani-hovhannisyan / kanji-visualization

Kanji words visualization graph draws a relational graph for the kanji of particular Japanese words. The aim is to understand the relational graph of one kanji within different words and its relations to all possible words.
MIT License

Explore the kanji dictionary #3

Open teruto725 opened 2 years ago

teruto725 commented 2 years ago

Explore the kanji dictionary and think how to create the graphical view using it.

ani-hovhannisyan commented 2 years ago

Currently, I know of these:

For kanji words: JMdict - http://www.edrdg.org/wiki/index.php/JMdict-EDICT_Dictionary_Project
JMdict is a general dictionary with roughly 170,000 entries and is the source of the bulk of the words on Jisho.org.

For more specialized kanji words (proper names): JMnedict - http://www.edrdg.org/enamdict/enamdict_doc.html
JMnedict is an immense database of Japanese proper names for people, companies and locations.

For onyomi and kunyomi: KANJIDIC2 - http://www.edrdg.org/wiki/index.php/KANJIDIC_Project
KANJIDIC2 is a database of kanji that includes readings, meanings and a lot of metadata about each kanji, such as lookup numbers for kanji dictionary books, stroke count and information about variant forms.
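To make the KANJIDIC2 option concrete, here is a minimal sketch of pulling onyomi, kunyomi, English meanings and stroke count out of KANJIDIC2-style XML using only the Python standard library. The `SAMPLE` string is a hand-written, trimmed entry in the KANJIDIC2 layout, not the real file, and `parse_kanjidic2` is a hypothetical helper name used only for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed sample in the KANJIDIC2 layout (assumption: the real
# file has many more fields per character; only the ones used here are shown).
SAMPLE = """<kanjidic2>
  <character>
    <literal>食</literal>
    <misc><stroke_count>9</stroke_count></misc>
    <reading_meaning>
      <rmgroup>
        <reading r_type="ja_on">ショク</reading>
        <reading r_type="ja_on">ジキ</reading>
        <reading r_type="ja_kun">く.う</reading>
        <reading r_type="ja_kun">た.べる</reading>
        <meaning>eat</meaning>
        <meaning>food</meaning>
        <meaning m_lang="fr">manger</meaning>
      </rmgroup>
    </reading_meaning>
  </character>
</kanjidic2>"""


def parse_kanjidic2(xml_text):
    """Return {kanji: {"on": [...], "kun": [...], "meanings": [...], "strokes": int}}."""
    root = ET.fromstring(xml_text)
    result = {}
    for ch in root.iter("character"):
        literal = ch.findtext("literal")
        readings = ch.findall("reading_meaning/rmgroup/reading")
        meanings = ch.findall("reading_meaning/rmgroup/meaning")
        result[literal] = {
            # r_type distinguishes on readings (ja_on) from kun readings (ja_kun).
            "on": [r.text for r in readings if r.get("r_type") == "ja_on"],
            "kun": [r.text for r in readings if r.get("r_type") == "ja_kun"],
            # English meanings carry no m_lang attribute.
            "meanings": [m.text for m in meanings if m.get("m_lang") is None],
            "strokes": int(ch.findtext("misc/stroke_count")),
        }
    return result


if __name__ == "__main__":
    for kanji, info in parse_kanjidic2(SAMPLE).items():
        print(kanji, info)
```

For the full dictionary file, the same function could be fed the file contents, although a streaming parser may be preferable given the file size.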

wowry commented 2 years ago

The Jamdict library runs on Python and can retrieve the onyomi, kunyomi, meanings, etc. of a kanji. It uses JMdict, JMnedict, KANJIDIC2 and KRADFILE/RADKFILE, which are kanji-to-radical and radical-to-kanji maps.

https://jamdict.readthedocs.io/en/latest/#sample-jamdict-python-code

ani-hovhannisyan commented 2 years ago

The Japanese WordNet http://compling.hss.ntu.edu.sg/wnja/ has semantic search (this can be considered for a later semantic search feature). It contains 57,238 concepts (synsets), 93,834 words, 158,058 senses (synset-word pairs), 135,692 definitions and 48,276 examples.

wowry commented 2 years ago

Just for reference, there are also some papers that try to measure the similarity of words in the Japanese WordNet.

https://ieeexplore.ieee.org/document/7373891

wowry commented 2 years ago

I wrote some sample code using Jamdict, so you can try it.

https://gist.github.com/wowry/840715df27d75171c5a8456e76275862

ani-hovhannisyan commented 2 years ago

In the Graph Controller, I will not use Jamdict for now when preparing the kanji word list. Instead, I'll use the ready-made data from "The Kanji Map" project https://github.com/gabor-kovacs/the-kanji-map/tree/main/public/data. All the words there are already prepared, so it will be easy to implement for now. I will also leave an empty interface, which can be implemented later if Jamdict or similar libraries are going to be used. The usage of word data is related to issue #36, and the prepared data will be used in issue #43.
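As a rough sketch of the approach described above (prepared data now, a pluggable interface for later), the word lookup could sit behind a small interface while the graph builder stays independent of the data source. All names here (`WordSource`, `StaticWordSource`, `build_graph`) and the sample data are hypothetical. They do not reflect the project's actual Graph Controller or the real layout of The Kanji Map data files.

```python
from collections import defaultdict
from typing import Dict, List, Protocol, Set


class WordSource(Protocol):
    """Hypothetical interface: anything that can list words containing a kanji."""

    def words_for(self, kanji: str) -> List[str]:
        ...


class StaticWordSource:
    """Serves words from an in-memory dict (a stand-in for prepared data files)."""

    def __init__(self, data: Dict[str, List[str]]) -> None:
        self._data = data

    def words_for(self, kanji: str) -> List[str]:
        return list(self._data.get(kanji, []))


def is_kanji(ch: str) -> bool:
    # Assumption: only the main CJK Unified Ideographs block is checked;
    # extension blocks are ignored for brevity.
    return "\u4e00" <= ch <= "\u9fff"


def build_graph(kanji: str, source: WordSource) -> Dict[str, Set[str]]:
    """Map every kanji appearing in the looked-up words to the words it occurs in."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for word in source.words_for(kanji):
        for ch in word:
            if is_kanji(ch):
                graph[ch].add(word)
    return dict(graph)


# Illustrative sample data only.
SAMPLE_DATA = {"食": ["食べる", "食事", "朝食"]}

if __name__ == "__main__":
    for node, words in build_graph("食", StaticWordSource(SAMPLE_DATA)).items():
        print(node, sorted(words))
```

A Jamdict-backed class exposing the same `words_for` method could later replace `StaticWordSource` without changes to `build_graph`.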

ani-hovhannisyan commented 2 years ago

As of now, the libraries are more or less decided:

ani-hovhannisyan commented 1 year ago

Reopening in light of the comment on task #105: https://github.com/ani-hovhannisyan/kanji-visualization/issues/105#issuecomment-1561031105