FooSoft / yomichan

Japanese pop-up dictionary extension for Chrome and Firefox.
https://foosoft.net/projects/yomichan

[Documentation Request] A guide on how to create dictionaries #2206

Open wildwestrom opened 2 years ago

wildwestrom commented 2 years ago

Here's the situation.

I have some XML files with dictionary data not available for Yomichan. I'd like to take that data and turn it into JSON data that can be read by Yomichan. I found that there are various JSON schemas. I can deduce what some of the fields mean, while others (e.g. the format enum) are not obviously documented.

If somebody understands the internals better, I think having documentation for creating dictionaries would be a great boon to the users and developers.

MarvNC commented 2 years ago

I was wondering the same thing. It seems format is usually 3 but I have no idea why; maybe it's the version of the schema?

In the tag bank, I was wondering what the second field, category, is. It seems it might be better as an enum with some documentation? Looking at KANJIDIC, there are frequent, misc, code, class, and index. I think they're hard coded here with the exception of frequent, which I assume is for tags.

And I'm not sure what the popularity score is for either. It seems to be something like the frequency of a term, but you'd think that would normally go into the term/kanji bank. Or maybe tags about frequency like common, rare, etc. could provide an additional point of frequency to fall back on?

toasted-nutbread commented 2 years ago

You're right, it's not a well-documented process at this point, aside from some stuff in the schemas. Something that I often find helpful is extracting some of the dictionaries included on the Yomichan homepage and using them as a reference.

format should be 3 for all new dictionaries. The other versions correspond to legacy dictionary formats which supported less information than the current ones.
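As a rough illustration of the point about format, here is a minimal Python sketch that packages a dictionary archive with format set to 3. Only the format value comes from this thread. The other index fields (title, revision), the file names index.json and term_bank_1.json, and the term row layout are assumptions to check against the schemas and against an extracted reference dictionary. build_dictionary_zip is a hypothetical helper, not part of Yomichan.

```python
import io
import json
import zipfile


def build_dictionary_zip(title, revision, term_rows):
    """Return the bytes of a zip archive holding an index and one term bank.

    Hypothetical helper for illustration; verify all names against the schemas.
    """
    index = {
        "title": title,        # assumed field name
        "revision": revision,  # assumed field name
        "format": 3,           # per the thread: 3 for all new dictionaries
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # File names are assumptions; compare with an extracted dictionary.
        archive.writestr("index.json", json.dumps(index, ensure_ascii=False))
        archive.writestr("term_bank_1.json", json.dumps(term_rows, ensure_ascii=False))
    return buffer.getvalue()


# Assumed row layout: expression, reading, definition tags, rules,
# score, glossary list, sequence number, term tags.
rows = [["猫", "ねこ", "", "", 0, ["cat"], 1, ""]]
archive_bytes = build_dictionary_zip("Example Dictionary", "1", rows)
```

Writing archive_bytes to a .zip file and trying to import it is probably the quickest way to find out which of these assumptions hold.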

Tag banks: category is somewhat arbitrary, but it's kind of used for two purposes:

Tag banks: (popularity) score is used for sorting tags, and the numbering is also somewhat arbitrary. Higher score tags will show up earlier in the tag list, lower score tags show up later. There is a similar score value used for sorting definitions.
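The ordering described above can be sketched in a few lines of Python. This only illustrates the stated rule (higher score first) and is not Yomichan's actual sorting code. The tag names, scores, and the dict representation are made up for the example.

```python
def sort_tags_by_score(tags):
    """Return tags ordered so that higher scores come first."""
    return sorted(tags, key=lambda tag: tag["score"], reverse=True)


# Hypothetical tags; real tag banks may store these values differently.
tags = [
    {"name": "rare", "score": -5},
    {"name": "common", "score": 10},
    {"name": "noun", "score": 0},
]
ordered = [tag["name"] for tag in sort_tags_by_score(tags)]
print(ordered)  # ['common', 'noun', 'rare']
```

Since the numbering is arbitrary, only the relative order of the scores within a dictionary matters here.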