Open wildwestrom opened 2 years ago
I was wondering the same thing. It seems `format` is usually 3, but I have no idea why; maybe it's the version of the schema?
In the tag bank, I was wondering what the second item, `category`, is. It seems it might be better as an enum with some documentation? Looking at KANJIDIC, there are `frequent`, `misc`, `code`, `class`, and `index`. I think they're hard coded here with the exception of `frequent`, which I assume is for tags.
And I'm not sure what the popularity score is for either. It seems to be something like the frequency of a term, but you'd think that would normally go into the term/kanji bank. Or maybe tags about frequency like `common`, `rare`, etc. could provide an additional point of frequency to fall back on?
You're right, it's not a well-documented process at this point, aside from some stuff in the schemas. Something that I often find helpful is extracting some of the dictionaries included on the Yomichan homepage and using them as a reference.
`format` should be 3 for all new dictionaries. The other versions correspond to legacy dictionary formats, which supported less information than the current one.
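To illustrate the point about `format`, here is a minimal sketch of generating a dictionary's `index.json`. Only the value `3` comes from the discussion above; the `title` and `revision` keys are assumed from my reading of the index schema, and their values are placeholders, so check the schema files before relying on them.

```python
import json

# Hypothetical metadata for a new dictionary.
# "format": 3 is taken from the thread; "title" and "revision" are
# assumed key names, and their values are placeholders.
index = {
    "title": "Example Dictionary",
    "revision": "example-1",
    "format": 3,
}

# Serialize without escaping non-ASCII, since dictionary titles may be Japanese.
index_json = json.dumps(index, ensure_ascii=False, indent=2)
print(index_json)
```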
Tag banks: `category` is somewhat arbitrary, but it's kind of used for two purposes:
Tag banks: (popularity) `score` is used for sorting tags, and the numbering is also somewhat arbitrary. Higher-score tags show up earlier in the tag list; lower-score tags show up later. There is a similar `score` value for definitions, which are sorted the same way.
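The ordering described above can be sketched as follows. This is only an illustration of "higher score first": the `Tag` record, the sample tags, and the tie-break by name are assumptions for the example, not Yomichan's actual data layout or sorting code.

```python
from typing import NamedTuple


class Tag(NamedTuple):
    """Hypothetical, simplified tag record (not the real tag bank layout)."""
    name: str
    category: str
    score: int


def sort_tags(tags):
    # Higher score first; ties are broken by name (an assumption made
    # here only so the result is deterministic).
    return sorted(tags, key=lambda t: (-t.score, t.name))


tags = [
    Tag("rare", "frequent", -5),
    Tag("note", "misc", 0),
    Tag("common", "frequent", 5),
]

ordered_names = [t.name for t in sort_tags(tags)]
print(ordered_names)
```

With these sample scores, `common` sorts first and `rare` last.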
Here's the situation.
I have some XML files with dictionary data not available for Yomichan. I'd like to take that data and turn it into JSON that Yomichan can read. I found that there are various JSON schemas. I can also deduce what some of the fields mean, while others (e.g. the `format` enum) are not obviously documented. If somebody understands the internals better, I think having documentation for creating dictionaries would be a great boon to users and developers.
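For the XML-to-JSON conversion described above, a rough sketch might look like this. The XML layout (`entry`, `word`, `reading`, `gloss`) is entirely hypothetical, and the eight-item row layout is my understanding of the version 3 term bank schema rather than something confirmed in this thread, so verify it against the schema files.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical source XML; a real dictionary file will be laid out differently.
SAMPLE_XML = """
<dictionary>
  <entry>
    <word>猫</word>
    <reading>ねこ</reading>
    <gloss>cat</gloss>
  </entry>
  <entry>
    <word>犬</word>
    <reading>いぬ</reading>
    <gloss>dog</gloss>
    <gloss>hound</gloss>
  </entry>
</dictionary>
"""


def xml_to_term_rows(xml_text):
    """Convert the hypothetical XML above into term bank style rows."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for sequence, entry in enumerate(root.iter("entry"), start=1):
        word = entry.findtext("word", default="")
        reading = entry.findtext("reading", default="")
        glosses = [g.text or "" for g in entry.findall("gloss")]
        # Assumed row layout: [term, reading, definition tags, rules,
        # score, glossary list, sequence number, term tags].
        rows.append([word, reading, "", "", 0, glosses, sequence, ""])
    return rows


rows = xml_to_term_rows(SAMPLE_XML)
term_bank_json = json.dumps(rows, ensure_ascii=False)
print(term_bank_json)
```

Comparing the output with a term bank file extracted from one of the existing dictionaries is a quick way to check whether the assumed layout matches.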