Closed fasiha closed 3 years ago
Of course they could be added, but I'm not sure why they would be needed in the first place. Json entries have both a "text" field (the writing with kanji) and a "reading" field (the writing in hiragana).
So to use the same example, if you have this entry:
{
"text": "川柳",
"reading": "せんりゅう",
"furigana": [
(...)
]
}
You know this entry is for the せんりゅう reading of 川柳.
Am I missing something?
No answer, so I'm closing the issue. Please re-open if there is any news on this.
Hello, I forwarded your concern to the mailing list. Jim wrote this back in response:
My interest was to add the derived furigana as an option to the WWWJDIC server. To do this I'd need to align the dictionary entries with the JmdictFurigana data. The alignment would be simple if the sequence numbers were in both streams. Otherwise, I'd have to do a (daily) kanji/reading alignment as I use the latest dictionary version. Not impossible but messy when many readings have multiple kanji forms and readings.
As fasiha said, I would be happy to help if this is something you wanted to tackle.
I think the point Jim was making relates to database query cost. In your structure, the primary key (so to speak) would be (text, reading), I guess. So a query to match up a jmdict entry with the entry in your file would involve comparing those two properties. However, the jmdict entry already has the reading, and if it were a normalized database table, it would probably hold just the sequence number and the furigana. You would not need that reading for the furigana table and could just query on the sequence number as the primary key.
Obviously, if there are other use cases for this file besides pairing up with jmdict, leaving the (text, reading) primary key makes sense.
In the end it really just depends how the consuming system is going to use your file.
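To make the two lookup styles concrete, here is a rough Python sketch. It is only an illustration under assumptions: the records are hypothetical, the "furigana" lists are left empty, the "ent_seq" field is not in the current file, and its values are made-up placeholders rather than real JMdict sequence numbers.

```python
# Hypothetical records: "furigana" is left empty and the "ent_seq" values are
# made-up placeholders, not real JMdict sequence numbers.
furigana_records = [
    {"ent_seq": 1, "text": "川柳", "reading": "せんりゅう", "furigana": []},
    {"ent_seq": 2, "text": "川柳", "reading": "かわやなぎ", "furigana": []},
]


def index_by_text_reading(records):
    """Composite key: one record per (text, reading) pair."""
    return {(r["text"], r["reading"]): r for r in records}


def index_by_ent_seq(records):
    """Sequence-number key: one entry may own several (text, reading) records."""
    index = {}
    for r in records:
        index.setdefault(r["ent_seq"], []).append(r)
    return index


by_pair = index_by_text_reading(furigana_records)
by_seq = index_by_ent_seq(furigana_records)

# The composite key needs both the kanji form and the reading to find a record.
print(by_pair[("川柳", "せんりゅう")]["ent_seq"])

# The sequence number alone is enough to find every record of an entry.
print([r["reading"] for r in by_seq[2]])
```

With the composite key, a consumer has to try every kanji/reading combination of a dictionary entry. With the sequence number it is a single lookup per entry.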
If it's possible to export two different json versions (one with ent_seq and one with (text, reading)), that would probably be the most convenient solution for everybody.
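As a rough idea of what the two exports could look like, here is a small Python sketch. It is not how JmdictFurigana actually writes its files. The function names, the sample record, and the placeholder "ent_seq" value are all assumptions for illustration, and I'm assuming the ent_seq variant would still carry text and reading, since one entry can have several forms.

```python
import json

# Hypothetical in-memory record; the "ent_seq" value is a made-up placeholder
# and the "furigana" list is left empty on purpose.
records = [
    {"ent_seq": 1, "text": "川柳", "reading": "せんりゅう", "furigana": []},
]


def export_with_text_reading(rows):
    """Variant keyed by (text, reading): the sequence number is dropped."""
    out = [
        {"text": r["text"], "reading": r["reading"], "furigana": r["furigana"]}
        for r in rows
    ]
    return json.dumps(out, ensure_ascii=False)


def export_with_ent_seq(rows):
    """Variant that also carries ent_seq on every row."""
    out = [
        {
            "ent_seq": r["ent_seq"],
            "text": r["text"],
            "reading": r["reading"],
            "furigana": r["furigana"],
        }
        for r in rows
    ]
    return json.dumps(out, ensure_ascii=False)


print(export_with_text_reading(records))
print(export_with_ent_seq(records))
```

Consumers that only pair on (text, reading) would keep reading the first variant unchanged, while anything aligning against jmdict could pick the second.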
I'm not sure if you saw this, but I mentioned JmdictFurigana on the EDICT-JMdict mailing list, in the context of assigning furigana to entries, and Jim Breen asked whether the JMdict sequence numbers could be included alongside the furigana.
(It looks like you'll need a Google account and access to the group to see the thread online?)
I'm not sure how difficult this might be, so I thought to make an issue at least to track the request. The author of https://github.com/Doublevil/JmdictFurigana/pull/16 also chimed in on that thread so you might have some help 😁. Thanks as always!