Closed wareya closed 7 years ago
Whoops.
I'm surprised it's gone this long without me or someone else noticing it. I guess since this is still a beta it would be fine to change it still. This only affects preferredDefs externally at the moment, which isn't too big a deal.
I'll fix it and add an incompatibility warning.
This is a mild issue since it basically breaks compatibility with any files that store edict IDs directly. If this is going to get fixed it should be fixed ASAP before spark reader becomes too popular.
According to the format page: http://www.edrdg.org/jmdict/edict_doc.html#IREF02
The current code is something like:
It should be something like:
Ran into this because I decided to print the ID in the definition popdown and noticed massive negative values, from hashing failed conversions. I did that because I want to make a blacklist for certain surface forms (spellings) to not be allowed to be interpreted as certain Edict entries (e.g. はそう as 半挿).
An alternative is to store the existing interpretation of the ID (where it falls back to a hash) in existing files so that compatibility isn't broken, but to use the real edict id for new stuff like the blacklist I want to make. I wouldn't really like this, but it's a reasonable compromise. You could also detect if preferredDefs has bogus IDs in it and automatically change them to the right version by finding which definition for that word has the same id-string-sans-Ent hash, but that's complicated.