Edict entry ID is parsed incorrectly

This is a mild issue, but fixing it will break compatibility with any files that store Edict IDs directly. If it is going to be fixed, it should be fixed ASAP, before Spark Reader becomes too popular.

According to the format page: http://www.edrdg.org/jmdict/edict_doc.html#IREF02

The field has the format: EntLnnnnnnnnX. The EntL is a unique string to help identify the field. The "X", if present, indicates that an audio clip of the entry reading is available from the JapanesePod101.com site.
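For reference, here is a minimal sketch of parsing the field strictly against that documented format. It's in Java because the snippets in this issue are Java. The class name `EdictId` and the sample IDs are made up for illustration, and it keeps the "X" audio flag as a boolean instead of discarding it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: parses an EDICT2 entry-ID field of the form EntLnnnnnnnnX. */
public class EdictId {
    // "EntL" prefix, the numeric ID, then an optional "X" (audio clip available).
    private static final Pattern ID_FIELD = Pattern.compile("EntL(\\d+)(X?)");

    final int id;
    final boolean hasAudio;

    EdictId(int id, boolean hasAudio) {
        this.id = id;
        this.hasAudio = hasAudio;
    }

    /** Parses the field, or throws if it doesn't match the documented format. */
    static EdictId parse(String field) {
        Matcher m = ID_FIELD.matcher(field.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an EDICT entry ID: " + field);
        }
        return new EdictId(Integer.parseInt(m.group(1)), !m.group(2).isEmpty());
    }

    public static void main(String[] args) {
        // Sample values only, not checked against the real dictionary.
        EdictId a = EdictId.parse("EntL1000000X");
        System.out.println(a.id + " " + a.hasAudio); // prints "1000000 true"
        EdictId b = EdictId.parse("EntL2833420");
        System.out.println(b.id + " " + b.hasAudio); // prints "2833420 false"
    }
}
```

Failing loudly on a malformed field, rather than silently hashing it, is what would have made this bug visible much earlier.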

The current code is something like:

String IDCode = bits[bits.length - 1].replaceFirst("Ent", "");

It should be something like:

String IDCode = bits[bits.length - 1].replaceFirst("EntL", "").replaceFirst("X", "");
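To show why every ID currently ends up hashed: stripping only "Ent" leaves a leading "L", so `Integer.parseInt` can never succeed on any entry. The sketch below contrasts the two versions. The hash fallback is an assumption (I'm guessing it's `String.hashCode()` on the leftover string), and the class and method names are hypothetical.

```java
/** Hypothetical comparison of the current and proposed ID parsing. */
public class IdParsingDemo {
    // Approximation of the current code: strip "Ent", parse, else hash.
    static int legacyId(String field) {
        String idCode = field.replaceFirst("Ent", "");
        try {
            return Integer.parseInt(idCode);
        } catch (NumberFormatException e) {
            // ASSUMED fallback. "L..." never parses, so this branch always runs.
            return idCode.hashCode();
        }
    }

    // Proposed fix: strip "EntL" and the optional audio marker "X".
    static int fixedId(String field) {
        String idCode = field.replaceFirst("EntL", "").replaceFirst("X", "");
        return Integer.parseInt(idCode);
    }

    public static void main(String[] args) {
        for (String field : new String[] {"EntL1000000X", "EntL2833420"}) {
            System.out.println(field + " legacy=" + legacyId(field) + " fixed=" + fixedId(field));
        }
    }
}
```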

I ran into this because I decided to print the ID in the definition popdown and noticed huge negative values, which come from hashing the IDs that failed to convert to numbers. I did that because I want to make a blacklist that stops certain surface forms (spellings) from being interpreted as certain Edict entries (e.g. はそう as 半挿).

An alternative is to keep the existing interpretation of the ID (the one that falls back to a hash) for existing files, so that compatibility isn't broken, and use the real Edict ID for new features like the blacklist I want to make. I wouldn't really like this, but it's a reasonable compromise. You could also detect whether preferredDefs contains bogus IDs and automatically convert them: for each stored ID, find the definition of that word whose ID string, minus "Ent", hashes to the stored value. That's more complicated, though.
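A rough sketch of that automatic conversion, assuming the legacy value was `String.hashCode()` of the field minus "Ent", and that the caller can list the raw ID fields of every definition for the word. `PreferredDefMigration` and `migrate` are hypothetical names, not existing Spark Reader code.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical migration sketch for IDs stored in preferredDefs. */
public class PreferredDefMigration {
    // ASSUMED legacy behaviour: strip "Ent", parse, else String.hashCode().
    static int legacyId(String field) {
        String idCode = field.replaceFirst("Ent", "");
        try {
            return Integer.parseInt(idCode);
        } catch (NumberFormatException e) {
            return idCode.hashCode();
        }
    }

    // Proposed parsing: strip "EntL" and the optional "X".
    static int fixedId(String field) {
        return Integer.parseInt(field.replaceFirst("EntL", "").replaceFirst("X", ""));
    }

    /** Maps a stored ID to a real Edict ID, or empty if no candidate matches. */
    static Optional<Integer> migrate(int storedId, List<String> candidateFields) {
        for (String field : candidateFields) {
            if (fixedId(field) == storedId) {
                return Optional.of(storedId); // already a real ID, keep it
            }
        }
        for (String field : candidateFields) {
            if (legacyId(field) == storedId) {
                return Optional.of(fixedId(field)); // bogus hash, rewrite it
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> candidates = List.of("EntL1000000X", "EntL2833420"); // sample values
        int stored = legacyId("EntL2833420");
        System.out.println(migrate(stored, candidates)); // prints "Optional[2833420]"
    }
}
```

Checking for an exact real-ID match first lets already-converted files pass through unchanged. A hash that happens to equal another candidate's real ID would still be ambiguous, which is part of why this option is complicated.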

LaurensWeyn / Spark-Reader, issue #13