Doublevil / JmdictFurigana

A Japanese dictionary resource that attaches furigana to individual words

Missing entries? #21

Closed fasiha closed 3 months ago

fasiha commented 3 months ago

Hello Doublevil! I noticed that the latest JmdictFurigana.json is missing some entries that are in JMdict:

Do you think there was a regression, or am I making a mistake somewhere?

Doublevil commented 3 months ago

Hello, and thank you for your continued interest in this project. These are all special readings that are not registered in the special readings list. I don't think it's a regression, as I wasn't able to find these entries in older releases either. Because special readings are exceptions that have to be listed manually, it's expected that some will be missing. However, you are welcome to contribute to the special readings file to add missing readings.

fasiha commented 3 months ago

Aha! I didn't know about that file. Some of these are in there, so I will use it as a fallback!

Doublevil commented 3 months ago

No, sorry for the confusion. This file is not intended to be used as a fallback; it's used in the process of creating the output txt and json files. If entries for the words you mention were added to that file, they would be resolved correctly and appear in the next release files.

fasiha commented 3 months ago

Now I fully understand. Hmm. I see "甲斐性" → "かいしょう" in the file, which is one of the words in my list: https://github.com/Doublevil/JmdictFurigana/blob/9ce6da4e2ec5a2603413552ed481efeda3557ace/JmdictFurigana/Resources/SpecialReadings.txt#L814 but in the JSON file I only see one entry for 甲斐性, "かいしょ" (missing the final う). Should the above special reading be in the JSON?
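
For reference, the kind of check described above can be sketched in a few lines of Python. This is only an illustration: the `text` and `reading` field names are assumptions about the JSON layout, `missing_pairs` is a hypothetical helper, and the sample data contains only the 甲斐性 pair mentioned in this comment.

```python
import json
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (written form, reading)


def missing_pairs(expected: Iterable[Pair], entries: List[dict]) -> List[Pair]:
    """Return the expected (text, reading) pairs that have no matching entry."""
    # Assumed layout: each entry is an object with "text" and "reading" keys.
    present: Set[Pair] = {(e.get("text"), e.get("reading")) for e in entries}
    return [pair for pair in expected if pair not in present]


# Stand-in for the parsed JSON file: the only entry seen for this word.
sample_entries = json.loads('[{"text": "甲斐性", "reading": "かいしょ"}]')

# Pairs expected from JMdict (only the one quoted in this thread).
expected_pairs = [("甲斐性", "かいしょう"), ("甲斐性", "かいしょ")]

not_found = missing_pairs(expected_pairs, sample_entries)
```

Here `not_found` ends up holding only the かいしょう pair, which matches what I see in the JSON. For the real file, `sample_entries` would be replaced by the parsed release JSON, assuming it is a single array of such objects.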

Doublevil commented 3 months ago

After analyzing this particular case, it looks like the mistake is that the entry is in the file at all. The 甲斐性/かいしょう entry currently cannot be solved because it has two potential solutions: either the one you quoted from the special readings (かいしょう over all 3 characters), or another one composed of かい over the first two characters and しょう over the last one. In cases like this, where multiple solutions are available, the entry is considered unsolved and is not added to the output file.

So, for this particular entry, removing the line from the special readings file would leave only the second solution, which would then be featured in the output file.
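
To illustrate the rule, here is a minimal Python sketch of "an entry is solved only if exactly one solution exists". It is not the project's actual code: the `segmentations` and `solve` functions and the two small reading tables are hypothetical, built only from the two solutions described above.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Segment = Tuple[str, str]  # (span of the written form, reading of that span)


def segmentations(word: str, reading: str,
                  table: Dict[str, List[str]]) -> Iterator[List[Segment]]:
    """Yield every split of `word` into known spans whose readings
    concatenate exactly to `reading`."""
    if not word:
        if not reading:
            yield []  # both consumed: one complete solution
        return
    for end in range(1, len(word) + 1):
        span = word[:end]
        for span_reading in table.get(span, []):
            if reading.startswith(span_reading):
                rest = segmentations(word[end:], reading[len(span_reading):], table)
                for tail in rest:
                    yield [(span, span_reading)] + tail


def solve(word: str, reading: str,
          table: Dict[str, List[str]]) -> Optional[List[Segment]]:
    """Return the solution only when it is unique; otherwise None (unsolved)."""
    solutions = list(segmentations(word, reading, table))
    return solutions[0] if len(solutions) == 1 else None


# Hypothetical tables: readings known per span, with and without the
# whole-word special reading for 甲斐性.
without_special = {"甲斐": ["かい"], "性": ["しょう"]}
with_special = {**without_special, "甲斐性": ["かいしょう"]}

ambiguous = solve("甲斐性", "かいしょう", with_special)     # two solutions
resolved = solve("甲斐性", "かいしょう", without_special)   # one solution
```

With the special reading present, both the whole-word reading and the かい + しょう split match, so `ambiguous` is `None`. Without it, `resolved` is the single remaining split, 甲斐 read かい followed by 性 read しょう.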