Doublevil / JmdictFurigana

A Japanese dictionary resource that attaches furigana to individual words

JMNEDict support #8

Closed aehlke closed 7 years ago

aehlke commented 7 years ago

Having this data for the JMNEDict/ENAMDICT dictionary of names would be fantastic!

Doublevil commented 7 years ago

Done. Don't hesitate to report any errors you spot in the file; I only checked a few cases.

aehlke commented 7 years ago

Wow, thank you! 👯‍♂️

fasiha commented 7 years ago

Thanks @Doublevil!

A lazy question (I should check this myself): how much overlap is there between JMDICT and the newly added JMNEDict entries? I ask because I wonder whether, when I look up a word in JmdictFurigana, I now have to add further checks to see whether the search results are names or not.

Doublevil commented 7 years ago

@fasiha If you access the entries by the combination of their kanji and kana readings, there should be little to no meaningful overlap (by that I mean entries with the same kanji and kana readings but a different cut of the reading). The methods used to compute the "cuts" are the same for both files, except that nanori readings are also included for JMNEDICT.

So if you find a word (again, using both kanji and kana readings as the key) in the JmdictFurigana file, you can assume it will be either absent from the JmnedictFurigana file or exactly the same there, and vice versa.

Now if you're looking to determine whether a word can or cannot be a name, I don't think this resource is the appropriate way to do so.
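
For anyone wanting to verify the "absent or identical" claim locally, here is a minimal Python sketch of the keyed lookup described above. It assumes each line of both files has the shape `kanji|reading|cut`, and the sample lines are made up for illustration, so check both against the actual files. The helper names (`load_entries`, `find_conflicts`, `lookup`) are hypothetical and not part of this project.

```python
from typing import Dict, Iterable, Optional, Tuple

Key = Tuple[str, str]  # (kanji form, kana reading)


def load_entries(lines: Iterable[str]) -> Dict[Key, str]:
    """Index entries by (kanji, reading).

    Assumption: each line looks like 'kanji|reading|cut'.
    """
    entries: Dict[Key, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        kanji, reading, cut = line.split("|", 2)
        entries[(kanji, reading)] = cut
    return entries


def find_conflicts(a: Dict[Key, str], b: Dict[Key, str]) -> Dict[Key, Tuple[str, str]]:
    """Return keys present in both indexes whose cuts differ."""
    return {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}


def lookup(key: Key, words: Dict[Key, str], names: Dict[Key, str]) -> Optional[str]:
    """Try the word index first, then fall back to the name index."""
    cut = words.get(key)
    return cut if cut is not None else names.get(key)


# Made-up sample lines standing in for the two files.
words = load_entries([
    "大人|おとな|0-1:おとな",
    "日本|にほん|0:に;1:ほん",
])
names = load_entries([
    "日本|にほん|0:に;1:ほん",
    "田中|たなか|0:た;1:なか",
])

conflicts = find_conflicts(words, names)
tanaka_cut = lookup(("田中", "たなか"), words, names)
```

If `find_conflicts` comes back empty on the real files, the two indexes can be merged or queried in either order without changing the result. As noted above, a hit in either index still says nothing about whether the word can be a name.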