JMdictProject / JMdictIssues

JMdict Japanese dictionary - lexicographic, etc. issues management

Handling ~もの/~もん entries #66

Closed: JMdictProject closed this issue 2 years ago.

JMdictProject commented 2 years ago

The issue has come up of how best to handle the colloquial ~もん variant of the common ~物/~もの type of terms. It is discussed extensively in the comments on the 買い物 entry (https://www.edrdg.org/jmdictdb/cgi-bin/entr.py?svc=jmdict&sid=&q=1589730) and is also mentioned in issue #63. While the ~もの/~もん versions could be split into separate entries, there is a case for keeping them together, although under our present approach that would require some complex reading restrictions.

To a large extent, the reading restrictions are there to allow correct generation of the old simple EDICT format, so as to avoid entries of the form "買いもの [かいもん]". We have already relaxed the alignment rules to simplify hiragana/katakana matching, and I wonder if it isn't time to relax them further so that the ~もの/~もん forms can be handled more simply. I think very few sites/apps still use that simple legacy EDICT format. Is it time to relax further?
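To make the role of the restrictions concrete, here is a minimal sketch of how (kanji, reading) headword pairs might be expanded for a simple "KANJI [KANA]" output. It is not the actual EDICT generator: the `Reading` class, its `restr` field (loosely modeled on JMdict's `<re_restr>`), the helper functions and the example entry are all hypothetical illustrations.

```python
from dataclasses import dataclass, field


@dataclass
class Reading:
    text: str
    # Hypothetical: kanji forms this reading may pair with, loosely modeled on
    # JMdict's <re_restr>; an empty list means "applies to every kanji form".
    restr: list[str] = field(default_factory=list)


def headword_pairs(kanji_forms: list[str], readings: list[Reading]) -> list[tuple[str, str]]:
    """Expand an entry into (kanji, reading) pairs, honoring any restrictions."""
    pairs = []
    for kanji in kanji_forms:
        for reading in readings:
            if not reading.restr or kanji in reading.restr:
                pairs.append((kanji, reading.text))
    return pairs


def edict_headwords(pairs: list[tuple[str, str]]) -> list[str]:
    """Render pairs in a simple 'KANJI [KANA]' headword style."""
    return [f"{kanji} [{kana}]" for kanji, kana in pairs]


# Hypothetical 買い物 entry carrying both the ~もの and ~もん forms.
KANJI = ["買い物", "買いもの", "買いもん"]
RESTRICTED = [
    Reading("かいもの", restr=["買い物", "買いもの"]),
    Reading("かいもん", restr=["買い物", "買いもん"]),
]
UNRESTRICTED = [Reading("かいもの"), Reading("かいもん")]

if __name__ == "__main__":
    print(edict_headwords(headword_pairs(KANJI, RESTRICTED)))
    print(edict_headwords(headword_pairs(KANJI, UNRESTRICTED)))
```

With the restrictions in place the mismatched "買いもの [かいもん]" never appears; dropping them, which is roughly what relaxing the rules would mean for this output style, yields all six combinations, including the unwanted ones.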

For the ~もの/~もん situations there is also the matter of the ~もん being a bit more colloquial. Do we need to note that in some way?

Marcusjmdict commented 2 years ago

I think the easiest solution is to include かいもん as a reading in the 買い物 entry and to include 買いもの and 買いもん as hidden forms only. I don't think ~もん readings should be hidden away, though; that could be confusing.
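A rough sketch of the proposal above, assuming a hypothetical `hidden` flag that keeps a form searchable while excluding it from display. The `Form` and `Entry` classes and their methods are illustrative only, not JMdict's actual schema.

```python
from dataclasses import dataclass


@dataclass
class Form:
    text: str
    # Hypothetical flag: the form is matched by searches but not displayed.
    hidden: bool = False


@dataclass
class Entry:
    kanji: list[Form]
    readings: list[Form]

    def displayed_kanji(self) -> list[str]:
        return [f.text for f in self.kanji if not f.hidden]

    def displayed_readings(self) -> list[str]:
        return [f.text for f in self.readings if not f.hidden]

    def matches(self, query: str) -> bool:
        # Hidden forms still count as search hits.
        return any(f.text == query for f in self.kanji + self.readings)


# 買い物 with the ~もの/~もん kanji variants hidden but both readings visible.
KAIMONO = Entry(
    kanji=[Form("買い物"), Form("買いもの", hidden=True), Form("買いもん", hidden=True)],
    readings=[Form("かいもの"), Form("かいもん")],
)
```

In this sketch only 買い物 is displayed, so both visible readings pair with it and no mismatched pair is shown, while a search for 買いもん still finds the entry and かいもん stays visible, as the comment suggests.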

Form      Count       Share
買い物    68063816    99.6%
買いもの  268422      0.4%
買いもん  14236       0.0%
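The share column appears to be each form's count divided by the combined count of the three forms. A small check under that assumption, using the counts as given (their source is not stated in the thread):

```python
# Counts as given in the comment above; their source is not stated.
COUNTS = {"買い物": 68063816, "買いもの": 268422, "買いもん": 14236}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of the combined total for each form, rounded to 0.1%."""
    total = sum(counts.values())
    return {form: round(100 * n / total, 1) for form, n in counts.items()}


if __name__ == "__main__":
    for form, pct in shares(COUNTS).items():
        print(f"{form}\t{COUNTS[form]}\t{pct}%")
```

This reproduces 99.6%, 0.4% and 0.0%; the last figure is about 0.02% before rounding, so 買いもん is rare but not absent.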

JMdictProject commented 2 years ago

I think all the ~もの/~もん entries are now dealt with. I'll close this.