JMdictProject / JMdictIssues

JMdict Japanese dictionary - lexicographic, etc. issues management

Including particle-less forms of expressions as sK forms e.g. 贅沢言う in the 贅沢を言う entry #134

Open Marcusjmdict opened 3 months ago

Marcusjmdict commented 3 months ago

I think we ought to add forms like 贅沢言う to entries like 贅沢を言う.

贅沢を言う 9357 82.6%
贅沢をいう 1596 14.1%
ぜいたくを言う 369 3.3%

贅沢言う 11762 80.8%
贅沢いう 2003 13.8%
ぜいたく言う 790 5.4%
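For anyone wanting to check the percentages, here is a minimal sketch of how they follow from the raw n-gram counts. The function name `shares` is only illustrative and is not part of any JMdict tooling; each share is taken within its own group of three forms.

```python
def shares(counts):
    """Return each form's share of the group total, as a percentage
    rounded to one decimal place."""
    total = sum(counts.values())
    return {form: round(100 * n / total, 1) for form, n in counts.items()}


# N-gram counts quoted in the comment above.
with_wo = {"贅沢を言う": 9357, "贅沢をいう": 1596, "ぜいたくを言う": 369}
without_wo = {"贅沢言う": 11762, "贅沢いう": 2003, "ぜいたく言う": 790}

print(shares(with_wo))
print(shares(without_wo))
# Compare the group totals: particle-less forms vs. forms with を.
print(sum(without_wo.values()), sum(with_wo.values()))
```

On these counts the particle-less group totals 14555 against 11322 for the forms with を, so the particle-less spellings are the more frequent group overall.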

JMdictProject commented 3 months ago

I see GG5 even uses it in an example: もらっておきながら贅沢言うな. It's a bit much to say that after you've accepted it.

I suspect it has a degree of informality cf. the full term, but I don't have a problem with adding them as [sK] forms. Probably better than not having them, or having parallel entries.

Marcusjmdict commented 3 months ago

We have more than 3000 entries with an を in them. Most of them would probably qualify.
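A rough sketch of how candidate particle-less forms could be generated for review. This is an assumption about workflow, not an existing script: `particle_less_forms` is a hypothetical helper, and it matches on characters only, so it will also hit an を (or が, の) that is not acting as a particle. Every candidate would still need an n-gram check and an editor's eye.

```python
def particle_less_forms(headword, particles=("を",)):
    """Return candidate forms made by deleting one particle character.

    Hypothetical helper: purely character-based, no morphological analysis.
    A particle at the very start or end of the headword is left alone.
    """
    forms = []
    for i, ch in enumerate(headword):
        if ch in particles and 0 < i < len(headword) - 1:
            forms.append(headword[:i] + headword[i + 1:])
    return forms


print(particle_less_forms("贅沢を言う"))
print(particle_less_forms("贅沢をいう"))
```

For 贅沢を言う this yields the single candidate 贅沢言う, which could then be considered as an [sK] form.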

JMdictProject commented 3 months ago

Can we agree:

briankrznarich commented 2 weeks ago

I agree with everything above. I'll add that while を is simple, が・の particle-dropping (which seems just as valid as を-dropping) needs some additional consideration of where to put the particle-less form (が, の, both, most-popular entry...).

With that in mind, I'd also like to ask: could we use [sK] to completely avoid duplicating が・の pairs? (perhaps not universally, but where it seems harmless)

For example, I quite coincidentally just ran into such a case:

人聞きの悪い 10207 63.8% <-- we have this (disreputable; scandalous; disgraceful)
人聞きが悪い 2850 17.8% <-- not this
人聞き悪い 2932 18.3% <-- not this (this is what I saw in the wild)

Following Marcus' suggestion, we could stick 人聞き悪い[sK] into 人聞きの悪い. With that out of the way, do we really need a separate entry for the が form: 人聞きが悪い ? Or could we just [sK] them both into the existing entry and call it a day?

In the grand scheme, I think that in most cases we don't really benefit much from double-entering が・の pairs (we must have hundreds of them). Sometimes it lets us give better English [adj-f] glosses, but often we end up with entries like:

人気がある [verb] to be popular
人気のある [adj-f,exp] popular

It's even less productive when the tail is an adjective to begin with. For example, we have two long identical entries for:

つかみどころがない[exp,adj-f] | 8571 | 36.7%
つかみどころのない[exp,adj-f] | 14793 | 63.3%

If we kept the が form, and [sk]'d the の form into it, the existence of the の form should be grammatically self-evident. All we lose is the ability to specify whether の or が is more common, which I think is quite minor here.

Getting back, for the particle-less つかみどころない (420 ngrams), we could [sk] it into either or both forms above. Technically I'd say "both" is the right answer, but it would be nice if there were just one entry for all of it.
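To get a feel for how many が・の pairs are involved, something like the following sketch could list headwords that differ only by が versus の in the same position. `find_ga_no_pairs` is a hypothetical name, the input is assumed to be a plain list of headword strings, and the match is character-based, so the output is a list of candidates to review rather than a definitive set of pairs.

```python
from collections import defaultdict


def find_ga_no_pairs(headwords):
    """Group headwords that are identical except for が vs. の at one position.

    Hypothetical helper: each が/の inside a headword is treated as a possible
    particle, and the text before and after it is used as the grouping key.
    """
    groups = defaultdict(set)
    for word in headwords:
        for i, ch in enumerate(word):
            if ch in "がの" and 0 < i < len(word) - 1:
                groups[(word[:i], word[i + 1:])].add(word)
    # Keep only keys shared by more than one headword.
    return [sorted(g) for g in groups.values() if len(g) > 1]


sample = ["人気がある", "人気のある", "つかみどころがない", "つかみどころのない", "贅沢を言う"]
for pair in find_ga_no_pairs(sample):
    print(pair)
```

On the sample list this reports the 人気がある / 人気のある pair and the つかみどころがない / つかみどころのない pair, and leaves 贅沢を言う alone.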

This is all somewhat related to the discussion here, which is about condensing forms that differ only by grammatical transformation, while increasing discoverability in search results: https://github.com/JMdictProject/JMdictIssues/issues/87