JMdictProject / JMdictIssues

JMdict Japanese dictionary - lexicographic, etc. issues management

虫も殺...? Is there any special handling of idiomatic expressions that end with (or contain) negation (and thus potentially have 3+ forms) ない・ず・ぬ #87

Open briankrznarich opened 1 year ago

briankrznarich commented 1 year ago

... particularly forms that historically end in ぬ.

To lead with, I don't think this is a massive problem, so maybe a continued brute-force "copy-paste" solution is fine. But before I go duplicating a bunch of entries, I wanted to ask...

As an example of the problem I'm looking at, we currently have two independent-but-otherwise-identical entries for: 背に腹は替えられぬ and 背に腹はかえられない (they do cross-reference each other, pointing to the ない form)

And today I just ran into this expression in the wild: 墓に布団は着せられず (https://japanese-note.jp/fuujyu-no-tan/). I wanted to see if it was in JMdict, but the search returned no results. Then, on a different search for "filial", I stumbled upon: 墓に布団は着せられぬ

And, if I now google for "墓に布団は着せられない", it also exists (least popular of the three, maybe not worth noting).
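To make the lookup problem concrete: the three variants above differ only in their final negative ending, so a search tool could in principle expand a query into its sibling forms before looking it up. The following is a minimal sketch of that idea, not anything JMdict or its front ends are known to do; the function name `negation_variants` is hypothetical. It assumes the ending attaches to the same stem in all three forms, which holds for 着せられ and 殺さ but not for irregular verbs (e.g. しない vs. せぬ/せず).

```python
# Hypothetical helper: expand an expression ending in ない/ぬ/ず into all three variants.
# Assumption: the stem before the ending is identical across variants (not true for する etc.).
NEGATIVE_ENDINGS = ("ない", "ぬ", "ず")


def negation_variants(expression: str) -> list[str]:
    """Return the ない/ぬ/ず variants of an expression, or the expression itself if none apply."""
    for ending in NEGATIVE_ENDINGS:
        if expression.endswith(ending):
            stem = expression[: -len(ending)]
            return [stem + e for e in NEGATIVE_ENDINGS]
    return [expression]


if __name__ == "__main__":
    for form in negation_variants("墓に布団は着せられず"):
        print(form)
```

With this kind of expansion, a query for the ず form would also try the ぬ form that already has an entry.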

So, what to do?

  1. Nothing (I think it's too common to ignore)
  2. Copy/paste the entry (seems to be current policy)
  3. List ず as an alternate reading in the existing entry, either overtly or with [sK].
  4. Something else?

For most purposes I am aware of, it feels like everyone would be best served if somehow these entries could be unified. This doesn't seem to be done today, but as one example of where it has been done, we have a few grammatical entries like "なければならない", which contains all of: なければなりません、なければいけない、なければいけません、ねばならぬ、ねばならない、ねばなりません、なければならぬ、なけばならない

=======

I've gone hunting for other issues of this class, and just as one more example, there is currently an entry for 虫も殺さぬ (Expressions (phrases, clauses, etc.), Pre-noun adjectival (rentaishi)). That is the only entry in JMdict that begins with "虫も殺", but if you type "虫も殺" in a Google search bar, it starts autocompleting with "虫も殺さない" (both forms do appear though). And this case actually looks even worse, as there are potential forms as well: せない and せぬ.

n-grams:
虫も殺さぬ | 1641
虫も殺さない | 1315
虫も殺せない | 1302
虫も殺せぬ | 412
虫も殺さず | 105 (I'd guess related to, but not the actual idiom)

If it were me (in a different universe with a different database policy), I'd be tempted to list 虫も殺さない as the primary form (promoted above its n-gram rank for being the modern usage), 虫も殺さぬ as the second form, and probably 虫も殺せない and 虫も殺せぬ as [sK]. (Though in this particular expression, "won't kill a fly/couldn't kill a fly", maybe these really should be considered primary forms as well. Do we need 4 JMdict entries?)
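The ordering proposed above can be written down as a small sketch. This is only an illustration of the proposal, not JMdict's schema or tooling: the function `arrange_forms` and its parameters are hypothetical, the counts are the n-gram figures quoted in this comment, and which forms get promoted or tagged [sK] is an editorial choice passed in by hand rather than derived from the counts.

```python
# Hypothetical sketch: order the surface forms of one merged entry.
# The promoted form goes first, other visible forms follow by n-gram count,
# and hand-picked search-only forms come last with an "sK" tag.
counts = {
    "虫も殺さぬ": 1641,
    "虫も殺さない": 1315,
    "虫も殺せない": 1302,
    "虫も殺せぬ": 412,
}


def arrange_forms(counts, promoted, search_only):
    """Return (form, tag) pairs; tag is "sK" for search-only forms, "" otherwise."""
    visible = [f for f in counts if f != promoted and f not in search_only]
    visible.sort(key=counts.get, reverse=True)
    hidden = sorted((f for f in counts if f in search_only), key=counts.get, reverse=True)
    return [(promoted, "")] + [(f, "") for f in visible] + [(f, "sK") for f in hidden]


if __name__ == "__main__":
    forms = arrange_forms(counts, "虫も殺さない", {"虫も殺せない", "虫も殺せぬ"})
    for form, tag in forms:
        print(form, f"[{tag}]" if tag else "")
```

Note that the counts alone would not produce this arrangement (虫も殺せない is nearly as frequent as 虫も殺さない), which is why the promoted and search-only forms are explicit inputs.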

For reference, Sankoku (a modern dictionary) has a headword for 虫も殺さない(句), and lists the ぬ form at the end of the entry. More prestigious dictionaries may prefer the ぬ form. Arguably, if a foreign student of Japanese were to use this term today, they should probably pick the ない form, I would think, with せない and さない almost interchangeable...

polm commented 1 year ago

This looks like a variation of #69, where the solution was to add the inflected forms.

briankrznarich commented 1 year ago

I saw/read #69 before posting, because it indeed looks related. The difference I see is that 虫も殺さない、 虫も殺せない, 虫も殺せぬ、虫も殺さぬ are not inflected forms of each other. They are inflections of 虫も殺す, which is not idiomatic at all, so it can't serve as a "parent" entry for them.

背に腹は替えられぬ is currently entered as [exp][proverb].
背に腹はかえられない is currently entered as [exp,adj-i][proverb] (Jisho.org will actually conjugate/inflect this if you ask).

To merge them, it might be necessary to discard [adj-i].

虫も殺さぬ is listed as [exp,adj-pn][id]. I don't see a way to cleanly merge that with a theoretical additional surface form of 虫も殺さない.

I think the result of #69 was that they updated the "suffix inflection rule" for the entries so that they could be programmatically inflected (rather than explicitly adding inflected forms to the entries). The issue with ぬ/ない would actually seem to clobber that change.

JMdictProject commented 1 year ago

I'll respond to the issue Brian has raised, then maybe separately to a couple of other points he raised. In the past we attempted to keep fairly strict alignment of kanji forms and readings, so for this ない/ぬ/ず situation, we'd either have three distinct entries or have one entry and use restrictions in the reading field to force alignment. We have now relaxed the kanji/reading alignments a bit, and introduced the sK and sk tags, which can reduce the baggage a bit.

To illustrate the possible approaches, I'll use 墓に布団は着せられぬ (2666530). If we look at the n-gram counts for partial variants, we get:

布団は着せられぬ - 88 - 34.6%
布団は着せられず - 140 - 55.1% <- not an entry
布団は着せられない - 26 - 10.2% <- not an entry
布団は着せられ無い - 0 - 0.0%
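For anyone wanting to reproduce the percentages, they are each variant's share of the summed counts (254 in total). A minimal Python sketch, assuming only the four counts quoted above; the variable names are illustrative:

```python
# Counts for the partial variants, as quoted in this comment.
counts = {
    "布団は着せられぬ": 88,
    "布団は着せられず": 140,
    "布団は着せられない": 26,
    "布団は着せられ無い": 0,
}

total = sum(counts.values())

# Share of each variant as a percentage of the total, rounded to one decimal place.
shares = {form: round(100 * n / total, 1) for form, n in counts.items()}

if __name__ == "__main__":
    for form, share in shares.items():
        print(f"{form} - {counts[form]} - {share}%")
```

Rounded to one decimal place the shares sum to 99.9 rather than 100, which matches the figures in the table.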

For this one I'd be inclined to do one of two things:

We do need to have an established approach to these.