JMdictProject / JMdictIssues

JMdict Japanese dictionary - lexicographic, etc. issues management

Asking for clarification on adj-no; Can we add something to the Editorial Policy? #105

Open · briankrznarich opened 8 months ago

briankrznarich commented 8 months ago

Having a crisis of faith on what [adj-no] is supposed to mean.

This began with stripping [adj-no] off 胎児 (fetus/fetal?). I made a long-winded comparison to 幼児 (toddler), noting that the latter was not [adj-no] but has similar usage in Japanese. By contrast, Eijiro gives:

https://eow.alc.co.jp/search?q=fetal 形: 胎児の

Obviously.

I was then told that [adj-no] is overly marked in jmdict db, and it should be removed in a lot of places. I've since stripped it off 5 or 6 entries. Most were unambiguous, but "fetal" still seems like an issue to me. If we can remove [adj-no] from there, then where does it belong?

The only guidance is: adj-no | nouns which may take the genitive case particle 'no'

That seems like almost all nouns. In the first week of JAPN101, simplistic though it might have been, we learned that "American car" was アメリカの車. Compare some ngrams:

アメ車 322281
アメリカ車 43188
アメリカの車 5352

American products: アメリカの商品 223890
American film: アメリカ映画 165804
American life/lifestyle: アメリカ生活 115271
American badger (Taxidea taxus): アメリカ穴熊 (pulled from jmdict)
...

We don't tag アメリカ as [adj-no] or [adj-f]. We have a mountain of アメリカ~ terms. Maybe it is [adj-f]?

On a related note, I was corrected on another term and told that [adj-f] is often [adj-no] with an elided の, and that we don't generally tag one term with both [adj-f] and [adj-no], even where usage without の dominates.

病気 is probably the classic example of [adj-no] (vs. adj-na). Is this the limited use we're aiming at? And how are these distinguished? 病気 = illness, so 病気の = sick; アメリカ = America, so アメリカの = American.

I'm not lobbying to [adj-no] or [adj-f] all the countries (though maybe we should?). But guidance in the Editorial Policy would be greatly appreciated. I'm not trying to be pedantically difficult; I really don't know what our objectives with [adj-no] are.

stephenmk commented 8 months ago

It seems you may have missed Robin's detailed reply from a couple weeks ago.

I don't know how other people browse jmdictdb, but I wrote my own script to display recent edits in threaded, collapsible boxes. It's not the prettiest, but it makes it a lot easier to catch a glimpse of new edits as they come in.

briankrznarich commented 8 months ago

Oh wow @stephenmk, I need to take a look at those scripts.

Yes I did miss that reply. Robin's answer was very informative. I think including something to that effect in the Editorial Policy would provide good guidance to others as well. It would be nice for jmdict to have some public-facing explanation of the goal with adj-no to avoid confusion.

"Shock" was an interesting example. Sankoku actually has a note: 衝撃の (=衝撃的な). I've never seen this in a kokugo, so this seems quite a rare case indeed. If that's the bar, it's pretty high. Hunting around, it looks like を受ける and を与える are the most common ways to say something was shocking (also in Sankoku). Maybe worth adding...

Anyway, I've been extremely delinquent on keeping up with my own edits, pretty much leaving them to others to decide on. I can't do anything but "poll" each and every one of them, and the more there are, the harder this is to do. I started to find this psychologically stressful, and stopped looking so often. But it's not good to be missing as much feedback and commentary as I do.

I've always assumed the main editors have alerts or something to make this easier. Perhaps not? If updates were automatic they'd be easier to keep up with. I suppose if you always look at everything (an impressive feat), this isn't an issue. I hadn't considered scripting something on my own. I may play with your scripts, and/or give it a go myself.

stephenmk commented 8 months ago

I've never seen this in a kokugo, so this seems quite a rare case indeed. If that's the bar, it's pretty high.

gg5 also has a couple relevant examples ("衝撃の事実 | a shocking fact", "衝撃の告白 | a shocking confession").

IMO the bar isn't as high as sankoku's explicit note for 衝撃, but typically the kokugos will have 〜の usage examples. Pulling an entry at random, I see that 勝手向き (adj-no, n) has "━の用品" in meikyo and "━の商品を扱う" in daijirin. And then of course there's always the "Top 10 N-grams Lookup" feature on the n-gram servers that you can look at.

briankrznarich commented 8 months ago

Thanks for the additional comments. I looked really hard at usage examples for 衝撃 before commenting to convince myself this was correct, so I saw 衝撃の事実 and 衝撃の告白 in particular. Definitely interesting. But there seems to be a very particular reason to do this (vs. 衝撃的な and 衝撃を与える) and this is something jmdict doesn't generally give guidance on, so I decided to let that be.

Your mention of 勝手向き suggests a whole different rabbit hole, as 向き and 向け are two of those grammatical affixes that usually behave in a very adj-no way (facing, directed towards, with X in mind, etc.). I suspect that [n] doesn't belong on sense [1] at least. Hard to confirm at the moment.

But there is a bunch of inconsistency in the 向き entries, some of which is probably justified, some not. 南向き is currently [n]. Surely this is [adj-no], and maybe [n]. There are 48 ~向き entries. Only 13 are currently [adj-no]. That's probably not right. I may have a go at this I guess, but proving a negative (i.e. removing [n]) is much harder than justifying [adj-no]. 上向き is [n,adj-no], 下向き is [n]. hmm : )

The cardinal directions are all [n]. The translation of 南向き as "Southern Exposure" is certainly apt in a housing context. But I have always, and basically unconditionally, imagined it as [adj-no] "southern-facing". This has worked in any context I've ever encountered it, but the entry 家の向き certainly calls this into question.

I do rely heavily on the top-n ngrams as a starting point, but of course it's subjective after that. アメリカの is super-common, of course.

If you have an opinion on the cardinal directions +向き in this context, I'd be happy to know it. (adds clarity to [adj-no] rationale, so not totally off-topic)

stephenmk commented 8 months ago

This has worked in any context I've ever encountered it, but the entry 家の向き certainly calls this into question.

Sankoku, for example, has separate senses for 向き when used as a standalone noun and when used in a compound. It's the latter that is more often used as an adjective. So I think it makes sense that '家の向き' would be a noun.

If you have an opinion on the cardinal directions +向き in this context, I'd be happy to know it.

Daijirin's entry for 南向き uses a 〜の example.

南の方に向いていること。「━の部屋」

Saitō J-E (an old dictionary) has both 名 and 〜の glosses.

〈名〉A southern aspect:(=の) with a southern aspect; looking toward the south

Another trick I've seen editors use to gauge the noun usages of a word is to check particle attachment frequencies. The relatively low usages of 〜は, 〜が, and 〜を seem to suggest that "noun usages" are also infrequent.

Google N-gram Corpus Counts
╭──────────┬─────────╮
│ 南向き   │ 581,617 │
│ 南向きの │ 111,375 │
│ 南向きに │  16,132 │
│ 南向きな │   4,685 │
│ 南向きが │   3,549 │
│ 南向きは │   2,920 │
│ 南向きと │   2,442 │
│ 南向きを │   2,348 │
╰──────────┴─────────╯

And finally, a lot of the top terms for 南向き are adjectival even if they don't have 〜の: "南向き物件" = southern-facing property, "南向き住戸" = southern-facing dwelling, etc.

So my opinion is that it would be okay to change the 〈北/東/南/西〉向き entries to adj-no and gloss them as adjectives. I know recently we've been more aggressive about dropping noun tags from 形容動詞 and 副詞 that are rarely ever used as nouns (see 穏健, 鋭意, and お先真っ暗 for example). My hunch is that we can do the same for adj-no words, but maybe one of the editors can confirm.
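The particle-attachment comparison above can be sketched in a few lines. This is a minimal Python sketch, assuming the counts are exactly those in the Google N-gram table for 南向き; the helper names (`attachment_shares`, `no_to_noun_ratio`) and the choice of は/が/を as the "noun-usage" particles are illustrative assumptions, not part of any JMdict or n-gram server tool. The raw counts still carry the segmentation caveats discussed in this thread.

```python
# Google N-gram counts for 南向き, copied from the table in this thread.
BARE_COUNT = 581_617  # 南向き on its own
PARTICLE_COUNTS = {
    "の": 111_375,
    "に": 16_132,
    "な": 4_685,
    "が": 3_549,
    "は": 2_920,
    "と": 2_442,
    "を": 2_348,
}

# Assumption: は/が/を stand in for "used as a noun" (topic/subject/object).
NOUN_PARTICLES = ("は", "が", "を")


def attachment_shares(bare: int, counts: dict) -> dict:
    """Each particle's count as a fraction of the bare-stem count."""
    return {particle: n / bare for particle, n in counts.items()}


def no_to_noun_ratio(counts: dict, noun_particles=NOUN_PARTICLES) -> float:
    """How many times more often 〜の appears than the noun particles combined."""
    noun_total = sum(counts[p] for p in noun_particles)
    return counts["の"] / noun_total


if __name__ == "__main__":
    shares = attachment_shares(BARE_COUNT, PARTICLE_COUNTS)
    print(f"の share: {shares['の']:.1%}")  # の share: 19.1%
    print(f"の vs は/が/を: {no_to_noun_ratio(PARTICLE_COUNTS):.1f}x")  # 12.6x
```

A high 〜の-to-noun-particle ratio is only supporting evidence for dropping [n]; it doesn't replace checking kokugo usage examples.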

briankrznarich commented 8 months ago

Is there something you all use to make that nicely formatted ascii ngram table in particular?

Thank you for the detailed reply. Given your comments, I think we pretty much see eye-to-eye on this. In other words, your answer is what I hoped to hear.

Googling "南向きを" usage, these do exist. But if you imagine "南向き" as one of 4 options you choose when searching for an apartment, "南向き" is not so much a standalone noun as "the choice 南向き", i.e. "I chose the 南向き option from the list". 南向きを often even appears as "「南向き」を" in Google results. So maybe dropping [n] entirely is justifiable (vs. [adj-no,n]).

You probably already know this, but for anyone else who might happen by, just a warning on the particle-attachment trick. I am someone who does this from time to time. You have to be particularly careful with が because of its conjunctive "but" (けど) sense, and with な because of the overlap with なので, など, なら, ながら, etc. (especially relevant when evaluating [adj-no]/[adj-na]). が seems legit here, but:

南向きな 4685
南向きなので 3182
南向きなどの 1268
南向きなら 427

In the past I'm sure I've erroneously claimed something was [adj-na] just looking at that simple な count.

の is even more complex of course, because we now evaluate entirely on context.
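The な pitfall can't be fixed by simple subtraction on the published figures: 3182 + 1268 + 427 = 4877, which already exceeds the 4685 bare 南向きな count, so those counts don't appear to be a clean breakdown of it. A minimal sketch of the alternative, assuming you have raw text snippets to inspect, is to classify each occurrence by what follows the な. The function names and the sample snippets below are hypothetical, and an "adj-na?" label is only a candidate for manual review.

```python
from collections import Counter
from typing import Iterable, Optional

# な-continuations named in this thread as false positives for [adj-na].
# "など" also covers "などの".
FALSE_NA_CONTINUATIONS = ("なので", "など", "なら", "ながら")


def classify_na(snippet: str, stem: str) -> Optional[str]:
    """Label the first stem+な in a snippet.

    Returns "other" when the な begins a known non-adjectival continuation,
    "adj-na?" otherwise (still only a candidate), or None if stem+な is absent.
    """
    idx = snippet.find(stem + "な")
    if idx == -1:
        return None
    rest = snippet[idx + len(stem):]  # text starting at the な
    if rest.startswith(FALSE_NA_CONTINUATIONS):
        return "other"
    return "adj-na?"


def tally_na(snippets: Iterable[str], stem: str) -> Counter:
    """Count labels over snippets, skipping ones that lack stem+な."""
    counts: Counter = Counter()
    for snippet in snippets:
        label = classify_na(snippet, stem)
        if label is not None:
            counts[label] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical snippets for illustration, not corpus data.
    samples = [
        "南向きなので日当たりが良い",
        "南向きなどの条件",
        "南向きなら問題ない",
        "南向きな部屋",
        "南向きの部屋",
    ]
    print(tally_na(samples, "南向き"))  # Counter({'other': 3, 'adj-na?': 1})
```

The continuation list is deliberately short and taken from this thread; other な-initial continuations would need adding before trusting the "adj-na?" bucket.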

stephenmk commented 8 months ago

Is there something you all use to make that nicely formatted ascii ngram table in particular?

It's another one of my scripts which adds some extra button controls to the results page. Opencooper also shared a different script which does many of the same things.

You probably already know this, but for anyone else who might happen by, just a warning on the particle-attachment trick.

Yes, thank you for mentioning this. The parsing system used by the n-gram counter can return misleading results in many situations. Always best to use it as supporting evidence in combination with other sources if possible.