UniversalDependencies / UD_English-EWT


Overuse of INTJ #429

Closed nschneid closed 10 months ago

nschneid commented 1 year ago

Per https://universaldependencies.org/u/pos/INTJ.html, INTJ should not be used for words that come from another category, like adjectives/adverbs.

https://universal.grew.fr/?custom=650f87d83dd5e - includes "good", "great", "fine", "well", "Christ", ...
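The linked Grew query is not reproduced in this thread. As a rough stand-in, a check like this could be run locally over a CoNLL-U file. This is only a sketch: the watch list, the function names, and the sample sentence are assumptions made up for illustration, not data from EWT.

```python
from collections import Counter

# Hypothetical watch list, taken from the adjective examples named in this issue.
SUSPECT_LEMMAS = {"good", "great", "fine"}

def iter_tokens(conllu_text):
    """Yield the 10 CoNLL-U columns of each regular token line."""
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Skips comments, blank lines, multiword ranges (1-2), empty nodes (1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        yield cols

def suspect_intj(conllu_text, lemmas=SUSPECT_LEMMAS):
    """Count tokens tagged INTJ whose lemma is on the watch list."""
    counts = Counter()
    for cols in iter_tokens(conllu_text):
        lemma, upos = cols[2].lower(), cols[3]
        if upos == "INTJ" and lemma in lemmas:
            counts[lemma] += 1
    return counts

# Toy sentence (invented, not from the treebank).
SAMPLE = "\n".join([
    "# text = Great, thanks!",
    "1\tGreat\tgreat\tINTJ\tUH\t_\t0\troot\t_\t_",
    "2\t,\t,\tPUNCT\t,\t_\t1\tpunct\t_\t_",
    "3\tthanks\tthanks\tNOUN\tNN\t_\t1\tparataxis\t_\t_",
    "4\t!\t!\tPUNCT\t.\t_\t1\tpunct\t_\t_",
])

print(suspect_intj(SAMPLE))
```

Items like "well" or "Christ" are deliberately left off the watch list here, since whether they should stay INTJ is exactly what the rest of the thread debates.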

amir-zeldes commented 1 year ago

Agreed about good, great, fine; I think "well" is established as a discoursy interjection in the sense that doesn't mean 'good' in any way (sentence initial well, xpos UH), I would keep that INTJ (not the same lexical item). For "Christ" we get into the general area of profanities, which often do not behave like their morphological POS - I'm not sure we want them not to be INTJ just because of etymology. They are definitely UH in xpos.

nschneid commented 1 year ago

> I think "well" is established as a discoursy interjection in the sense that doesn't mean 'good' in any way (sentence initial well, xpos UH)

Agreed, I wasn't thinking of that one though

> For "Christ" we get into the general area of profanities, which often do not behave like their morphological POS - I'm not sure we want them not to be INTJ just because of etymology.

https://universaldependencies.org/u/pos/INTJ.html specifically rules out "God". Not to sound religious but I don't think we can distinguish "God" from "Christ".

amir-zeldes commented 1 year ago

Hm, it's hard with that being in the guidelines the way it is, but do we think that's true for swearwords as well? I mean, we could argue about what POS some of them are but it would lead to a colorful GH issue ;) If we stick to PTB UH for them, which I think is standard, we would need a conversion table to know what their 'etymological POS' is, and I'm not sure how much sense that makes. Morphosyntactically, profanities and other oaths do fit the definition of emotional, syntactically unintegrated language...

nschneid commented 1 year ago

There may be some borderline cases but here's an interpretation that makes sense to me: If a word is mainly used for swearing, and it's syntactically extrinsic to the semantics-bearing part of the sentence (not a predicate or argument etc.), then it's INTJ. Same for discourse particles from which verbs have been derived (INTJ for the main use of "OK" even though it can also be a VERB), and discourse particles whose meaning is quite distinct from the non-discourse one ("well", "like" as INTJ). If it's a word that is mainly an ADV or NOUN etc. and also has a secondary use as a discourse particle, then it's not INTJ. We can tell that it is being used as a discourse particle because it attaches as discourse, but we don't need to posit a separate lexical entry.

"Please", "sorry", "right" feel kinda borderline. I guess "Sorry/discourse" just expresses that you're sorry/ADJ so we can call it ADJ. "Right?" is like saying "Is that right/ADJ?", so ADJ as well. "Please/discourse" is farther from evoking an act of pleasing, so I'd call it INTJ.
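One way to read the interpretation above as a procedure, sketched in Python. The `Profile` fields, the lexicon entries, and the function name are all hypothetical. The entries only encode the judgments given in this comment, and the "mainly used for swearing" case is folded into `main_upos == "INTJ"`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Profile:
    main_upos: str            # the word's primary category
    distinct_discourse: bool  # is the discourse sense clearly separate from the main one?

# Illustrative entries only, following the examples discussed in this thread.
LEXICON = {
    "ok": Profile("INTJ", False),     # discourse use is the main use; the VERB is derived
    "well": Profile("ADV", True),     # discourse "well" does not mean 'good'
    "please": Profile("VERB", True),  # far from evoking an act of pleasing
    "sorry": Profile("ADJ", False),   # still expresses being sorry
    "right": Profile("ADJ", False),   # "Right?" ~ "Is that right?"
    "good": Profile("ADJ", False),
}

def upos_for_discourse_use(word):
    """UPOS for a token that attaches as `discourse`, under this interpretation."""
    profile = LEXICON[word.lower()]
    if profile.main_upos == "INTJ" or profile.distinct_discourse:
        return "INTJ"
    # Secondary discourse use of an ordinary word: keep its main category.
    return profile.main_upos

for w in ("OK", "well", "please", "sorry", "right", "good"):
    print(w, upos_for_discourse_use(w))
```

The point of the sketch is that the `discourse` relation carries the "used as a discourse particle" information, so UPOS only needs to change when a separate lexical entry is justified.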

amir-zeldes commented 1 year ago

I have to say it's not that I necessarily think the above is a bad way to slice up the space, but it seems like 'just another arbitrary system', where we already have one (PTB UH). I would probably be happier just deciding "if it's UH, it's INTJ" since that's an established practice in English and be done with it. Having/maintaining another lexicalized list of items just for upos seems rather unappealing... We could just decide that all of those swear and discourse items have a second sense/lexical entry which deserves tagging as INTJ - if it's good enough for "well" and "ok" then why not also for the gander?

nschneid commented 1 year ago

Well, it's a question of how closely we want to follow the INTJ guidelines. If other languages follow them and exclude "God" etc. then we risk having English be incompatible and making crosslinguistic comparison harder. TBF this is a pretty small set of lexical items in practice (that would be UH but not INTJ).

(Incidentally I don't see a mention of "God" in the PTB guidelines; are we sure the annotators consistently tag it UH when it's vocative?)
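To see how small the "UH but not INTJ" set is in a given treebank, a sketch along these lines could list every token where the two tags disagree. The function name and the sample sentence are made up for illustration. The toy annotation is not a claim about how EWT tags "God".

```python
from collections import Counter

def uh_intj_mismatches(conllu_text):
    """Count (lemma, upos, xpos) triples where exactly one of XPOS=UH / UPOS=INTJ holds."""
    counts = Counter()
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Skips comments, blank lines, multiword ranges and empty nodes.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        lemma, upos, xpos = cols[2].lower(), cols[3], cols[4]
        if (xpos == "UH") != (upos == "INTJ"):
            counts[(lemma, upos, xpos)] += 1
    return counts

# Toy sentence (invented): "God" tagged UH in XPOS but NOUN in UPOS.
SAMPLE = "\n".join([
    "# text = Oh God, no.",
    "1\tOh\toh\tINTJ\tUH\t_\t4\tdiscourse\t_\t_",
    "2\tGod\tGod\tNOUN\tUH\t_\t4\tvocative\t_\t_",
    "3\t,\t,\tPUNCT\t,\t_\t4\tpunct\t_\t_",
    "4\tno\tno\tINTJ\tUH\t_\t0\troot\t_\t_",
    "5\t.\t.\tPUNCT\t.\t_\t4\tpunct\t_\t_",
])

print(uh_intj_mismatches(SAMPLE))
```

Under a strict "if it's UH, it's INTJ" policy this counter would come back empty; under the guideline-driven policy it would list exactly the lexical items in dispute.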

amir-zeldes commented 1 year ago

For "[Oo]h [Gg]od", ON has 50 UH : 5 NNP, and three of the latter are actually referring to God ("Oh God, we ask you for...")
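A count like the one above can be sketched as follows, assuming sentences are already available as lists of (form, xpos) pairs. The function name and the two toy sentences are hypothetical, and the output does not reproduce the 50 : 5 figures, which come from the corpus cited in the comment.

```python
from collections import Counter

def xpos_of_god_after_oh(sentences):
    """Tally the XPOS of "god" whenever it directly follows "oh" (case-insensitive)."""
    counts = Counter()
    for sent in sentences:
        # Walk over adjacent token pairs within each sentence.
        for (prev_form, _), (form, xpos) in zip(sent, sent[1:]):
            if prev_form.lower() == "oh" and form.lower() == "god":
                counts[xpos] += 1
    return counts

# Toy data (invented): one exclamative use, one address to the deity.
TOY = [
    [("Oh", "UH"), ("God", "UH"), (",", ","), ("no", "UH"), (".", ".")],
    [("Oh", "UH"), ("God", "NNP"), (",", ","), ("we", "PRP"), ("ask", "VBP"), ("you", "PRP")],
]

print(xpos_of_god_after_oh(TOY))
```

A tally like this only shows what annotators did; deciding which NNP cases genuinely address God still needs manual inspection, as in the comment.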

I get the issue with crosslinguistic comparison, but I feel like language internal consistency should also not be overlooked, and I worry about chaos/arbitrary decisions and incompatibility between TBs. I think ultimately people will decide on a language-specific basis whether there is a separate lexical item for an intj version of something. For example, some UD Russian TBs tag the archaic "боже" (vocative of "God") as an INTJ and lemmatize it to itself (UD_Russian-Taiga). But others still consider it to be a form of lemma God, and annotate it as a noun with vocative case (even though modern Russian has no productive vocative), for example in UD_Russian-SynTagRus, with the nominative lemma бог "God".

So at the end of the day, if Russian can choose for there to be a 'special' interjection use of God, I think it's likely other languages will vary too, and I'm not sure that's wrong (though I am sure it shouldn't oscillate within the same UD language...)

nschneid commented 1 year ago

I agree that flexibility in the universal guidelines is sometimes necessary. If we want to say that it should be up to the language, then the guidelines shouldn't articulate a hard-and-fast rule. @dan-zeman do you think this calls for a more flexible guideline? (Come to think of it, why does https://universaldependencies.org/u/pos/INTJ.html say "God" is a NOUN and not a PROPN?)

amir-zeldes commented 1 year ago

> "God" is a NOUN and not a PROPN

In PTB non-INTJ usage, it seems to be determined just by capitalization... I suppose both tags are possible, certainly for common noun uses ("a/some god").

dan-zeman commented 1 year ago

I believe we need INTJ for words that cannot be anything else. A vocative use of god is just a vocative use of a noun. (Why do you think it should be PROPN?) It is irrelevant whether the speaker actually intends to talk with the god or is simply swearing. First, I'm not sure I'd want (and be able) to distinguish those two cases. And second, those differences are pragmatic, but syntactically it's still a vocative. Same for Russian боже - yes, the vocative morphology is no longer productive, but it only means that for most other nouns the nominative form is used where vocative would be appropriate; syntactically it is still a noun and it is annotated with Case=Voc in SynTagRus.

amir-zeldes commented 1 year ago

> Why do you think it should be PROPN?

In the use as a referring expression? Because it is capitalized, refers to a unique individual (at least in the sense of Judeo-Christian God), and appears without an article, like other names. If it's just one of many (usually lowercase) gods, "the/a god" etc., then it should indeed be NOUN IMO.

> it only means that for most other nouns the nominative form is used where vocative would be appropriate

This would be true if the language had lost vocative for most nouns, but had a stable class of nouns that still clearly distinguish vocative. I don't think that's true for Russian - outside of this lexicalized exclamative use, even if you are talking to God, there is no special form, for example I just found this on Pinterest:

I think it's just in a specific type of exclamation, often preceded by "my".

> it is annotated with Case=Voc in SynTagRus

Right, but not in Taiga for example, so there is some variation in how annotators perceive it. That's part of my point - that language internal guidelines will often have to decide if there is a separate INTJ item that happens to look like a NOUN etc. For English, for example, exclamative God alternates with the euphemistic non-NOUN "gosh", and you can say "oh my gosh!", but you can't say "Dear Gosh, hear my prayer", so this is another indication that the exclamative God is perhaps a different lexical item.

dan-zeman commented 1 year ago

OK, then the Russians should really make up their minds whether the frozen vocative боже can still be analyzed as a noun; I cannot judge its frequency (when I speak Russian at all, it's not to a god) and I'm probably biased, being a speaker of a language where the vocative is still productive and bože sounds absolutely normal.

In English, I understand that gosh is probably an interjection. But I don't think it's an argument for god/God to not be a noun (common or proper).

amir-zeldes commented 1 year ago

> In English, I understand that gosh is probably an interjection. But I don't think it's an argument for god/God to not be a noun (common or proper).

Well, I'm saying it could be used as an argument for there being two lexical items "god", one of which is a noun (and is not in a paradigm with gosh), and one of which has the same POS as "gosh", with which it is completely interchangeable paradigmatically.

nschneid commented 10 months ago

Per today's Core Group discussion, the wording on the INTJ page was probably a bit too specific; the intention was to emphasize the general guidelines about prototypical vs. productively extended usages. Revised to make this clearer: https://universaldependencies.org/u/pos/INTJ.html

jnivre commented 10 months ago

Perfect. Thanks, Nathan.

Joakim


amir-zeldes commented 10 months ago

LGTM, thanks!

Amir
