Open nschneid opened 6 months ago
As this is a trivial change to implement, but one that multiple treebanks may want to make in concert, is it better to update EWT/GUM before May 1 or wait until the next release?
I'm not the right @ for LinES, but I can do it in the CoreNLP converter, PUD, and Pronouns
@LarsAhrenberg I can do it if you want me to do it to LinES
Is this just literally a string replace over everything?
The only : relations marked in Pronouns are aux:pass and det:predet. Another job well done
PUD has plenty. Please confirm if there's any intelligence required to do this, or just ESC-shift-5
Simple replacement. Since EWT lacks any entity annotation whatsoever, for the :tmod ones I think I'll add TemporalNPAdjunct=Yes in MISC to retain the semantic information for posterity. Eventually we should annotate all temporal entities.
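The simple replacement described above, together with the MISC flag, could be sketched roughly as follows. This is only an illustration, not the script actually used for EWT: the function name `rename_deprel` is hypothetical, it assumes standard 10-column tab-separated CoNLL-U lines, and it only touches the basic DEPREL column (enhanced dependencies in DEPS would need a separate pass).

```python
# Minimal sketch (hypothetical helper, not the actual EWT script).
# Assumes standard CoNLL-U: 10 tab-separated columns, DEPREL in column 8,
# MISC in column 10.

def rename_deprel(line: str) -> str:
    """Rename obl/nmod :tmod and :npmod to :unmarked on one CoNLL-U line."""
    if not line.strip() or line.startswith("#"):
        return line  # blank lines and comments pass through unchanged
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        return line  # not a token line in the expected format
    base, _, subtype = cols[7].partition(":")
    if base in ("obl", "nmod") and subtype in ("tmod", "npmod"):
        cols[7] = base + ":unmarked"
        if subtype == "tmod":
            # Keep the temporal information in MISC, as proposed above.
            misc = [] if cols[9] == "_" else cols[9].split("|")
            if "TemporalNPAdjunct=Yes" not in misc:
                misc.append("TemporalNPAdjunct=Yes")
            cols[9] = "|".join(misc)
    return "\t".join(cols) + newline


if __name__ == "__main__":
    token = "3\tFriday\tFriday\tPROPN\tNNP\tNumber=Sing\t2\tobl:tmod\t_\t_"
    print(rename_deprel(token))
```

Relations such as aux:pass or det:predet are left alone because only the obl and nmod subtypes are matched.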
is it better to update EWT/GUM before May 1 or wait until the next release?
Not sure, time is a bit tight. And it's not just English, where I can update the GUM, Reddit and GENTLE repos - I know of at least UD Coptic and Hebrew IAHLTwiki, which I maintain and which use these labels, so I could change those, but I haven't coordinated with the annotators about this. Do you know if there are other datasets using these subtypes? I wouldn't want to create differences between datasets on short notice just for a renaming.
OK let's not rush it then. Let's implement it in the 2.15 release.
For Ancient Hebrew the usage of obl:npmod isn't "preposition-less non-temporal obl" but rather the construction argued about in #832, so I'd need a new label for those if there is to be an effort to eliminate :npmod in general.
@mr-martian I think obl:unmarked is about as informative/appropriate as obl:npmod, so you may as well switch too (not saying it's an ideal label, but the previous one also makes no sense in the context of dependencies).
I started to draft a new issue about this, forgetting that this one existed. :D One bit of information not included above is the alternatives that were discussed, which I'll put for posterity:
advmod/advcl, plus UD regards adverbial PPs as nominals so the lack of the preposition doesn't distinguish adverbial nmods or obls from non-adverbial ones).
Implemented for EWT, and created some initial docs:
Still need to update more docs pages and mark old subtypes as deprecated.
What are implementation plans for other treebanks?
So far UD_English-LinES has used neither :npmod nor :tmod, but it seems quite straightforward to implement :unmarked so I put it up for version 2.15.
I made a PR for PUD. I don't think it's relevant for Pronouns
Reviewing the outputs of my script adding :unmarked to obl and nmod tokens, I've come across a number of cases where I think the subrelation is reasonable but which are not covered in the initial docs (oblique, nmod). I would be grateful to hear the views of other people.
Multipart references to locations
  at number four, Privet Drive nmod:unmarked(four, Privet)
  by way of Northfield , Minnesota nmod:unmarked(Northfield, Minnesota)
Apposition-like, but without identity of reference
  blamed for letting the quality of life (a deplorable phrase) deteriorate nmod:unmarked(quality, phrase)
  Subject: The cost of enlargement nmod:unmarked(Subject, cost)
  Your amendments uphold two important principles: the right of rightholders to fair remuneration and the ... nmod:unmarked(principles, right)
Personal pronoun + noun
  I suppose you fellows remember... nmod:unmarked(you, fellows)
  Go back to Stromboli, you dumb bastard nmod:unmarked(you, bastard)
Multi-word proper noun made adjective
  a tall Puerto Rican man. nmod:unmarked(man, Puerto), flat(Puerto, Rican)
Pre-head modifier like 'a couple'
  leather red with a suppleness to it that is part gift, part effort nmod:unmarked(gift, part), nmod:unmarked(effort, part)
Fronted or extraposed subject predicative
  A kibbutznik seaman, he has just returned from a voyage. obl:unmarked(returned, seaman)
  These grew spontaneously one out of the other, obl:unmarked(grew, one)
Sound imitations
  Pop, would go one of the eight-inch guns; obl:unmarked(go, Pop) or maybe it should be obj(go, Pop)
"Pop" can't be omitted so it looks like obj to me (with an inverted word order; cf. 'Never!' said John).
Pre-head modifier like 'a couple' leather red with a suppleness to it that is part gift, part effort nmod:unmarked(gift, part), nmod:unmarked(effort, part)
Interesting...haven't thought about this one: nmod:unmarked probably makes sense by analogy to "a couple".
Multi-word proper noun made adjective a tall Puerto Rican man. nmod:unmarked(man, Puerto), flat(Puerto, Rican)
Because you can say "the man is Puerto Rican", I would lean toward treating the whole expression as an ADJ (ExtPos=ADJ). Thus: flat(Puerto/PROPN,ExtPos=ADJ Rican/ADJ) and amod(man, Puerto)
The rest have been discussed but not decided yet. See this paper for a synopsis and some proposals. If you want to contribute to the discussion: #455, UniversalDependencies/UD_English-EWT/issues/436, #751, #762, #933, #1024
OK, this change should now be done and documented for:
Excellent!
Any updates regarding English-Atis (@aslikuzgun), English-ESLSpok (@kristopherkyle), English-ParTUT (@msang)? All of these use at least a subset of the {nmod:npmod, obl:npmod, nmod:tmod, obl:tmod} relations.
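For maintainers who want to check which of these four relations a treebank still uses, a quick count over the basic DEPREL column might look like the sketch below. The function name `count_deprecated` is made up for illustration, and the code assumes plain 10-column CoNLL-U input.

```python
from collections import Counter

# The four relations named above.
DEPRECATED = {"nmod:npmod", "obl:npmod", "nmod:tmod", "obl:tmod"}


def count_deprecated(conllu_text: str) -> Counter:
    """Count occurrences of the old subtypes in the DEPREL column (column 8)."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) == 10 and cols[7] in DEPRECATED:
            counts[cols[7]] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python count_deprecated.py treebank.conllu
    with open(sys.argv[1], encoding="utf-8") as f:
        for rel, n in sorted(count_deprecated(f.read()).items()):
            print(rel, n)
```

An empty result would suggest the treebank needs no changes for this renaming, at least in the basic trees.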
I believe the English docs are now up to date, with mentions of :npmod and :tmod replaced with :unmarked.
I have not heard any objections to incorporating :unmarked into the remaining English corpora. @dan-zeman what is the policy regarding simple rule-based edits to other treebanks in the interest of within-language consistency?
It depends. If I know that a treebank is actively maintained (or was in the not-so-distant past), like EWT, I would hesitate to touch it without the current maintainer's consent. If I know that the data provider / last maintainer has been silent for a long time, I would just go and fix it. Ideally the validator should flag it as a new error and the treebanks should get their four-year grace period. But we currently have this mechanism only for the main guidelines, not for the language-specific relation subtypes.
Is there a reason to keep this issue open or has everything been resolved?
I think it's still open for Atis, ESLSpok, ParTUT.
Hi, just for the record, the latest release of ParTUT includes this change
@amir-zeldes I just discovered in GUM a few stray enhanced edges with :tmod: https://universal.grew.fr/?custom=673bf3feccc5f
oh wow, whoops! Thanks for that, I'll clean them up upstream
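Stray edges like these are easy to miss because they live in the DEPS column rather than DEPREL. A rough way to list them, assuming the usual `head:deprel|head:deprel` encoding of enhanced dependencies in column 9, is sketched below; `stray_enhanced_edges` is a hypothetical name, and this is not the Grew query linked above.

```python
OLD_SUBTYPES = {"tmod", "npmod"}


def stray_enhanced_edges(conllu_text: str) -> list:
    """Return (token_id, head, deprel) for enhanced edges still using old subtypes."""
    hits = []
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if len(cols) != 10 or cols[8] == "_":
            continue  # no enhanced dependencies on this line
        for edge in cols[8].split("|"):
            # The head comes before the first colon; the rest is the relation.
            head, _, deprel = edge.partition(":")
            # Check every subtype component after the base relation.
            if OLD_SUBTYPES & set(deprel.split(":")[1:]):
                hits.append((cols[0], head, deprel))
    return hits
```

Running this over a treebank after the basic-tree renaming gives a list of tokens whose enhanced graph still needs the same change.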
Because prepositions are so important in English, we have a well-established practice of distinguishing ordinary prepositional nmod and obl from other kinds via subtyping (nmod:poss, etc.).
In particular, nmod:tmod/obl:tmod have been used for non-prepositional temporal adjunct nominals like
(obl:tmod)
The party Friday was widely attended. (nmod:tmod)
in contrast to
(obl)
The party on Friday was widely attended. (nmod)
tmod is part of the legacy of Stanford Dependencies. In light of current UD theory, it is an anomaly where the subtype reflects a semantic but not syntactic distinction (#893). Moreover, it is potentially confusing that only some temporal obliques (the prepositionless ones) receive the subtype.
Meanwhile, nmod:npmod/obl:npmod are used for OTHER non-prepositional adjunct nominals (in special constructions like "5 dollars a share" and "Shares eased a fraction"). The term "npmod" (derived from the npadvmod relation in Stanford Dependencies) has been a source of confusion and invokes a concept of NP that is not part of UD theory.
A discussion amongst the core group concluded that a subtype named :unmarked would be a less confusing way to implement the adpositional vs. non-adpositional distinction, for languages that choose to do so.
@amir-zeldes and I plan to implement this for our English corpora, by simply renaming both :tmod and :npmod to :unmarked. Perhaps English-Atis (@aslikuzgun), English-ESLSpok (@kristopherkyle), English-{LinES, Pronouns, PUD} (@AngledLuffa), English-ParTUT (@msang) would like to do so as well for consistency.