UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

Implement new guidelines for optional depictives #321

Status: Closed (nschneid closed this issue 2 years ago)

nschneid commented 2 years ago

The policy for depictives has changed. A couple of heuristic queries for tokens to review:

nschneid commented 2 years ago

Would like to go with enhanced relations nsubj:depict (with ADJ) and nsubj:pass:depict (with VBN) for now so we don't lose the secondary predication. @dan-zeman can these be enabled in the validator?
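For readers unfamiliar with how such an enhanced edge would sit in the data, here is a minimal sketch of reading the DEPS column (column 9) of a CoNLL-U file and picking out edges carrying the proposed `:depict` subtype. The three-token sentence is a made-up illustration, not taken from EWT, and attaching the depictive as `advcl` in the basic tree is an assumption of this example; `enhanced_edges` is a hypothetical helper name.

```python
# Hypothetical CoNLL-U rows for "She arrived tired" (illustrative, not from EWT).
# Assumption: the optional depictive is attached as advcl in the basic tree,
# and the secondary predication is kept as an enhanced nsubj:depict edge.
ROWS = (
    "1\tShe\tshe\tPRON\tPRP\t_\t2\tnsubj\t2:nsubj|3:nsubj:depict\t_\n"
    "2\tarrived\tarrive\tVERB\tVBD\t_\t0\troot\t0:root\t_\n"
    "3\ttired\ttired\tADJ\tJJ\t_\t2\tadvcl\t2:advcl\t_\n"
)


def enhanced_edges(conllu_text):
    """Yield (head_id, relation, dependent_id) triples from the DEPS column."""
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        dep_id, deps = cols[0], cols[8]
        if deps == "_":
            continue  # token has no enhanced edges
        for edge in deps.split("|"):
            # Split on the first colon only: the relation itself may contain colons.
            head, rel = edge.split(":", 1)
            yield head, rel, dep_id


depictive = [e for e in enhanced_edges(ROWS) if e[1].endswith(":depict")]
print(depictive)
```

Splitting each edge on the first colon only is what keeps multi-part labels such as `nsubj:pass:depict` intact.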

nschneid commented 2 years ago

In the above commit I've switched to xcomp for:

I think they're more like complements than adjuncts.

dan-zeman commented 2 years ago

The guidelines amendment 4 says that "precise naming recommendation for the enhanced edge is deferred for further discussion." Should we put it on the agenda for the core group in November?

Perhaps we should also create a corresponding issue in the docs repository. Once the naming recommendation is decided, we also need to put it in the enhanced UD guidelines, either as part of the current enhancement "Controlled/raised subjects" (which might then have to be renamed) or as a new, seventh enhancement type.

nschneid commented 2 years ago

OK if you think it needs discussion I'll remove the subtype for now. The above commit can be referenced later to restore them.

dan-zeman commented 2 years ago

> OK if you think it needs discussion I'll remove the subtype for now. The above commit can be referenced later to restore them.

Yeah, it may need discussion; remember how long it took to decide on :outer :-) I'd personally be okay-ish with :depict, unless we actually use :xsubj for this, too?

nschneid commented 2 years ago

Either :xsubj means a subject implied by xcomp specifically, or it could mean any inferred subject (relative clause subjects too?). Could be less confusing to distinguish different kinds of inferred subjects (due to xcomp, depictives, controlled adjuncts). The risk of trying to distinguish them is we would end up with a lot of different labels. But I would lean toward splitting the different subtypes at first, and then it would be easy to merge them later if necessary.
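To illustrate why merging later would be cheap, here is a rough sketch of collapsing fine-grained subtypes in a DEPS cell with a plain lookup table. The `MERGE` mapping and the `merge_deps` helper are hypothetical; the target labels are only one possible choice, not a decided recommendation.

```python
# Hypothetical mapping from fine-grained subtypes to a merged label.
MERGE = {
    "nsubj:depict": "nsubj:xsubj",
    "nsubj:pass:depict": "nsubj:pass:xsubj",
}


def merge_deps(deps, mapping=MERGE):
    """Relabel enhanced edges in one DEPS cell, e.g. '2:nsubj|3:nsubj:depict'."""
    if deps == "_":
        return deps  # no enhanced edges on this token
    merged = []
    for edge in deps.split("|"):
        head, rel = edge.split(":", 1)  # the relation may itself contain colons
        merged.append(f"{head}:{mapping.get(rel, rel)}")
    return "|".join(merged)


print(merge_deps("2:nsubj|3:nsubj:depict"))
```

Going the other way, from a single overloaded label back to distinct subtypes, cannot be done with a lookup and would need the construction to be re-identified.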

A related question is how much we want to encourage edeprels to be (semi)manual, versus completely automated. Moving beyond conj/relcl/xcomp-generated ones would create a greater need for manual disambiguation, I would think. Not sure how many treebanks are investing effort in this.
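As a point of comparison, the xcomp case can be approximated by a deterministic rule over the basic tree. The sketch below uses a simplified heuristic that is an assumption of this example (object control if the governor has an `obj`, otherwise subject control) and a hypothetical `add_xsubj` helper; it is not the procedure any particular converter uses. No comparably simple rule is given here for depictives, which is where manual disambiguation would come in.

```python
def add_xsubj(tokens):
    """Return extra enhanced edges (head_id, relation, dependent_id) for xcomp.

    tokens: list of dicts with 'id', 'head', and 'deprel' from the basic tree.
    """
    edges = []
    for tok in tokens:
        if tok["deprel"] != "xcomp":
            continue
        # Other dependents of the verb that governs the xcomp.
        siblings = [s for s in tokens if s["head"] == tok["head"]]
        # Simplified heuristic (assumption): prefer an obj as controller,
        # otherwise fall back to the governor's subject.
        controller = (
            next((s for s in siblings if s["deprel"] == "obj"), None)
            or next((s for s in siblings if s["deprel"].startswith("nsubj")), None)
        )
        if controller is not None:
            edges.append((tok["id"], "nsubj:xsubj", controller["id"]))
    return edges


# Illustrative basic tree for "She wants to leave" (not from EWT).
tokens = [
    {"id": 1, "head": 2, "deprel": "nsubj"},  # She
    {"id": 2, "head": 0, "deprel": "root"},   # wants
    {"id": 3, "head": 4, "deprel": "mark"},   # to
    {"id": 4, "head": 2, "deprel": "xcomp"},  # leave
]
print(add_xsubj(tokens))
```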

amir-zeldes commented 2 years ago

I would be against introducing any new subtypes at this point. We have an inflation of rare subtypes already and I also don't see realistic prospects for wide-scale manual annotation of edeps currently. I think we need to concentrate on consolidation rather than introduction of new distinctions before we've shored up the distinctions we just introduced across more than a handful of datasets.

nschneid commented 2 years ago

@amir-zeldes you'd prefer to overload :xsubj? I don't have a strong opinion TBH.

amir-zeldes commented 2 years ago

Yes. I think for datasets that have xsubj there is a good chance that these are already xsubj in other languages, so overloading it will promote the most comparability. It also avoids a very rare deprel that we don't have a good way of identifying automatically.