nschneid closed this issue 2 years ago.
Would like to go with the enhanced relations nsubj:depict (with ADJ) and nsubj:pass:depict (with VBN) for now, so that we don't lose the secondary predication. @dan-zeman, can these be enabled in the validator?
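A minimal sketch of what the proposal amounts to in a CoNLL-U file, assuming the depictive and its subject have already been identified (that identification step is not shown). `depict_label` and `add_enhanced_edge` are hypothetical helpers; the sketch only assumes the standard DEPS format (`head:deprel` pairs joined by `|`, sorted by head) and integer heads, so empty nodes such as `8.1` are not handled.

```python
def depict_label(upos, xpos):
    """Pick the proposed enhanced label for a depictive's subject edge.

    Follows the proposal above: ADJ -> nsubj:depict, VBN -> nsubj:pass:depict.
    """
    if xpos == "VBN":
        return "nsubj:pass:depict"
    if upos == "ADJ":
        return "nsubj:depict"
    raise ValueError(f"not a depictive under this sketch: {upos}/{xpos}")


def add_enhanced_edge(deps, head, rel):
    """Add head:rel to the DEPS value of the subject token, sorted by head.

    Assumes integer heads only (no empty nodes).
    """
    pairs = []
    if deps != "_":
        for entry in deps.split("|"):
            h, r = entry.split(":", 1)  # split once: labels may contain ':'
            pairs.append((int(h), r))
    if (head, rel) not in pairs:
        pairs.append((head, rel))
    pairs.sort()
    return "|".join(f"{h}:{r}" for h, r in pairs)


# "She(1) left(2) the(3) party(4) angry(5)": the subject of the depictive
# "angry" is "She", so token 1 gets an extra edge from token 5.
label = depict_label("ADJ", "JJ")
print(add_enhanced_edge("2:nsubj", 5, label))  # 2:nsubj|5:nsubj:depict
```

The new edge lives on the subject's DEPS, pointing back to the depictive as its head, so the secondary predication survives alongside the primary nsubj edge.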
In the above commit I've switched to xcomp for:
I think they're more like complements than adjuncts.
Amendment 4 of the guidelines says that the "precise naming recommendation for the enhanced edge is deferred for further discussion." Should we put it on the agenda for the core group in November?
Perhaps we should also create a corresponding issue in the docs repository. Once the naming recommendation is decided, we also need to put it into the enhanced UD guidelines, either as part of the current enhancement "Controlled/raised subjects" (which might then have to be renamed) or as a new, seventh enhancement type.
OK if you think it needs discussion I'll remove the subtype for now. The above commit can be referenced later to restore them.
Yeah, it may need discussion; remember how long it took to decide on :outer :-) I'd personally be okay-ish with :depict, unless we actually use :xsubj for this, too?
Either :xsubj means specifically a subject implied by xcomp, or it could mean any inferred subject (relative clause subjects too?). It could be less confusing to distinguish the different kinds of inferred subjects (due to xcomp, depictives, controlled adjuncts). The risk of trying to distinguish them is that we would end up with a lot of different labels. But I would lean toward splitting the different subtypes at first; it would then be easy to merge them later if necessary.
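The "easy to merge later" point can be made concrete: merging subtypes is a pure relabeling of the DEPS column, while splitting a merged label again would need a fresh decision per edge. Below is a sketch with a hypothetical mapping; the target label names are assumptions for illustration, not an agreed convention.

```python
# Hypothetical mapping from fine-grained inferred-subject labels to merged
# labels; the target names are assumptions, not an agreed convention.
MERGE_TO_XSUBJ = {
    "nsubj:depict": "nsubj:xsubj",
    "nsubj:pass:depict": "nsubj:pass:xsubj",
}


def merge_labels(deps, mapping):
    """Relabel the entries of a CoNLL-U DEPS value (head:deprel|head:deprel)."""
    if deps == "_":
        return deps
    merged = []
    for entry in deps.split("|"):
        head, rel = entry.split(":", 1)  # split once: labels may contain ':'
        merged.append(f"{head}:{mapping.get(rel, rel)}")
    return "|".join(merged)


print(merge_labels("2:nsubj|5:nsubj:depict", MERGE_TO_XSUBJ))
# 2:nsubj|5:nsubj:xsubj
```

The reverse direction has no equivalent one-liner: once depictive and xcomp subjects share nsubj:xsubj, recovering which is which means looking at each edge again.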
A related question is how much we want to encourage edeprels to be (semi)manual versus completely automated. Moving beyond conj/relcl/xcomp-generated ones would create a greater need for manual disambiguation, I would think. I'm not sure how many treebanks are investing effort in this.
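To illustrate why xcomp-generated edges are easy to automate while depictives are not, here is a deliberately simplified controller heuristic. It is an assumption for illustration, not the logic of any official converter: the controller is the matrix predicate's obj if present (object control), else its nsubj (subject control), and the label is always nsubj:xsubj regardless of voice. `Token` and `infer_xcomp_subjects` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Token:
    id: int      # 1-based token index
    form: str
    head: int    # 0 for the root
    deprel: str  # basic UD relation


def infer_xcomp_subjects(tokens):
    """Return (dependent_id, head_id, label) edges for xcomp controllers."""
    children = {}
    for t in tokens:
        children.setdefault(t.head, []).append(t)
    edges = []
    for t in tokens:
        if t.deprel != "xcomp":
            continue
        siblings = children.get(t.head, [])
        objs = [s for s in siblings if s.deprel == "obj"]
        subjs = [s for s in siblings if s.deprel.split(":")[0] == "nsubj"]
        # Prefer object control, fall back to subject control.
        controller = (objs or subjs or [None])[0]
        if controller is not None:
            edges.append((controller.id, t.id, "nsubj:xsubj"))
    return edges


wants = [Token(1, "She", 2, "nsubj"), Token(2, "wants", 0, "root"),
         Token(3, "to", 4, "mark"), Token(4, "leave", 2, "xcomp")]
print(infer_xcomp_subjects(wants))  # [(1, 4, 'nsubj:xsubj')]

# A subject depictive attached as xcomp: the heuristic picks "party",
# but the one who is angry is "She".
angry = [Token(1, "She", 2, "nsubj"), Token(2, "left", 0, "root"),
         Token(3, "the", 4, "det"), Token(4, "party", 2, "obj"),
         Token(5, "angry", 2, "xcomp")]
print(infer_xcomp_subjects(angry))  # [(4, 5, 'nsubj:xsubj')]
```

The second case is the kind of edge that would need a manual decision, since the tree alone does not say whether the depictive predicates of the subject or the object.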
I would be against introducing any new subtypes at this point. We have an inflation of rare subtypes already and I also don't see realistic prospects for wide-scale manual annotation of edeps currently. I think we need to concentrate on consolidation rather than introduction of new distinctions before we've shored up the distinctions we just introduced across more than a handful of datasets.
@amir-zeldes, you'd prefer to overload :xsubj? I don't have a strong opinion TBH.
Yes. For datasets that have xsubj, I think there is a good chance that these are already xsubj in other languages as well, so this will promote comparability the most. It also avoids a very rare deprel, which we don't have a good way of identifying automatically anyway.
The policy for depictives has changed. A couple of heuristic queries for tokens to review:
- acl edges where the dependent is an ADJ and has no mark
- acl edges where the dependent is a participle
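The two queries above could be run over a CoNLL-U file roughly as follows. This is a stdlib-only sketch under a few assumptions: "acl" means the plain relation (acl:relcl is excluded), "participle" means the token has VerbForm=Part in FEATS, and multiword-token lines and empty nodes are skipped. `parse_conllu` and `find_candidates` are hypothetical helpers, not part of any UD tool.

```python
def parse_conllu(text):
    """Parse CoNLL-U text into sentences (lists of token dicts)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # skip ranges like 1-2 and empty nodes like 8.1
            continue
        current.append({"id": int(cols[0]), "form": cols[1], "upos": cols[3],
                        "feats": cols[5], "head": int(cols[6]),
                        "deprel": cols[7]})
    if current:
        sentences.append(current)
    return sentences


def is_participle(tok):
    return "VerbForm=Part" in tok["feats"].split("|")


def find_candidates(sentences):
    """Return (query, form) pairs for acl dependents worth reviewing."""
    hits = []
    for sent in sentences:
        for tok in sent:
            if tok["deprel"] != "acl":  # plain acl only, not acl:relcl
                continue
            has_mark = any(c["head"] == tok["id"]
                           and c["deprel"].split(":")[0] == "mark"
                           for c in sent)
            if tok["upos"] == "ADJ" and not has_mark:
                hits.append(("acl-ADJ-nomark", tok["form"]))
            elif is_participle(tok):
                hits.append(("acl-participle", tok["form"]))
    return hits
```

Usage would be something like `find_candidates(parse_conllu(open(path, encoding="utf-8").read()))`, with the hits then checked by hand; the queries are only meant to narrow down what to look at.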