Closed by msklvsk 5 years ago
From workgroups/enhanced.html. Guy Perrier:
Joakim (item 2) and Marie (last item) propose to include repackaging information for downstream applications in the objectives of Enhanced UD. On the contrary, I think Enhanced UD must be independent of downstream applications. In my opinion, Enhanced UD is an abstraction of the basic dependencies that moves as far towards semantics as possible while remaining within the framework of syntax. For instance, subtyping obl and advcl relations with adpositions and subordinating conjunctions has nothing to do with Enhanced UD. Moreover, it makes annotations less readable and the set of relations more complex and more dependent on particular languages.
It is not completely mechanical, though. In a recent paper (submitted for TLT) I compared a language-specific enhancer for Dutch (AE) with @sebschu's language-independent enhancer (SE), and noted that:
The dependencies acl and advcl are extended with the lemma of their case or mark dependent. However, advcl and acl clauses can contain both, as in een zondag om nooit te vergeten (a sunday to never forget). AE adds the case lemma om in these cases, where SE adds the marker te. This explains the almost complementary distribution of acl:om + advcl:om and acl:te + advcl:te. A similar issue arises with conjunctions containing two cc elements, such as zowel op straat als in woningen (both on the street and in houses), where AE adds the lemma of the first cc to the conj dependency and SE the lemma of the second. Finally, some prepositional phrases consist of a preposition as well as a postpositional particle (op twee na (except two)). Again, the two methods give different results for the lexical extension.
With some work and more explicit guidelines, one could easily reach agreement. In the first case, the marker te is just verbal inflection and is not very informative, but the marker om could be useful for semantic interpretation (marking a purpose clause). In the other cases, a concatenation of case (+ particle) might be most informative.
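To make the disagreement concrete, here is a minimal sketch of how the choice of marker changes the resulting label. The function `enhance_label` and its three policy names are hypothetical; "first" and "last" only reproduce the pattern of divergence described above and are not the actual logic of AE or SE. Joining multiple lemmas with an underscore is also an assumption made for illustration.

```python
def enhance_label(deprel, marker_lemmas, policy="concat"):
    """Attach marker lemma(s) to a relation label.

    marker_lemmas: lemmas of the case/mark/cc dependents, in surface order.
    policy: "first", "last", or "concat" (hypothetical names, see lead-in).
    """
    if not marker_lemmas:
        # Nothing to attach: keep the basic relation unchanged.
        return deprel
    if policy == "first":
        chosen = marker_lemmas[:1]
    elif policy == "last":
        chosen = marker_lemmas[-1:]
    elif policy == "concat":
        chosen = marker_lemmas
    else:
        raise ValueError(f"unknown policy: {policy}")
    # Assumption: multiple lemmas are joined with "_".
    return deprel + ":" + "_".join(lemma.lower() for lemma in chosen)


# "een zondag om nooit te vergeten": two candidate markers on the acl clause
print(enhance_label("acl", ["om", "te"], "first"))   # acl:om
print(enhance_label("acl", ["om", "te"], "last"))    # acl:te
# "op twee na": preposition plus postpositional particle
print(enhance_label("obl", ["op", "na"], "concat"))  # obl:op_na
```

Whichever policy the guidelines settle on, the point is that it has to be stated explicitly; otherwise two enhancers can both be "correct" and still disagree.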
I just checked that the complex case with multiple tokens, mentioned by @gossebouma, is now documented in the guidelines. So I think this issue can be closed.
I share the doubts about the usefulness of this enhancement type but it is a part of the current guidelines, so it is allowed to appear in the UD treebanks.
In Enhanced Dependencies, null nodes, propagation of conjuncts, xcomp's subjects, and :relcl coreference all bring new information to the table and cannot be reliably generated without a human involved. Preposition marking, however, seems only to duplicate existing information. If your application relies on case marking encoded in the arrows, it would be better to copy the information into the arrows as a preprocessing step than to commit it to the gold standard. Or am I missing something, and it isn't just a cache?
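For illustration, the preprocessing step suggested here could look roughly like the sketch below. It assumes a simplified, hypothetical token record `(id, lemma, head, deprel)` rather than a real CoNLL-U reader, and the set of relations to augment and the underscore join are assumptions too. It does not handle the ambiguous multi-marker cases discussed elsewhere in this thread; it simply concatenates all case/mark dependents in surface order.

```python
# Relations whose label gets a marker lemma appended (assumed set).
AUGMENTABLE = {"obl", "nmod", "acl", "advcl"}
# Relations that supply the marker lemma.
MARKERS = {"case", "mark"}


def augment_labels(tokens):
    """Return {token_id: label} derived only from the basic tree.

    tokens: list of (id, lemma, head, deprel) tuples (hypothetical format).
    """
    # Collect the case/mark dependents of every head.
    markers = {}
    for tid, lemma, head, deprel in tokens:
        if deprel in MARKERS:
            markers.setdefault(head, []).append((tid, lemma.lower()))

    labels = {}
    for tid, lemma, head, deprel in tokens:
        if deprel in AUGMENTABLE and tid in markers:
            # Surface order; several markers are joined with "_".
            lemmas = [l for _, l in sorted(markers[tid])]
            labels[tid] = deprel + ":" + "_".join(lemmas)
        else:
            labels[tid] = deprel
    return labels


# "He sat on the chair"
sentence = [
    (1, "he", 2, "nsubj"),
    (2, "sit", 0, "root"),
    (3, "on", 5, "case"),
    (4, "the", 5, "det"),
    (5, "chair", 2, "obl"),
]
print(augment_labels(sentence)[5])  # obl:on
```

Nothing in this function needs information beyond the basic dependencies, which is the sense in which the augmented label behaves like a cache.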