UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

The purpose of adding prepositions to the relation name #566

Closed: msklvsk closed 5 years ago

msklvsk commented 6 years ago

In Enhanced Dependencies, null nodes, propagation of conjuncts, xcomp’s subjects, :relcl coreference — all bring new information to the table and cannot be reliably generated without a human involved. Preposition marking, however, seems to only duplicate the existing information. If your application relies on case marking encoded in arrows, it is more suitable to copy the info into arrows as a preprocessing step rather than commit this to the gold standard. Or am I missing something and it isn’t just a cache?
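The preprocessing step described above could be sketched roughly as follows. This is only an illustration, not any official UD tool: the function names `parse_conllu` and `augmented_labels`, the set of relations chosen for augmentation, and the decision to use only the first `case`/`mark` dependent are all assumptions made for the example.

```python
# Minimal sketch: derive labels such as "obl:on" from basic dependencies.
# Assumes the standard 10-column CoNLL-U layout
# (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC).

AUGMENTABLE = {"nmod", "obl", "acl", "advcl"}  # assumption: relations to extend
MARKERS = {"case", "mark"}                     # dependents that supply the lemma


def parse_conllu(text):
    """Read one sentence; keep only id, lemma, head and deprel per word."""
    tokens = []
    for line in text.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # skip multiword ranges (1-2) and empty nodes (1.1)
            continue
        tokens.append({"id": int(cols[0]), "lemma": cols[2],
                       "head": int(cols[6]), "deprel": cols[7]})
    return tokens


def augmented_labels(tokens):
    """Map token id -> relation label, extended with a marker lemma if any."""
    labels = {}
    for tok in tokens:
        label = tok["deprel"]
        if label.split(":")[0] in AUGMENTABLE:
            markers = [t["lemma"].lower() for t in tokens
                       if t["head"] == tok["id"]
                       and t["deprel"].split(":")[0] in MARKERS]
            if markers:
                # Simplification: only the first marker in surface order is used.
                label = f"{label}:{markers[0]}"
        labels[tok["id"]] = label
    return labels


# Hypothetical example sentence: "She sat on the chair"
ROWS = [
    "1 She she PRON _ _ 2 nsubj _ _",
    "2 sat sit VERB _ _ 0 root _ _",
    "3 on on ADP _ _ 5 case _ _",
    "4 the the DET _ _ 5 det _ _",
    "5 chair chair NOUN _ _ 2 obl _ _",
]
SAMPLE = "\n".join("\t".join(row.split()) for row in ROWS)

if __name__ == "__main__":
    print(augmented_labels(parse_conllu(SAMPLE)))
```

For the sample sentence, token 5 (`chair`) receives the label `obl:on`, while all other labels are left unchanged.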

msklvsk commented 6 years ago

From workgroups/enhanced.html. Guy Perrier:

Joakim (item 2) and Marie (last item) propose to include repackaging information for downstream applications in the objectives of Enhanced UD. On the contrary, I think Enhanced UD must be independent of downstream applications. In my opinion, Enhanced UD is an abstraction of the basic dependencies that tends towards semantics at the maximum while remaining within the framework of the syntax. For instance, subtyping obl and advcl relations with adpositions and subordinate conjunctions has nothing to do with Enhanced UD. Moreover, this makes annotations less readable and the set of relations more complex and dependent on particular languages.

gossebouma commented 6 years ago

It is not completely mechanical, though. In a recent paper (submitted for TLT) I compared a language-specific enhancer for Dutch (AE) with @sebschu's language-independent enhancer (SE), and noted that:

The dependencies acl and advcl are extended with the lemma of their case or mark dependent. However, advcl and acl clauses can contain both, as in een zondag om nooit te vergeten (a sunday to never forget). AE adds the case lemma om in these cases, where SE adds the marker te. This explains the almost complementary distribution of acl:om + advcl:om and acl:te + advcl:te. A similar issue arises with conjunctions containing two cc elements, such as zowel op straat als in woningen (both on the street and in houses), where AE adds the lemma of the first cc to the conj dependency and SE the lemma of the second. Finally, some prepositional phrases consist of a preposition as well as a postpositional particle (op twee na (except two)). Again, the two methods give different results for the lexical extension.

With some work and more explicit guidelines, one could easily reach agreement. In the first case, the marker te is just verbal inflection and is not very informative, but the marker om could be useful for semantic interpretation (marking a purpose clause). In the other cases, a concatenation of case (+ particle) might be most informative.
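To make the divergence concrete, here is a small sketch of three possible policies for a head with more than one marker dependent. The function `extend_relation`, the strategy names, and the underscore separator are assumptions for illustration; they are not taken from either enhancer's code.

```python
# Sketch: how the choice of policy changes the lexical extension
# when a head has several case/mark/cc dependents.

def extend_relation(deprel, marker_lemmas, strategy="concat", sep="_"):
    """Append marker lemma(s) to a relation name.

    marker_lemmas: lemmas of the marker dependents, in surface order.
    strategy: "first", "last", or "concat" (hypothetical policy names).
    """
    if not marker_lemmas:
        return deprel
    lemmas = [m.lower() for m in marker_lemmas]
    if strategy == "first":
        suffix = lemmas[0]
    elif strategy == "last":
        suffix = lemmas[-1]
    elif strategy == "concat":
        suffix = sep.join(lemmas)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return f"{deprel}:{suffix}"


if __name__ == "__main__":
    # "een zondag om nooit te vergeten": the clause has both "om" and "te"
    for strategy in ("first", "last", "concat"):
        print(strategy, extend_relation("acl", ["om", "te"], strategy))
    # "op twee na": preposition plus postpositional particle
    print(extend_relation("nmod", ["op", "na"], "concat"))
```

With `["om", "te"]`, the "first" policy yields `acl:om` and the "last" policy yields `acl:te`, which is the kind of complementary distribution noted above; "concat" yields `acl:om_te`. Which policy is appropriate for which construction is exactly what the guidelines would need to fix.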

dan-zeman commented 5 years ago

I just checked that the complex case with multiple tokens, mentioned by @gossebouma, is now documented in the guidelines. So I think this issue can be closed.

I share the doubts about the usefulness of this enhancement type, but it is part of the current guidelines, so it is allowed to appear in UD treebanks.