UniversalDependencies / UD_German-GSD

Pronominal adverbs #33

Open cinkova opened 1 year ago

cinkova commented 1 year ago

I wonder about the policy for tagging pronominal adverbs (Pronominaladverbien), e.g. dabei, damit, wozu, hierauf. Depending on the preposition lemma, they tend to be tagged as either PRON or ADV. ADV is rather strange, since they usually substitute for a noun, just like personal pronouns. I am asking because a similar phenomenon is tagged as ADP in Irish, which is super weird. I have even found some ADP tokens that are not leaves (still in Irish).
Note that the German words I mention are not the "genuine" pronominal adverbs such as woher!

dan-zeman commented 1 year ago

The general policy for pronominal adverbs in UD is that their UPOS is ADV and they have a non-empty PronType (e.g. Int, Dem, Tot, Neg). However, the examples assumed there are typically the "genuine" pronominal adverbs ("where", "when", "there", "never" etc.).

I think that the traditional approach in German tagging (even before UD) is that words like damit, wozu are adverbs or pronominal adverbs (PROAV in the Stuttgart-Tübingen tagset). I understand that they substitute for a nominal, but for one with a preposition. It is actually not unusual cross-linguistically that a prepositional phrase has an adverbial meaning and, if fused into one word, becomes an adverb. Moreover, in the specific German case, if we view dabei as a contraction of "bei da", it is actually a preposition (bei) with an adverb (da). The head is thus an adverb, so it is even more natural for the whole thing to be an adverb than it would be if the head were a pronoun. It would actually be possible to treat them as multiword tokens in UD and split them into two syntactic words, but I'm not sure it would help much.

To summarize, I believe that the policy should be that these words are ADV (definitely not ADP! but also not PRON). Unfortunately, I'm afraid that this wasn't the policy when the pre-UD annotation of the GSD treebank was created, or the policy was not consistently followed. ADV seems to be the prevailing solution (http://hdl.handle.net/11346/PMLTQ-9XIU) but there are other tags like PRON, ADP, SCONJ or CCONJ. This should be fixed. The other German treebanks also prefer ADV and seem to be more consistent than GSD but they, too, have tagging errors.
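
To get a quick picture of how inconsistent the tagging is, something like the following could be run over a CoNLL-U file. This is only a rough sketch: the word list contains just the four forms mentioned in this thread (it is not a complete inventory of German pronominal adverbs), the function names `upos_by_form` and `follows_policy` are made up for illustration, and the file name in the usage note is an assumption about the local checkout.

```python
from collections import Counter, defaultdict

# Illustrative list only: the forms named in this thread, not a full inventory.
PRONOMINAL_ADVERBS = {"dabei", "damit", "wozu", "hierauf"}


def upos_by_form(conllu_lines, forms=PRONOMINAL_ADVERBS):
    """Count which UPOS tags each listed form receives in CoNLL-U lines."""
    counts = defaultdict(Counter)
    for line in conllu_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # sentence boundary or comment line
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        tok_id, form, upos = cols[0], cols[1], cols[3]
        if "-" in tok_id or "." in tok_id:
            continue  # skip multiword token ranges and empty nodes
        key = form.lower()
        if key in forms:
            counts[key][upos] += 1
    return counts


def follows_policy(upos, feats):
    """True if the token is ADV and carries some PronType feature."""
    has_prontype = any(f.startswith("PronType=") for f in feats.split("|"))
    return upos == "ADV" and has_prontype
```

For example, `upos_by_form(open("de_gsd-ud-train.conllu", encoding="utf-8"))` would return one counter per form. A non-ADV count is a candidate for review rather than automatically an error: damit, for instance, is also a genuine subordinating conjunction.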