UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Conflicting guidelines: verb particles in Germanic languages #771

Closed nschneid closed 1 year ago

nschneid commented 3 years ago

The ADV guidelines page claims that Germanic verb particles are ADV (and not PART).

The ADP guidelines page claims that they are ADP.

Which is correct? ADV corresponds to the more traditional English grammar view, but arguments have been made in favor of grouping these with transitive prepositions.

For English, EWT and GUM generally use ADP with compound:prt, though there are many incorrect uses of advmod.

LarsAhrenberg commented 3 years ago

I think the claim is that adverbs are ADV also when used as verb particles, and the same is true of adpositions when used as verb particles. The problem is that some words can be either, depending on the context ("She fell down" vs. "She walked down the alley"). Using 'in' as an example for both ADV and ADP, as in the pages you are referring to, may of course be confusing. For Swedish I tend to use the most common part of speech if context does not help. Thus, 'inne' (indicating a state of being inside something) is an ADV, while 'i' (in, into, usually indicating direction) is an ADP when used as verb particles.

dan-zeman commented 3 years ago

Yes, both are correct, depending on which word it is.

nschneid commented 3 years ago

For English, would "together", "apart", and "away" as verb particles qualify as ADV or ADP? They are in some ways similar to prepositions (can be modified by right, for example) but are always intransitive.

dan-zeman commented 3 years ago

I would say ADV but I leave the decision to native English speakers.

amir-zeldes commented 3 years ago

I don't really care about this deeply, since they are unambiguous in xpos (RP), but I also think it should be ADV, across the board actually, because:

  1. 'intransitive preposition' sounds like an oxymoron to me (I know CGEL uses this term, but for me "adposition" means it has an argument, which it stands next to)
  2. Some words in this class aren't homonyms of prepositions at all (e.g. @nschneid 's examples "away", "apart")
  3. They are interrogable using WH adverbs - "where did it go?" "away"
  4. They can coexist with PP arguments with the same word form as head ("pull out everything out of Lexis Nexis"), which suggests that they are not the same
  5. They are etymologically adverbs (this isn't really a strong argument, but if anything, the Germanic adpositions are derived from adverbs, not the other way around)
  6. They are in the R* series in the PTB tagset (RB, RP - not a subtype of IN)

I don't know Swedish, but for German I think they would also be best tagged as ADV, since there is even more of a tendency to distinguish the forms than in English (particle "(r)ein" vs. prep. "in"). But there too they have a special xpos (PTKVZ), so it would be easy to decide either way or even change our minds later if needed.
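To illustrate why a dedicated xpos makes the decision cheap to revisit, here is a minimal sketch that retags particles from their xpos alone. It assumes standard 10-column CoNLL-U input; the function name `retag_particles`, the `PARTICLE_XPOS` set and the sample sentence are hypothetical and not part of any UD tooling.

```python
# Hypothetical helper: choose the UPOS of verb particles from their XPOS.
# Assumes 10-column, tab-separated CoNLL-U token lines.
PARTICLE_XPOS = {"RP", "PTKVZ"}  # PTB particle tag and German PTKVZ, as named in the thread


def retag_particles(conllu_text, new_upos="ADV"):
    """Return the text with UPOS set to new_upos on every particle token."""
    out = []
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Comments and blank lines do not have 10 fields and pass through unchanged.
        if len(cols) == 10 and not line.startswith("#") and cols[4] in PARTICLE_XPOS:
            cols[3] = new_upos
        out.append("\t".join(cols))
    return "\n".join(out)


# Toy sentence, annotated here with ADP + compound:prt for illustration only.
sample = "\n".join([
    "# text = He gave up",
    "1\tHe\the\tPRON\tPRP\t_\t2\tnsubj\t_\t_",
    "2\tgave\tgive\tVERB\tVBD\t_\t0\troot\t_\t_",
    "3\tup\tup\tADP\tRP\t_\t2\tcompound:prt\t_\t_",
])

print(retag_particles(sample, new_upos="ADV"))
```

Calling the same function with `new_upos="ADP"` would switch the policy back, which is the sense in which the choice stays reversible.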

sylvainkahane commented 3 years ago

We know, from Tesnière and many others, that ADP are transitive ADV and ADV are intransitive ADP. Separating ADV and ADP is like separating intransitive and transitive verbs. This is just to say that it is not very important to decide whether we put them in ADV or ADP. But if we have the two classes, I think that as soon as an item has a transitive construction it should be ADP, even if it is used intransitively somewhere (I mean as an adverb).

gossebouma commented 3 years ago

In the Dutch data compound:prt elements are predominantly ADP and ADV, but other pos occur as well, for instance in nl_lassysmall:

items   share   ud:upos
5906    71.1%   ADP
 893    10.8%   ADV
 781     9.4%   ADJ
 503     6.1%   NOUN
 219     2.6%   VERB
   3     0.0%   PRON

The pos-tag is projected from the underlying annotation.
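A distribution like the one above could be computed along the following lines. This is only a sketch, assuming 10-column CoNLL-U input; the helper names `upos_of_deprel` and `as_table` and the toy rows are made up for illustration and are not taken from nl_lassysmall.

```python
from collections import Counter


def upos_of_deprel(conllu_text, deprel="compound:prt"):
    """Count the UPOS tags of all tokens attached with the given relation."""
    counts = Counter()
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Only 10-field token lines are considered; comments and blanks are skipped.
        if len(cols) == 10 and not line.startswith("#") and cols[7] == deprel:
            counts[cols[3]] += 1
    return counts


def as_table(counts):
    """Turn counts into (upos, items, percent) rows, most frequent first."""
    total = sum(counts.values())
    return [(tag, n, round(100 * n / total, 1)) for tag, n in counts.most_common()]


# Toy rows (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC); not real treebank data.
rows = [
    ("1", "w1", "_", "VERB", "_", "_", "0", "root", "_", "_"),
    ("2", "w2", "_", "ADP", "_", "_", "1", "compound:prt", "_", "_"),
    ("3", "w3", "_", "ADP", "_", "_", "1", "compound:prt", "_", "_"),
    ("4", "w4", "_", "ADP", "_", "_", "1", "compound:prt", "_", "_"),
    ("5", "w5", "_", "ADV", "_", "_", "1", "compound:prt", "_", "_"),
    ("6", "w6", "_", "ADP", "_", "_", "7", "case", "_", "_"),
]
sample = "\n".join("\t".join(r) for r in rows)

for tag, n, pct in as_table(upos_of_deprel(sample)):
    print(f"{n}\t{pct}%\t{tag}")
```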

Stormur commented 3 years ago

I think the etymological argument makes sense here. Most of these particles are originally adverbs and they still work that way, but then in a predictable way many become "specialised" as prepositions, so now we have the same words (in this case, same lemmatisation) with two functions: ADV and ADP, where the second one is "deviant".

Since this seems to be "universal" (by the way, it is everyday stuff in Latin, too), given that UD can distinguish the two levels of part of speech and syntactic relation, how radical would it be to allow ADP to be assimilated into ADV, or equivalently to have a single common class, and then to let the deprel advmod/case distinguish the use? In a sense, letting ADV depend as case. I think this would be very spot-on especially with phrases such as down the alley and similar. It would capture the ambiguity & avoid discriminating two classes for essentially the same word. As it is now, advmod "forcing" ADV and case ADP (?) presents some redundancy.
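The idea can be sketched as follows, purely as an illustration. The merged tag name `ADVP`, both function names and the mapping from deprel to use are assumptions made for this sketch; nothing like this exists in the UD guidelines.

```python
# Hypothetical merged class standing in for both ADP and ADV.
MERGED = "ADVP"

# Assumed mapping for this sketch: the relation alone signals the use.
TRANSITIVE_DEPRELS = {"case"}
INTRANSITIVE_DEPRELS = {"advmod", "compound:prt"}


def merge_upos(upos):
    """Collapse ADP and ADV into the single hypothetical class."""
    return MERGED if upos in {"ADP", "ADV"} else upos


def recover_use(upos, deprel):
    """Recover the old ADP/ADV distinction from the relation, if possible."""
    if upos != MERGED:
        return upos
    if deprel in TRANSITIVE_DEPRELS:
        return "ADP"
    if deprel in INTRANSITIVE_DEPRELS:
        return "ADV"
    return None  # no relation available, or one this sketch does not cover


# "down" in "She walked down the alley" vs. "She fell down"
print(recover_use(merge_upos("ADP"), "case"))    # ADP
print(recover_use(merge_upos("ADV"), "advmod"))  # ADV
print(recover_use(merge_upos("ADP"), None))      # None
```

The last call shows the limit of the approach: without a deprel, the two uses cannot be told apart from the tag alone.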

Stormur commented 3 years ago

> We know, from Tesnière and many others, that ADP are transitive ADV and ADV are intransitive ADP. Separating ADV and ADP is like separating intransitive and transitive verbs. This is just to say that it is not very important to decide whether we put them in ADV or ADP. But if we have the two classes, I think that as soon as an item has a transitive construction it should be ADP, even if it is used intransitively somewhere (I mean as an adverb).

Could the same argument not be used also to prefer ADV? Are there other reasons to prefer ADP? Essentially, this definition also acknowledges a unity between the two classes, if I am not mistaken.

nschneid commented 3 years ago

In general, adpositions and particles are closed-class while adverbs are not. So if the principle is, when in doubt minimize the number of distinct tags per word type, ADP would be the better choice for most particles (at least in English).

If it is more important to stick to traditional grammatical terminology, ADV would be the better choice for particles.

If sticking with the annotated status quo is most important, EWT and GUM argue for ADP.
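The "minimize the number of distinct tags per word type" principle can be made concrete with a small sketch. The occurrence list below is a toy, and the names `tag_all`, `tags_per_form` and `ambiguous_forms` are hypothetical; non-particle uses are all treated as prepositions purely to keep the example short.

```python
from collections import defaultdict

# Toy occurrences: (word form, is it used as a verb particle here?)
occurrences = [
    ("down", False),  # "She walked down the alley"
    ("down", True),   # "She fell down"
    ("up", True),     # "He gave up"
    ("away", True),   # "It went away"
    ("of", False),
]


def tag_all(occs, particle_tag):
    """Tag particles with particle_tag; other uses are ADP in this toy."""
    return [(form, particle_tag if is_prt else "ADP") for form, is_prt in occs]


def tags_per_form(tagged):
    """Collect the set of distinct UPOS tags seen for each word form."""
    tags = defaultdict(set)
    for form, upos in tagged:
        tags[form.lower()].add(upos)
    return dict(tags)


def ambiguous_forms(tagged):
    """Word forms that receive more than one UPOS tag under a policy."""
    return sorted(f for f, t in tags_per_form(tagged).items() if len(t) > 1)


print(ambiguous_forms(tag_all(occurrences, "ADP")))
print(ambiguous_forms(tag_all(occurrences, "ADV")))
```

On this toy data the ADP policy leaves no form with two tags, while the ADV policy splits "down". A real corpus would also contain ordinary adverbial uses, so the actual numbers would have to be measured.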

amir-zeldes commented 3 years ago

> I think the etymological argument makes sense here. Most of these particles are originally adverbs

@Stormur - for Germanic I agree, but I think we should keep in mind that in other languages prepositions are not derived from adverbs (e.g. Afro-Asiatic), so this doesn't need to be extended universally.

amir-zeldes commented 3 years ago

> minimize the number of distinct tags per word type

Mm, maybe when in doubt, but here I think cross-linguistic considerations should also play a role: it would be odd to decide that English cases like particle "in" are ADP (since it is string identical to a preposition) but German "ein" is ADV (since the corresponding preposition is "in" in German too)

> EWT and GUM argue for ADP

Well, yes, GUM copies EWT here, since we don't really manually annotate UPOS anyway. I've always found this odd, and would have no problem switching, but I agree that EWT and GUM should match. So if EWT sticks to ADP, so will GUM!

More generally, I think less that phrasal verb adverbs are intransitive prepositions, and more that (most Germanic) prepositions are transitive ADV, if anything. Since we already have an intransitive, functionally adverbial class called ADV, and a transitive one called ADP, I agree with @Stormur that it would be more elegant to just call the particles ADV as well. But since this only affects upos, and as @nschneid said there is a status quo argument to be made, I'm not passionate about changing it.

> how radical would it be to allow ADP to be assimilated into ADV, or equivalently to have a single common class, and then to let the deprel advmod/case distinguish the use?

Some corpora are only tagged and not parsed, so I think this would cause problems (and in some languages where prepositions are denominal, this would look odd even ignoring that issue)

Stormur commented 3 years ago

> > I think the etymological argument makes sense here. Most of these particles are originally adverbs
>
> @Stormur - for Germanic I agree, but I think we should keep in mind that in other languages prepositions are not derived from adverbs (e.g. Afro-Asiatic), so this doesn't need to be extended universally.

It is probably a generally typical Indo-European thing... but of course you are right. Is there another "primary" category in Afro-Asiatic like ADV here?

Stormur commented 3 years ago

> In general, adpositions and particles are closed-class while adverbs are not. So if the principle is, when in doubt minimize the number of distinct tags per word type, ADP would be the better choice for most particles (at least in English).

I do not totally agree here. I see that "adverbs" are often a very problematically defined, underspecified class. I am more and more inclined not to consider as adverbs many words that are traditionally labeled as such. But anyway, inside the big ADV family, there are at least two distinct branches: 1) derived adverbs, such as the -ly adverbs in English, and those are indeed open, as long as the classes of their respective bases are open; 2) underived adverbs, like up, and those are closed. So one really wonders if for those in 1) it would not be better to trace them back to their bases (and this would be by itself a big step towards general annotational minimisation), or at least to acknowledge that their label as ADV is just a conventional one.

> More generally, I think less that phrasal verb adverbs are intransitive prepositions, and more that (most Germanic) prepositions are transitive ADV, if anything.

I really find this an illuminating definition!

> > how radical would it be to allow ADP to be assimilated into ADV, or equivalently to have a single common class, and then to let the deprel advmod/case distinguish the use?
>
> Some corpora are only tagged and not parsed, so I think this would cause problems (and in some languages where prepositions are denominal, this would look odd even ignoring that issue)

Hm, then, in those cases (and now that you mentioned it, many possible examples in Latin, Italian, Greek... sprang to my mind), I think it might be viable, if the construction is still transparent enough, to continue assimilating them to NOUNs. In other cases, the denominal expression might also be assimilated to ("transitive") adverbs. For example, in Latin exempli gratia 'for the sake of example':

I see the best annotation in keeping gratia 'favour' as the noun in the ablative case it is, working as an obl, and on which exempli depends as an nmod, contrary to seeing it as a (pseudo)adposition, which would make it an extremely anomalous adposition indeed.

But I can also imagine a NOUN directly depending with case on another NOUN if the construction no longer displays a transparent structure, but the noun is still fully recognizable as such.

amir-zeldes commented 3 years ago

> Is there another "primary" category in Afro-Asiatic like ADV here?

Yes, most often it's from nouns. But that doesn't mean their current forms are identical to nouns: often you'll get things like "foot of X" -> under, or "head of X" -> on, but these are then phonologically reduced and gain distinct, grammaticalized forms. So tagging them as nouns would also be odd synchronically. And maybe more importantly, I wouldn't want a random linguist looking at the data to be too surprised: I think most people working on Afro-Asiatic languages would expect such items to be tagged as prepositions.

nschneid commented 3 years ago

I think it would help if @manning or @sebschu could weigh in regarding the ADP policy in EWT for particles and whether they would prefer to keep it or change it to ADV.

Stormur commented 3 years ago

> > Is there another "primary" category in Afro-Asiatic like ADV here?
>
> Yes, most often it's from nouns. But that doesn't mean their current forms are identical to nouns: often you'll get things like "foot of X" -> under, or "head of X" -> on, but these are then phonologically reduced and gain distinct, grammaticalized forms. So tagging them as nouns would also be odd synchronically. And maybe more importantly, I wouldn't want a random linguist looking at the data to be too surprised: I think most people working on Afro-Asiatic languages would expect such items to be tagged as prepositions.

Yes, it is probable that even an approach like the one I am suggesting would not completely eliminate the class of adpositions, as some elements are probably still "irreducible".