UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

What even is a copula #822

Closed aryamanarora closed 1 year ago

aryamanarora commented 2 years ago

Many issues coming up following:

The underlying problem is that the UD guidelines do not make clear what a copula is, nor when nsubj should attach to the nonverbal predicate and when it should attach to the verb.

UD guidelines mention 6 different types of copular constructions:

  1. Equation (aka identification): “she is my mother” ਉਹ ਮੇਰੀ ਮਾਂ ਹੈ
  2. Attribution: “she is nice” ਉਹ ਚੰਗੀ ਹੈ
  3. Location: “she is in the bathroom” ਉਹ ਬਾਥਰੂਮ ਵਿੱਚ ਹੈ
  4. Possession: “the book is hers” ਕਿਤਾਬ ਉਸਦੀ ਹੈ
  5. Benefaction: “the book is for her” ਕਿਤਾਬ ਉਸ ਲਈ ਹੈ
  6. Existence: “there is food (in the kitchen)” (ਰਸੋਈ ਵਿੱਚ) ਖਾਣਾ ਹੈ

All 6 use the same verb in Punjabi.

nschneid commented 2 years ago

Note that in English existential constructions the verb is not considered a copula (UniversalDependencies/docs#706); though the lemma is the same, I guess it is considered syntactically different per the third bullet point.

aryamanarora commented 2 years ago

Right, from what I understand, even in languages like Czech (which have no expl dependent in the existential construction, and use the same verb for all 6, just like Punjabi) the verb is still the head in existential constructions, as given in an example on that page. So I'm not sure what the goal of that third bullet point is, because it seems that for all languages UDv2 wants existential constructions to have the verb as the head, not the subject.

nschneid commented 2 years ago

@dan-zeman Are there languages for which the verb in the existential construction IS treated as a copula?

aryamanarora commented 2 years ago

amir-zeldes commented 2 years ago

Just a note to point out that in English we have different trees for the copular construction:

  1. A dog is in the yard (root: yard)

and for the existential construction:

  2. There is a dog in the yard (root: is)

In many languages the two are much more similar. For example, the easiest Hebrew equivalent of both of these has the same words, but word order can make it sound more like 1 or 2:

  3. yesh kelev b-a-xatser (exists dog in the yard)
  4. yesh b-a-xatser kelev (exists in the yard dog)

There may be other paraphrases of course, but I don't feel any syntactic dependency difference between these. Current UD Hebrew views both as existential, with root "yesh", but if there is no verb at all (zero copula), we always have to make the location the predicate:

  5. ha-kelev b-a-xatser "the dog is in the yard" (root: yard, lit. "the-dog in-the-yard")

dan-zeman commented 2 years ago

I suggest moving this issue to the main issue tracker at the docs repository, since it is about annotation guidelines rather than about bugs in a treebank.

dan-zeman commented 2 years ago

> @dan-zeman Are there languages for which the verb in the existential construction IS treated as a copula?

I am not aware of such examples in the current data (which does not necessarily mean they don't exist).

I now lean towards saying that the Czech existential sentences should be treated as if být “to be” were a copula there. But I have not implemented it in the data, nor modified the examples in the documentation. With a location, the situation is similar to what @amir-zeldes describes for Hebrew. Pure existentials without a location will either use the verb existovat “to exist”, which will be treated as a regular intransitive verb, or they will use být “to be”, which can still be tagged AUX but it will have to be promoted to the head position (as if the location predicate is elided), so there will be no cop relation.
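For readers following along, here is a minimal Python sketch of the two English analyses mentioned earlier in the thread ("A dog is in the yard" with root "yard" vs. "There is a dog in the yard" with root "is"). Only the roots come from the thread; the `Token` type, the helper functions, the UPOS tags, and the attachments of the non-root words are assumptions made for illustration, not an excerpt from any treebank.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    idx: int      # 1-based position in the sentence
    form: str
    upos: str     # assumed tag, for illustration only
    head: int     # index of the head token; 0 marks the root
    deprel: str


# Copular analysis: the location is the predicate, "is" attaches as cop.
copular: List[Token] = [
    Token(1, "A", "DET", 2, "det"),
    Token(2, "dog", "NOUN", 6, "nsubj"),
    Token(3, "is", "AUX", 6, "cop"),
    Token(4, "in", "ADP", 6, "case"),
    Token(5, "the", "DET", 6, "det"),
    Token(6, "yard", "NOUN", 0, "root"),
]

# Existential analysis: the verb is the head, so there is no cop relation.
# Attaching "yard" as obl to "is" is an assumption of this sketch.
existential: List[Token] = [
    Token(1, "There", "PRON", 2, "expl"),
    Token(2, "is", "VERB", 0, "root"),
    Token(3, "a", "DET", 4, "det"),
    Token(4, "dog", "NOUN", 2, "nsubj"),
    Token(5, "in", "ADP", 7, "case"),
    Token(6, "the", "DET", 7, "det"),
    Token(7, "yard", "NOUN", 2, "obl"),
]


def root_form(sentence: List[Token]) -> str:
    """Return the word form of the token whose head is 0."""
    return next(t.form for t in sentence if t.head == 0)


def has_cop(sentence: List[Token]) -> bool:
    """True if any token in the sentence bears the cop relation."""
    return any(t.deprel == "cop" for t in sentence)


def subject_head(sentence: List[Token]) -> str:
    """Return the form of the token that the nsubj dependent attaches to."""
    by_idx = {t.idx: t for t in sentence}
    subj = next(t for t in sentence if t.deprel == "nsubj")
    return by_idx[subj.head].form
```

The question raised in this issue then amounts to which of the two shapes a given language should use for each of the six construction types: `subject_head` returns the nonverbal predicate in the first shape and the verb in the second.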

aryamanarora commented 2 years ago

One of the issues that just came up is what to do when two obliques could be analysed as the predicate.

(Note this uses the normal copula in Punjabi, just in perfective aspect.) Seems weird to prefer "in Bengaluru" as the predicate over "on 12 December, 1950", since they can be swapped around in the sentence without changing meaning. In fact I'd find it weird to claim that either oblique is a predicate at all here, much like the case with Hebrew:

> There may be other paraphrases of course, but I don't feel any syntactic dependency difference between these

Would it be a stretch to parse it as:

┐ ਉਨ੍ਹਾਂ ਦਾ ਜਨਮ (root)
├─ 12 ਦਸਬੰਰ, 1950 ਨੂੰ (obl:tmod)
├─ ਬੰਗਲੌਰ 'ਚ (obl:lmod)
├─ ਹੋਇਆ (cop)
└─ । (punct)

And in general, not treat obliques as predicates?

nschneid commented 2 years ago

Is this the same as English: "His birth was on 12 December, 1950 in Bengaluru"? There I think we'd take the first PP as the root.

dan-zeman commented 2 years ago

This was discussed extensively in December 2016 when the v2 guidelines were being prepared. Treating locations (and sometimes temporal specifications) as predicates was a major move for some treebanks. The possibility of having two adverbial-like candidates for the predicate was one of the major counterarguments but it was eventually dismissed (for better or worse; I'm not siding with either opinion here and I don't want to re-open that discussion, I'm just trying to recall the historical background). Chances are that based on the context, you can point at one of the adverbials and say it's the more salient one, then make it the predicate and the other will be its advmod or obl. If this is not obvious, you can take the first one as @nschneid says. And finally, you could also link the two adverbials via a conj (but I don't think I've seen anybody doing this).
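The fallback order described above (take the contextually more salient adverbial if one stands out, otherwise take the first) can be sketched in Python as follows. The function names and the use of `obl` as the label for the remaining adverbial are assumptions of this sketch; per the comment above the label could also be `advmod`, and the `conj` alternative is not modelled here.

```python
from typing import List, Optional, Sequence, Tuple


def choose_predicate(candidates: Sequence[str],
                     salient: Optional[str] = None) -> str:
    """Pick the predicate among adverbial-like candidates.

    If an annotator has judged one candidate more salient, use it;
    otherwise fall back to the first candidate in surface order.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    if salient is not None:
        if salient not in candidates:
            raise ValueError("salient candidate must be one of the candidates")
        return salient
    return candidates[0]


def attach(candidates: Sequence[str],
           salient: Optional[str] = None) -> Tuple[str, List[Tuple[str, str]]]:
    """Return the chosen root and (dependent, deprel) pairs for the rest.

    "obl" is a placeholder label; "advmod" may be appropriate instead.
    """
    root = choose_predicate(candidates, salient)
    dependents = [(c, "obl") for c in candidates if c != root]
    return root, dependents


# Example with the English paraphrase from this thread.
adverbials = ["on 12 December, 1950", "in Bengaluru"]
default_root, default_deps = attach(adverbials)
salient_root, salient_deps = attach(adverbials, salient="in Bengaluru")
```

With no salience judgement, the first adverbial becomes the root and the second its dependent; passing `salient="in Bengaluru"` reverses the roles.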