UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0
Clauses with copulas #170

Closed ngiordani closed 8 years ago

ngiordani commented 9 years ago

Something we've discussed a few times in the English group but have never been truly satisfied with (or at least I haven't :)) is the treatment of sentences like:

He didn't say it was because he felt insecure.

This is to inform you of changes in the upcoming months.

Right now, we do:

root(ROOT, was) advcl(was, felt)

root(ROOT, is) advcl(is, inform)

I'd like to get some input on this. I'll add the people involved in the discussion on copulas with ccomp: @jnivre @manning @dan-zeman

jnivre commented 9 years ago

This is how they are handled in the Swedish treebank as well, and I am not sure I have anything better to offer. There seems to be a sliding scale from be + predicative complement to be + adverbial modifier:

It is fun
It is a joke
It is on time
It is in the garden
It is for him
It is to help him
It is because we want to help him

Somewhere we have to draw the line between copula constructions and cases of the "existential be" with adverbial modifiers. I am personally in favor of extending the domain of copula constructions as far as possible, but it seems that the current consensus is that prepositional phrases with an "adjectival meaning" go with the copula, while prepositional phrases with a temporal or locative meaning go the other way. Thus:

she is in shape

cop(shape, is) nsubj(shape, she)

she is in Uppsala

nmod(is, Uppsala) nsubj(is, she)

This reminds me of a question that I have been meaning to ask you. What do you do with the presentational construction "there is":

there is a cat on the mat

expl(is, there) nsubj(is, cat) nmod(is, mat)
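
To make the contrast between the two analyses of "she is in shape" and "she is in Uppsala" concrete, here is a minimal Python sketch. It stores the relations as (relation, head, dependent) triples exactly as written in the comment above and reports which word ends up heading the clause. The tuple layout and the function name clause_head are illustrative assumptions, not part of any UD tool.

```python
# Relations copied from the examples above as (relation, head, dependent).
# The data layout and helper name are hypothetical, for illustration only.
COPULA_ANALYSIS = [("cop", "shape", "is"), ("nsubj", "shape", "she")]
EXISTENTIAL_ANALYSIS = [("nmod", "is", "Uppsala"), ("nsubj", "is", "she")]


def clause_head(triples):
    """Return the one word that governs other words but has no governor."""
    heads = {head for _, head, _ in triples}
    dependents = {dependent for _, _, dependent in triples}
    candidates = heads - dependents
    if len(candidates) != 1:
        raise ValueError(f"expected exactly one head, found {sorted(candidates)}")
    return candidates.pop()


print(clause_head(COPULA_ANALYSIS))       # shape
print(clause_head(EXISTENTIAL_ANALYSIS))  # is
```

Under the copula analysis the nominal "shape" heads the clause; under the existential analysis the verb "is" does.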

alessandrolenci commented 9 years ago

I definitely share your idea of extending the copula as much as possible.

Moreover, I find it very problematic to use nmod for locative and temporal predicates in copular sentences. The documentation says: "nmod is a noun (or noun phrase) functioning as a non-core (oblique) argument or adjunct." (NOUN: http://universaldependencies.github.io/docs/u/pos/NOUN.html) But locative predicates in copular sentences are surely neither modifiers nor adjuncts. I believe this also holds for benefactive cases like "The cake is for the boy".

What are the strong arguments for not treating all these cases as instances of cop?

Finally, which solution has been adopted for languages with "possessive dative constructions", where "The book is to me" means "I have the book"? I think Finnish has such cases.

Best,

--a

spyysalo commented 9 years ago

@alessandrolenci : UD Finnish defines a subtype of nmod, nmod:own. From that documentation:

This kind of an analysis would naturally result in the haver being marked as a nominal modifier, nmod. However, as nmod is a very frequent dependency type that encodes many different meanings, the information that the clause is about having or owning would be lost. Therefore, the UD Finnish scheme applies the nmod:own dependency type for nominal modifiers that encode owning, following the approach of TDT.

ngiordani commented 9 years ago

Joakim, I should clarify that that's actually not what we do in English. We treat the two sentences you mentioned as exactly parallel:

she is in shape

cop(shape, is) nsubj(shape, she)

she is in Uppsala

cop(Uppsala, is) nsubj(Uppsala, she)

That makes the case I mentioned before, with the adverbial clause, even more of an outlier... Do you think it's important to make a distinction? In any case, we should probably make this a wider discussion to make sure we're at least moving towards a uniform analysis across languages...

As for "there is," we do exactly what you suggested!

jnivre commented 9 years ago

Thanks for clarifying this. I am all for extending the use of the copula as far as possible, since I have never been impressed by the idea that "is" is a copula in one case and the existential "be" in the other. I got the impression that both Czech and Finnish make this distinction, though, so I (mistakenly) assumed that this was the official policy.

Incidentally, the Swedish treebank currently makes a similar distinction, due to the original treebank annotation, but I intend to fix this in the future (although probably not in time for v1.1).

fginter commented 9 years ago

In "the meeting is at 5PM in the yellow building" - which of the PP phrases should then be chosen to be the head? And what will the other become?

ngiordani commented 9 years ago

In that case, I would choose at 5pm. The other is an nmod.

Interestingly I say this without blinking, but I'm not immediately sure how to articulate the rationale... :)

fginter commented 9 years ago

So how about "the meeting is in the yellow building"? Now the building is the head because there is no other candidate. But in "the meeting is at 5pm in the yellow building", building is an nmod. This gives us the exact same phrase in the exact same context, analyzed once as a head and once as an nmod. I don't think that is right.

jnivre commented 9 years ago

I would say it is similar to the case of double objects. One has to be chosen as the direct object; the other becomes indirect object. Here one is chosen as the predicate and the other becomes nmod. I am less sure about the criteria here, though, and my first instinct was actually to take the locative as the predicate. But perhaps it is a good idea to simply take the first one. Then these two sentences would have different syntactic structures:

The meeting is at 5pm in the yellow building.
The meeting is in the yellow building at 5pm.
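
The "simply take the first one" idea can be sketched in a few lines of Python. This is only an illustration of the proposal as stated in this comment: the function name, the triple layout, and the choice to attach the remaining phrases to the chosen predicate as nmod are all assumptions made for the example, not an agreed guideline.

```python
def first_pp_as_predicate(subject, pp_heads, copula="is"):
    """Build (relation, head, dependent) triples for a 'be' clause.

    pp_heads lists the head nouns of the prepositional phrases in surface
    order. The first one is treated as the predicate; the rest become nmod
    dependents of it (an assumption made for this sketch).
    """
    if not pp_heads:
        raise ValueError("need at least one prepositional phrase")
    predicate, *others = pp_heads
    triples = [("cop", predicate, copula), ("nsubj", predicate, subject)]
    triples.extend(("nmod", predicate, other) for other in others)
    return triples


time_first = first_pp_as_predicate("meeting", ["5pm", "building"])
place_first = first_pp_as_predicate("meeting", ["building", "5pm"])
print(time_first)
print(place_first)
print(time_first != place_first)  # the two word orders yield different trees
```

The last line makes the consequence explicit: under this heuristic, a change in word order alone changes the head of the clause.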

jnivre commented 9 years ago

Again, it is not that different from:

He sent her a letter. He sent her.

In the first sentence, "her" is iobj; in the second it is dobj.

ngiordani commented 9 years ago

Yeah, I agree with Joakim about the double-object analogy.

fginter commented 9 years ago

I don't think the double object analysis is all that optimal either. And this would now extend what I think is an illogicality to a massive number of additional cases. Having the exact same phrase analyzed differently only because it happens to have a competing candidate still doesn't sound right to me.

jnivre commented 9 years ago

A common method in syntactic analysis is to use questions:

In both cases, the predicate is the one that fills the slot corresponding to the question word, but it is always possible to add extra information. Just as:

The point is that "the exact same phrase" is a slippery notion in grammar. At any rate, we cannot simply equate it with "the exact same string", as shown by Chomsky's famous example "flying planes can be dangerous". Or perhaps I should say: "analyzing sentences can be fun". :)

fginter commented 9 years ago

BTW: This ("to be" with multiple nmods) is very common in the Finnish data, and the nmods can appear to the left or to the right, in any order. http://bionlp-www.utu.fi/dep_search/?db=Finnish&search=L%3Dolla+%3Enmod+_+%3Enmod+_ A similar search in the English data shows a much lower frequency, which might make you think this is not too common.

fginter commented 9 years ago

@jnivre understood. But I am unconvinced that "being the first after the verb" is a merit sufficient for being selected as the head and given a special treatment. In "the meeting is at 5PM in the yellow building" I think semantically both 5PM and building are equally good candidates, play the same function, and therefore should have the same deprel. And I also think their function (and therefore their deprel) does not change if you add/drop one of them.

fginter commented 9 years ago

Then these two sentences would have different syntactic structures: The meeting is at 5pm in the yellow building. The meeting is in the yellow building at 5pm.

I fail to see why we would want these to have different trees, especially because in languages other than English, word order carries much less meaning.

jnivre commented 9 years ago

So where do you draw the line in Finnish? What is required for something occurring with the verb "to be" to be regarded as a predicate? Nominative case?

alessandrolenci commented 9 years ago

I think that they have different structures:

  1. The meeting is in the yellow building at 5pm
  2. The meeting is at 5pm in the yellow building

"In the yellow building" is the main predicate of the copula in 1.

In a locative sentence like 3, I typically cannot front the PP (cf. 4):

  3. John is in the garden
  4. * In the garden, John is

Similarly, if I front "in the yellow building" in 1, the meaning changes:

  5. In the yellow building, the meeting is at 5pm

Now the main predication concerns the time of the meeting, and the location acts like a circumstantial.

--a

fginter commented 9 years ago

The annotation manual says: The basic alternatives for predicatives are nominals (nouns, adjectives, pronouns and numerals). Words of these parts-of-speech are required to be in nominative, partitive or genitive to be accepted as predicatives. Nominals in any other case are not marked as predicatives, even if they are associated with the verb olla. They, similarly to adpositional phrases, are marked as nominal modifiers (nommod), and the verb is marked as the head of the clause, even if it is olla. This restriction is to prevent a clause from having two predicatives and hence two heads. In addition to nominals, also adverbs can act as predicatives, given that they do not express location or time. Note that with adverbs, there is no restriction with regard to case, only that they are not locational or temporal. As a result, adverbs such as täällä (here) or huomenna (tomorrow) can not act as predicatives, but others, such as naimisissa (married, inessive adverb) and raskaana (pregnant, essive adverb) can, regardless of their case.
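
As a rough reading aid, the rule quoted above can be written as a small Python predicate. This is a sketch under stated assumptions: the function name, the use of UD-style tags (NOUN, ADJ, PRON, NUM, ADV; Nom, Par, Gen, Ine), and the boolean flag for locational or temporal meaning are choices made for this example and do not come from the manual or from any treebank tool.

```python
# Hypothetical encoding of the quoted rule; tag names follow UD conventions.
NOMINAL_POS = {"NOUN", "ADJ", "PRON", "NUM"}
PREDICATIVE_CASES = {"Nom", "Par", "Gen"}  # nominative, partitive, genitive


def is_predicative(pos, case=None, locational_or_temporal=False):
    """Decide whether a word accompanying 'olla' counts as a predicative."""
    if pos in NOMINAL_POS:
        # Nominals qualify only in nominative, partitive or genitive.
        return case in PREDICATIVE_CASES
    if pos == "ADV":
        # Adverbs qualify in any case unless they express location or time.
        return not locational_or_temporal
    # Everything else (e.g. adpositional phrases) is a nominal modifier.
    return False


print(is_predicative("ADV", locational_or_temporal=True))  # täällä (here)
print(is_predicative("ADV", case="Ess"))                   # raskaana (pregnant)
print(is_predicative("NOUN", case="Ine"))                  # inessive nominal
```

Whether a given adverb is locational or temporal still has to be decided by the annotator; the sketch simply takes that decision as an input.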

jnivre commented 9 years ago

Thanks a lot! That seems like a coherent position. :)

fginter commented 9 years ago

@alessandrolenci the word order produces different focus in some (but in Finnish definitely not all) cases. Naturally. But I argue it is not the task of the UD analysis to try to encode these distinctions. Especially since they are very fine.

spyysalo commented 9 years ago

This discussion is moving pretty fast for me, but FWIW, my intuition for Finnish agrees with Filip's, and I think it is a strength that the syntactic analysis abstracts over differences in word order also in these cases.

Indeed, for a language with fairly free word order, abstraction over word order changes is arguably one of the key benefits of parsing, and a scheme that would give a substantially different analysis over a "mere" change in word order would be less useful, not more.

dan-zeman commented 9 years ago

In Czech, the default is adjective or noun in nominative or instrumental (under certain conditions also in genitive). But the literature admits that the matter is complex. For instance, Grepl et al. (1995), Příruční mluvnice češtiny, adds a long list of situations where also adverbs and prepositional phrases function as predicates (but these do not include locational or temporal modifiers). Also, there are other verbs than to be that can serve as copulas.

The PDT annotation manual (http://ufal.mff.cuni.cz/pcedt2.0/publications/a-man-en.pdf) restricts copulas to variants of to be, and only if the predicate falls into one of 8 categories:

Prepositional phrases and adverbs are explicitly excluded (and analyzed as adverbial modifiers instead), even in cases whose meaning is stative and quite close to some verbo-nominal predicates from the eight categories above.

The current Czech UD documentation (http://universaldependencies.github.io/docs/cs/dep/cop.html) distinguishes the state (I should have said internal state) of the subject, from its location in space or time.

Thus the following two sentences receive parallel analyses:

The following two will be parallel, too:

jnivre commented 9 years ago

So it seems that the Finnish-Czech position is that temporal or locative modifiers (whether nominals, prepositional phrases or adverbial phrases) are never predicates but anything else (regardless of form) is up for grabs. I can live with that (at least for the time being).

dan-zeman commented 8 years ago

Note: the Uppsala meeting included a discussion group on copulas (report here).

Can we close this issue now?

jnivre commented 8 years ago

I'd be happy to close the issue, but what is the conclusion? The copula is limited to the verb "be" occurring with what types of complements/modifiers?

dan-zeman commented 8 years ago

The copula is limited to the equivalents of "to be" (and I would vote for both ser and estar in Romance languages). It looks like extensions have been discussed in Uppsala and elsewhere but without a clear outcome, so the previous rule holds until further notice, doesn't it?

Local and temporal modifiers are not copula constructions. Clauses are also excluded (that was @ngiordani's original inquiry), even if their meaning is other than local/temporal: in "This is because we do not know what to do", the verb "is" is the head. The reasoning is:

  1. With local/temporal modifiers, the "be" is existential rather than linking.
  2. There may be multiple local/temporal modifiers, but we would have to select one as the predicate.
  3. As for clauses, the clause may have its own internal subject (in addition to the external subject of the copula), and we do not want to attach two subjects to the same node.
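
The rule summarized in this comment can be sketched as a small decision function. This is an informal illustration only: the function name and the category labels ("clause", "locative", "temporal", "other", and so on) are made up for the example, and real annotation decisions involve judgment that a lookup like this does not capture.

```python
def head_of_be_clause(complement_kind, meaning):
    """Return which element heads a clause built around 'to be'.

    complement_kind: e.g. "nominal", "pp", "adverb" or "clause"
    meaning: "locative", "temporal" or "other"
    Returns "be" when the verb itself is the head, and "complement" when
    the complement is the predicate and 'be' attaches to it as cop.
    """
    if complement_kind == "clause":
        # Clauses are excluded regardless of meaning.
        return "be"
    if meaning in {"locative", "temporal"}:
        # With local/temporal modifiers, 'be' is treated as existential.
        return "be"
    return "complement"


print(head_of_be_clause("pp", "other"))      # she is in shape
print(head_of_be_clause("pp", "locative"))   # she is in Uppsala
print(head_of_be_clause("clause", "other"))  # This is because we do not know what to do
```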

jnivre commented 8 years ago

Thanks for the summary. Let's close this for now and possibly revisit it for v2.