nschneid closed this issue 3 years ago
If I am not mistaken, be is not considered a copula in existential clauses in English. It is tagged VERB and analyzed as the head of such clauses.
But whenever be is attached as cop or aux, it must be tagged AUX and the validator will check it. I think that it should also be AUX in sentences where it functions as a copula but is promoted to the head position, either because of ellipsis or because the predicate is itself a clause (the problem is that he did not come). But the validator cannot check this.
More generally (i.e., without focus on English), cop should never co-occur with VERB. Nevertheless, it is not required to be AUX if the copula word is a pronoun; then it retains the PRON tag (even DET would be tolerated by the validator).
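The tag/relation constraints described above can be sketched as a small check. This is only an illustration under stated assumptions: `check_token` is a hypothetical helper, not the code of the official UD validator, and it covers only the rules mentioned in this comment.

```python
# Hypothetical sketch of the UPOS/deprel constraints described above.
# This is NOT the official UD validator; the function name is an assumption.

def check_token(upos, deprel):
    """Return a list of violations for one token's UPOS tag and relation."""
    problems = []
    base = deprel.split(":")[0]  # ignore subtypes such as aux:pass
    if base == "aux" and upos != "AUX":
        problems.append("aux must be tagged AUX")
    # cop never co-occurs with VERB; pronominal copulas keep PRON,
    # and DET is tolerated as well.
    if base == "cop" and upos not in {"AUX", "PRON", "DET"}:
        problems.append("cop must be tagged AUX (PRON/DET tolerated)")
    return problems
```

Note that a check like this cannot see a copula that was promoted to the head position, which is exactly the case the validator cannot verify.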
If I am not mistaken, be is not considered a copula in existential clauses in English. It is tagged VERB and analyzed as the head of such clauses.
Why? Is the idea that existential uses of "be" are somehow more contentful than copular and other auxiliary uses of "be"?
Some discussion is archived here. The third bullet point says that existential clauses are not treated as copular clauses if the verb has a different lemma than the equational copula, or different syntax. I think the "different syntax" was mentioned specifically because of English (where the there expletive is used, and the subject occurs after the verb). I'm personally not fond of this approach.
I guess the idea is that AUX should not be a head, so it should be tagged as VERB. But this is not discussed on the AUX page.
Still, I'm not sure why the above couldn't be analyzed as root(kitchen), nsubj(kitchen, food), cop(kitchen, is), expl(kitchen, there). It is very similar in meaning to "Food is in the kitchen". Plain "There is food." could be treated like ellipsis, promoting "is" to the root: root(is), nsubj(is, food), expl(is, there).
(BTW I checked CGEL and couldn't find a definitive statement on whether 'be' in an existential sentence is considered a copula. There is a list of different uses of 'be', including copulas, progressive and passive auxiliaries, quasi-modals, etc., but no existential example is given there.)
There are legitimate cases where AUX is the head of a clause (mentioned above: promoted in VP ellipsis, and certain rare cases of copular clauses).
The AUX page says that the category includes copulas. But if some instances of be are not considered copulas, then they lose the reason why they should be AUX.
I think one issue with annotating "there is" with the cop analysis is what to do when there is no locative phrase, e.g.:
There are no alternatives.
Saying that "there" is the predicate seems weird here, and the 'be as root' analysis just says this is like "no alternatives exist".
I think that if the locative phrase is missing, the copula (provided we say it is copula) will be promoted to the head position as in other instances of ellipsis (but it will not change its UPOS tag because of that).
This should probably be the analysis in Czech (but the conversion procedure does not do it properly yet). The verb být "to be" is the equational copula; it is also used in existential clauses (with or without locative phrases) and, unlike in English, there is no special syntax.
I'm honestly OK with the way it is for English - if I say "there is a god", it doesn't feel like I have an elliptical locative predicate anywhere. I think it's a different sense of 'is', which is reflected in the construction with the expletive etc., maybe some intonation or stress differences too. Is there a real downside to assuming a reading of be that is a strong verb?
I've thought about this some more. To recap, we are dealing with 2 questions: 1) What should the dependency analysis be for existential constructions? 2) Should the be-verb in existential constructions be classified as a copula, and therefore tagged as AUX?
With there+be, a subject suffices when normal copula "is" would require a complement:
This suggests that "in the kitchen" is an adjunct in the existential sentence but an argument in the simple copular sentence. However, UD doesn't generally make an argument/adjunct distinction, so this may not be relevant.
cop in a multi-argument existential clause ("There are cookies in the kitchen").
I'm honestly OK with the way it is for English - if I say "there is a god", it doesn't feel like I have an elliptical locative predicate anywhere. I think it's a different sense of 'is', which is reflected in the construction with the expletive etc., maybe some intonation or stress differences too. Is there a real downside to assuming a reading of be that is a strong verb?
There may be a cline semantically. Agreed that "there is a god" (or "there is a space shuttle that goes a bajillion miles per hour") emphasizes pure existence-in-the-universe, but something like "there is food" implies existence in a particular situation.
I don't have a strong opinion as to whether promotion only when there's no locative phrase makes sense. I'm hesitant to posit a different verb sense requiring a different POS tag, though.
It's only a different pos tag because of how upos defines AUX, which in my opinion is a functional term (something serves as an auxiliary) rather than a morphological category (if anything, 'be' is a morphological verb in all these uses).
Other languages with PRON copulas already make this concession by saying pos is PRON while deprel is cop. I don't 100% understand why VERB can't be aux, but I do prefer the non-copula analysis of pure existence cases, so I guess I'm happiest with the current situation (also less work :)
How about this as an argument: making a question out of a there-be existential involves fronting the be-verb:
There is a god -> Is there a god?
This inversion is reminiscent of copulas or auxiliaries, not full verbs, which require do-support:
There exists a god -> Does there exist a god? / *Exists there a god?
I completely agree that 'be' has interesting and idiosyncratic syntax, but I don't think that makes it an auxiliary in all contexts. The inverted order appears in archaic constructions to this day ("dare we change this guideline?") but that doesn't mean we have to tag the matrix verb in all of them as AUX. Nor is the tag AUX monolithic in UD English syntactically, for example modals have quite different syntax from 'be' (and 'have').
For me at least, being an auxiliary is a relational property. For existence predication, like "to be or not to be", I don't feel there's something there that it's serving as an auxiliary to (be alive?).
But I'll also admit two biases I have: I don't really use upos due to how coarse it is, and I don't like changing broadly used, high impact guidelines without really strong reasons. It just leads to tons of conversion and validation errors, tools that behave differently/erratically, and explaining to students that this corpus looks different because it's from before version x, and in those slides it's version y, and then...
I'm willing to do that work for important improvements, but this looks more like a case of "could probably work either way" to me.
Just for fun I created a Twitter poll: https://twitter.com/complingy/status/1259256679000166406
In the discussion Valia Kordoni asked about case. While it is pragmatically weird to use a cased pronoun in an existential construction, my intuitions are roughly the same as for regular copulas: "There is I/me" feels like "It is I/me". Nominative feels stilted and accusative feels colloquial. Perhaps this is another argument in favor of the copula analysis.
I'm not sure what I/me shows here, since in any case it's the subject not the (proposed) predicate in "there's me". Shouldn't the subject be (normatively) nominative in both constructions? In the copula construction we normally have:
"I'm a teacher" Or with locatives: "I'm here/in the house"
So arguably one could say the preference to "me" shows that it's not the copula construction. But I'm not making that claim - I don't think it's evidence one way or the other, IMO it just shows that English doesn't like to have "I" post-verbally in the linear order.
The point is that English speakers don't feel entirely comfortable with cased post-copular subjects, and both nominative and accusative are acceptable to an extent. "The teacher you were talking about is I/me"—"I" sounds more formal, perhaps because it was prescriptively taught as the proper form, while "me" sounds more casual. (Similar to the situation with coordination: people were taught in school to say "John and I" rather than "John and me" and now the former is often heard even in object position.) I'm not aware of a verb other than "be" that has this flexibility postverbally—nobody would say "He saw/told I", for example.
I still don't understand why this matters - the shape of the subject isn't the issue, it's the predicate that is unique in copula constructions (namely, that the verbal element is not the predicate). In "the teacher you were talking about is me", we are predicating something like me(teacher), so we call it a copula construction. The fact that 'I' is possible here, but "there's I" is not is, if anything, a difference between the copula and the existential construction. In something like "there's a problem", by contrast, we predicate exists(problem), not there(problem) or location(problem,there) or anything like that.
As for the instability of 'I/me' postverbally, I think it's neither a necessary nor a sufficient condition for distinguishing the two constructions: there are plenty of cases in which 'be' is not really equating or being an auxiliary, in which inversion can't happen:
I think therefore (I/*me) am
My real issue with considering this AUX is that I don't see what it's an auxiliary to. Is there some kind of ellipsis here? I think in the second clause this is:
I'm not saying everything fits neatly - in a Radical Construction Grammar sense, 'be' in the 'there construction' is its own thing period, but if we have to choose between VERB and AUX, I see pretty good reasons to go with VERB, and even if we think it's a close call, not enough reasons to overturn a highly impactful guideline that would require revising all UD English TBs and reeducating everyone who knows this guideline.
I'll grant that in "I think, therefore I am", where the be-verb has only an ordinary subject and no expletive or complement, it seems to be more of a full verb than an aux/copula. Semantically, existential constructions are somewhere in between that sense and normal copular predications (especially when a location is overt). Syntactically, question inversion is in common between existentials and normal copulas. And intuitively I always assumed it was a copula—while I knew there were special guidelines for existential dependencies it never occurred to me that it wouldn't be tagged as AUX (and I don't think this is really documented, hence this issue).
Maybe this is really an intermediate case where "be" is acting not quite like a full verb or a copula, and it seems people on Twitter are divided on whether it should be called a copula (currently it's 24 yes–21 no). I'd like to hear from other UD folks/syntacticians.
Not persuaded by the stare decisis argument because a) the tagging decision specifically is not really documented in the guidelines yet, b) it seems (to me) like more of a special exception to call it a VERB in existential constructions but not copular constructions, so annotators might find AUX more natural, and c) it is trivial to change existing treebanks with a rule. But again, if people are convinced there are strong syntactic arguments on both sides, I'm happy to have the status quo documented. :)
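To make point (c) concrete, here is a minimal sketch of what such a retagging rule could look like. Everything in it is an assumption for illustration: tokens are plain dictionaries loosely mirroring CoNLL-U columns, and `retag_existential_be` is a hypothetical helper, not part of any existing UD tool.

```python
# Hypothetical retagging rule: be/VERB that governs an expletive "there"
# is retagged as AUX. Token dicts loosely mirror CoNLL-U columns.

def retag_existential_be(sentence):
    """Retag existential be/VERB as AUX in place; return the number of changes."""
    # ids of tokens that have an expl dependent with lemma "there"
    heads_with_there = {
        tok["head"]
        for tok in sentence
        if tok["deprel"] == "expl" and tok["lemma"].lower() == "there"
    }
    changed = 0
    for tok in sentence:
        if (tok["lemma"] == "be" and tok["upos"] == "VERB"
                and tok["id"] in heads_with_there):
            tok["upos"] = "AUX"
            changed += 1
    return changed


# "There is food." in the current analysis, with be as the root.
sentence = [
    {"id": 1, "form": "There", "lemma": "there", "upos": "PRON", "head": 2, "deprel": "expl"},
    {"id": 2, "form": "is", "lemma": "be", "upos": "VERB", "head": 0, "deprel": "root"},
    {"id": 3, "form": "food", "lemma": "food", "upos": "NOUN", "head": 2, "deprel": "nsubj"},
]
```

The rule only touches the UPOS tag and leaves the dependencies alone, which matches the split between the two questions above.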
I think stare decisis is more persuasive as to the dependencies—the policy is clearly documented, and anyway I think we are bound to lose something about the construction in any pure-dependencies approach—so I see less of an argument for changing those.
If we accept that a word can be AUX just some of the time, then this becomes a question of which ones are which, rather than a sweeping guideline (be -> AUX). The more I think about it, the more I realize that for me the main criterion is not whether there's inversion or some other morphosyntactic phenomenon, but rather whether there is some kind of auxiliary function. For the following, I think there is nothing 'supported' by 'be', so I think it should be VERB:
If those are OK as be/VERB+root(be), then we only need to talk about the difference between:
I am fine with the second one being root(jar) and cop(jar,are) - I think this makes things much easier for cases where the copula is elided ("Stevie in the house"), and many languages normatively have locative constructions with and without copula.
What about "there are cookies in the jar"? Functionally/semantically, it is very similar to "the cookies are in the jar". But I'd like to argue it's a good idea to treat it like "there are cookies", because it is very difficult to draw the line between locative predication and adjuncts to the existential predicate. UD wants to avoid the argument/adjunct distinction, but since in this case we need to differentiate dependent from predicate, we have to choose a side:
If so, OK, but what about this:
If this is not semantically forever(cookies) but rather exist(cookies,forever), what about some semantic locations:
amod can't possibly make this a non-existential predicate, right?
advmod to the existential predicate, no?
advmod to the existence predicate...
And if the last one doesn't look 100% clear cut, that just means we will soon have annotators getting confused and disagreeing. We could try to look for more borderline PP cases (where the PPs are maybe non-compositional MWEs, or whatever), but I think if we agree there is any kind of PP or ADV in which it's still be/VERB, then life is much easier and the TB more predictable if we say this:
I also think this actually is stare decisis - not because it's in the UD guidelines (though it is in the GUM guidelines, so that's a res judicata), but because that's how the TBs currently behave, and changing it would be a hassle that I don't see a real benefit from. More generally I prefer a little simplistic but predictable to really clever but not reliable - form based guidelines usually lead to the fewest surprises IMO.
Personally, I have to say that, after some reasoning, I'm advocating for the non-copular interpretation of the existential clause.
A more etymological point of view which overlaps with other arguments in this conversation: there seems to be originally just an obl in a sentence like A god is where the difference with the copular is has to be marked somehow, as its existential value has become less strong and definite than the equivalent A god exists (and it bears little "substance"). The inversion of the elements is then a Germanic characteristic that generally got lost in English, but was retained in some fixed constructions (such as so do I). So, all in all, there is a god has no copula. By the way, I think that other behaviours such as question inversion are just lexically determined synchronically, and do not depend on a possible underlying copula.
Comparing with Italian: the exact same thing happened with c'è/ci sono 'there is/there are' (fixed and indivisible) (ci = dative/locative clitic, è/sono from essere 'to be'), and esserci 'to be there' (essere + ci) might even be regarded as a verb in its own right, different from essere, very close to esistere or something like essere presente 'to be present', trovarsi 'to find itself [in]/to be': un dio c'è / c'è un dio 'there is a god'.
This actually brings me to question the validity of treating locative nonverbal predications (point 3 here) as copulas, rather than full VERB + obl constructions. From a personal point of view, I have to say I find this part of UD annotation very confusing.
For what it is worth, again with respect to Italian, at school we were taught to treat constructions such as la penna è nel cassetto 'the pen is in the drawer' not as copulas, but as the exact equivalent of la penna si trova nel cassetto 'the pen "finds itself" (= is) in the drawer', i.e. as verbal predicates with a full verb; as such, in UD terms: nsubj(trova,penna), obl(trova,cassetto), expl(trova,si) (reflexive). This is the point of view of traditional Italian grammar.
By the way, just another cross-linguistic comparison with Mongolian, a language where copulas can normally be omitted, e.g. Энэ миний ном / Ene minii nom 'This [is] my book': here the copula байна / baina 'is' is redundant. But (a good reference is Janhunen 2012, §7.5) in existential clauses this is normally not allowed, "though the adverbial modifier can be absent when understood from the context" (Janhunen 2012): Манай өрөөнд хоёр сандал байна / Manai öröönd khoyor sandal baina 'There are two chairs in our room', but also, Хоёр сандал манай өрөөнд байна 'The two chairs are in our room', depending on topicalization.
To sum it up: such different possibilities in the omission of the "existential element" point to two different constructions and, I think, to the fact that existential and locative clauses (points 3 and 6 here) might represent the same phenomenon and should not be treated as copulas.
@Stormur : FWIW, I was taught the same thing in school (i.e., locative phrases with to be are not considered predicates with a copula), but apparently the grammatical tradition is not the same everywhere. There was a heated debate about this when the v2 guidelines were being prepared, and the points you cite are the result. I think the main argument was that in some languages, unlike Mongolian, the "is" can be omitted (EDIT: in fact, it must be omitted if it is in the present tense) in exactly the same manner as in equational phrases. E.g. Russian:
Эти два стула в нашей комнате. / Èti dva stula v našej komnate. 'The two chairs [are] in our room.'
@dan-zeman But isn't it the fact that some languages do have this distinction, rather than the fact that some languages lack it, that should determine whether UD also makes such a distinction?
I also mean, couldn't the Russian example just as well be treated as an ellipsis, root(стула), orphan(стула,комнате), especially if the omission is not mandatory (in Mongolian non-omission is)?
Interesting, I didn't know some grammatical traditions considered locative be-expressions not to be copulas! Makes me wonder if the UD guidelines should avoid the term "copula" altogether and instead be more specific as to where the line is drawn.
Regarding @amir-zeldes's ubiquitous cookies examples, I guess I don't see "ubiquitous" as being locative, whereas "everywhere" is (even though the underlying speaker intent may be similar, the semantic construal is different). Where are the cookies? They are everywhere/#ubiquitous. With some more context one could infer that ubiquity may imply occurring-at-many-locations (Where can I buy cookies? - Oh, they're ubiquitous! You can find them at any café.) but this seems beyond the scope of what we need to think about for syntax.
@Stormur : Sorry, I said it can be omitted but I should have said it must be omitted (in the present tense; in the past tense the copula is there). The point is that there are locative predicates that behave the same way as nominal-equational or adjectival-attributive predicates: they have no copula in the present tense and they use a copula in the past and future tenses. I agree that we could analyze the copula-less clauses as ellipsis (if the UD guidelines did not explicitly say otherwise) but then we would do it for all clauses with non-verbal predicates, locative or not.
Thanks @dan-zeman , I think that's a good motivation for keeping the copula analysis of locative predication. Indeed, it is very convenient for Afro-Asiatic languages like Arabic, Hebrew and Coptic, which also have locative constructions with and without copulas. This way, you can be sure the location points to the subject consistently, and if there is a copula it's just an extra thing.
As far as I can tell, the implementation for Hebrew, Arabic, Coptic and Slavic languages I've looked at is actually pretty uniform, which is always a pleasant thing to discover!
@nschneid I think we agree here, and I didn't mean to suggest that ubiquitous means the same thing as everywhere, just that spatio-temporal information can be added in all sorts of constructions, and it would be hard to tell when an adverb in a 'there' clause would start to count as the predicate (as opposed to an existential predication with an adverbial modifier).
@dan-zeman That was the counterexample I was waiting for! Yes, if it has to be omitted, then a systematic ellipsis would surely not be a good solution. I also began to wonder about the nature of the ellipsis (?) in locative predicates in Latin, as in in vino veritas 'In wine [lies] the truth'. It is not so uncommon, but I think it mostly appears in concise idioms, so it is still a particular use.
However, why is the conclusion that "then we would do it for all clauses with non-verbal predicates, locative or not"? From reading again the archived discussion here, I am understanding that UD ultimately leaves some freedom to the single languages as to how to deal with cases 2-5 (case 6 seems to be something of a different nature). So, for example, with the arguments I used before, I would be inclined not to treat case 3 with a copula in Latin (this is what happens in the ITTB), Italian or Mongolian, but probably it has to be treated as such in Russian and Turkish. How universal is that? Naively speaking, from this discussion I am getting the impression that locative predications might be borderline with respect to the other typologies; a grey zone where different languages disagree about the fact that, if the pen is in the drawer, the pen displays the property of "indrawerness". So actually different constructions and different syntactical trees would be justified.
From a practical point of view, I think I am not the only one confused by this issue, and that may also be due to the fact that the dedicated page is probably not explicit enough and seems to deal just with English. Probably types 2-5 (+6) would deserve to be "fronted" and schematically recalled on that page too, along with linguistic variance! Just a suggestion :-)
@Stormur when working on the guidelines, Stassen's book on intransitive predication was an excellent help. There is also a spreadsheet I was working on which has some analysis and comparisons.
@Stormur : I don't think there is so much freedom for single languages to deal with cases 2-5 here. I think that both 1 and 3 should be treated the same way in Latin. I had a hard time finding an example of 3 (locative) in ITTB, as it mostly talks about abstract things (but there seem to be plenty of non-locational prepositional predicates, which are also not treated as such!) Maybe this would work:
@ftyers Thanks for showing it to me! Do you need any advice on e.g. Latin, Italian, or... Mongolian? 🙂 As for the book, I will surely give it a look.
@dan-zeman I was interpreting "All other cases of putative copula constructions (categories 2-5) should be assimilated to the equational and existential cases as seems to make most sense according to the inherent logic of the language concerned." as a way to say that, given the preceding points (overt copula, possible absence thereof, different syntax...), in the end, assuming the equative construction as the prototypical copula and the existential one as something different, each language needs to make its own assessments for cases 2 to 5. I see a bipolar (1 vs 6) scale of "copular prototypicality" implied here. So, for Latin that would lead me to use a copular interpretation for 1, 2, 4, 5 and a non-copular one for 3 and 6.
Edit: below some examples in Latin...
I have followed the discussion and noticed (a) that most of the arguments are familiar from earlier discussions, and (b) that there simply is no single analysis that deals with all of them in a satisfactory way. In fact, the treatment of copula constructions is in my view one of the least satisfactory parts of the UD guidelines, no matter how you deal with the corner cases, and I have been convinced for some time that it may have been a mistake not to treat all verbs as heads of clauses, including copulas. Let me try to articulate this position.
The main argument in favor of treating the nonverbal predicate as the head in sentences like "she is nice" and "he is a dancer" is to get a structure that is parallel across languages regardless of whether they use a copula or not. In all languages, we have:
nsubj(nice, she)
In some languages, we in addition have:
cop(nice, is)
The alternative analysis is to follow frameworks like LFG and treat copula constructions as essentially biclausal structures:
nsubj(is, she) xcomp(is, nice)
This would mean that we lose the parallelism with copula-less languages in the basic dependencies, but we can still capture it in the enhanced dependencies, where all languages (in virtue of how xcomp works) would have:
nsubj(nice, she)
Moreover, it would solve three problems with the existing analysis:
There would no longer be any question about where to draw the line between copula constructions, existential constructions and locative constructions. They would all have "be" as the root.
There would no longer be any question about where to draw the line between copula verbs like "be" and near synonyms like "get", "become", "appear", etc. They would all take an xcomp.
We would not be forced to exceptionally make the copula verb the root when the nonverbal predicate is a clause, as in "the problem is that it is difficult", where the first "is" is the root of the main clause in order to prevent "difficult" from having two subjects (which, by the way, is one of the ugliest parts of the guidelines). It would no longer be an exception but the rule.
In conclusion, treating copula verbs just like other verbs would in one fell swoop eliminate all the major problems with the current analysis, at the very small expense of losing one parallelism in basic dependencies that can be regained in the enhanced dependencies.
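The alternative analysis described above can be sketched as a small transformation over dependency triples. This is a simplified illustration, not an actual conversion tool: `cop_to_xcomp` is a hypothetical function, word forms stand in for token ids, only nsubj and root are re-attached, and the propagated subject is labelled plain nsubj as in the example above.

```python
# Hypothetical sketch: re-head a copular clause on the copula and recover
# the cross-linguistic parallelism in the enhanced graph.
# Edges are (deprel, head, dependent) triples; word forms act as node ids.

def cop_to_xcomp(edges):
    """Return (basic, enhanced) edge lists for the copula-as-head analysis."""
    cops = {head: dep for rel, head, dep in edges if rel == "cop"}
    basic = []
    for rel, head, dep in edges:
        if rel == "cop":
            # the former nonverbal predicate becomes an xcomp of the copula
            basic.append(("xcomp", dep, head))
        elif rel == "nsubj" and head in cops:
            # the subject moves up to the copula
            basic.append((rel, cops[head], dep))
        elif rel == "root" and dep in cops:
            # the copula becomes the root
            basic.append((rel, head, cops[dep]))
        else:
            basic.append((rel, head, dep))
    # enhanced graph: propagate the governor's subject to its xcomp
    subjects = {head: dep for rel, head, dep in basic if rel == "nsubj"}
    enhanced = list(basic)
    for rel, head, dep in basic:
        if rel == "xcomp" and head in subjects:
            enhanced.append(("nsubj", dep, subjects[head]))
    return basic, enhanced


# "she is nice" in the current analysis
current = [("root", "ROOT", "nice"), ("nsubj", "nice", "she"), ("cop", "nice", "is")]
basic, enhanced = cop_to_xcomp(current)
```

For this input, the basic edges contain nsubj(is, she) and xcomp(is, nice), while nsubj(nice, she) appears only in the enhanced edges.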
As @dan-zeman knows, we were very close to making this switch when going from v1 to v2, and I honestly regret not seeing the pros and cons more clearly at the time. For v2, we are obviously bound by the current guidelines, but if we ever get to v3 I think we should seriously consider making this switch. I would be interested to hear what others think about this.
@dan-zeman Now for some ramblings of mine about the sentences, for examples and completeness, coming late...
cop.
I highlighted the alternative translations because I think that any correct translation should use a "full verb" and not just is, but I could be biased. The fact is that Latin esse 'to be' has a stronger existential value than its Italian (essere) or English counterparts. Probably, the line is often blurry and is represented by a tenuous syntactic difference; compare the motto est modus in rebus 'there is a measure in things', unequivocally existential: here, reminiscent of the English construction, est is topicalized, but the same sentence would stay as modus est in rebus, probably with emphasis on measure: 'there is a measure in things'. I see this subtle difference actually as an argument to put, in Latin, 3 together with 6.
@jnivre I don't agree that this would solve everything because:
nsubj(table,cookies)
obl(are,table) and edeps obl(are,table). In the copular interpretation it's the same as the previous example. If we are capable of deciding this using deps+edeps, we would have been able to resolve this before too.
case analysis of PPs, the mark analysis of complementizers and a variety of other things. It seems odd for copulas to be the one exception to this.
@jnivre We adopt the analysis of AUX as heads in SUD, and the conversion rules are ready and very easy to apply to all treebanks. @bguil can convert all UD treebanks as soon as you decide to do it. In fact they are all already converted in SUD and accessible with grew-match. We can easily restrict this conversion to AUX which are cop.
Nevertheless, as soon as you do that for copulas, the question arises for auxiliaries, because the boundary between copulas and auxiliaries is not so clear. For instance, in English, the verb BE is used as a copula as well as an auxiliary in progressive and passive constructions and it has been argued in some works that it could behave as a copula in such constructions (especially passive).
What @jnivre said about enhanced relations in copulative constructions is the idea behind SUD. The syntactic annotation should be a surface-syntactic one, based on distributional criteria. The copula is the head because it controls the distribution of the clause. And the parallelism between languages with or without copulas can be captured much more properly at the enhanced level.
@sylvainkahane in languages with optional copula and optional subject, it seems more natural to say the predicate controls the distribution of the entire construction. But in any case, if that is the criterion, then UD would be inconsistent if it doesn't revert nmod/obl+case to Stanford's prep+pobj model IMO.
@jnivre I will say from my experience that treating copulas as modifiers is one of the biggest challenges both for annotation and for converting UD to semantic representations. (The other huge challenge is coordination, which is messy under any framework.)
Apart from the ones you mentioned, reasons the current practice is difficult include:
mark dependent of "tradeoff", and "otherwise" as an advmod dependent of "tradeoff". Is it as simple as saying all mark and advmod dependents of nouns are clausal rather than nominal modifiers? No, because advmod can be a PP modifier, for example (advmod(talk, right) in "right after the talk").
ccomp(added, submitted/VERB), ccomp(added, show/NOUN).
If one of the design principles of UD is to be easily usable by downstream tools, arguably copula constructions are a pain point with the current approach.
BUT I think this raises a more general issue: As Enhanced Dependencies mature, should individual languages be freer to make other decisions about Basic Dependencies that are contrary to the previous emphasis on crosslinguistic parallelism? For example, treating adpositions as heads (as in Stanford Dependencies) where they are really adpositional and not case markers? Is it relevant (as @amir-zeldes raises) that Enhanced Dependencies are not as universally supported?
In other words: Are you suggesting we reconceptualize some of the fundamental principles of Basic Dependencies to shift some of the burden of crosslinguistic parallelism to Enhanced Dependencies? Or are you singling out copula constructions as uniquely problematic in our current approach?
Allowing languages to diverge more freely in Basic Dependencies would be a pretty radical change. I'm not advocating for that, I just want to have clarity on the principles of what should go in Basic vs. Enhanced.
I think basic dependencies should not be more anarchistic just because we have enhanced dependencies. The cross-linguistic parallelism should still be maintained as much as possible.
As for copula, I share @jnivre 's doubts whether the benefit of the current UD approach outweighs the downsides. However, @amir-zeldes is right that there is another specter lurking in the blurry line between be-copula and be-passive-auxiliary (not just in English). I forgot about this issue, although it was one of the awkward features of the Prague annotation scheme; I'm not sure I want to face it in UD :-)
Thanks @sylvainkahane for raising the passive issue, that is indeed a thorny problem (and more generally aux being a child is indeed something I would not expect if copulas are roots).
@nschneid I agree that unlike coordination and inability to extract the modification structure are problems with the current copula analysis, but they are also a problem with the rest of lexico-centrism:
case as unmodifiable.
cop a dependent can also make access for semantic applications easier: for example in coreference, potentially coreferring subjects and predicates are directly linked. Finding typical predicative adjectives of some noun is easy, etc. etc. - whether the analysis is good depends on what you're doing, but in any event, the construction is easy to recognize and annotate similarly across languages, especially ones where copulas are used in only some constructions (Semitic, Slavic, Latin, Greek, Coptic...). The same can be said for case and languages with optional adpositions.
I have followed these discussions ever since UD phased out SD (Stanford Dependencies), and have also invested considerable time in converting data from SD to UD in the spirit of lexico-centrism. It was a lot of work, but I'm not sorry for doing it: the current analysis is more consistent IMO, I think that SD, although great for English, was not as cross-linguistically applicable, and the abstract principles behind UD are easier to defend and adjudicate (we do this in class every fall, when we expand GUM in a team of some 20 annotators). And now that copulas behave the way they do in UD, we have lots of graduate and undergraduate students who use them in corpus searches, write papers based on this, build tools, write guidelines for more low resource languages...
The reason I give this background is to remind everyone more concretely of all the people who are affected by massive changes like these. I'm not saying the current copula analysis is perfect, but neither is any alternative I'm aware of. Big breaking changes do mean however that a bunch of papers become less relevant/impossible to compare to, tools stop working, websites crash, and, if we're lucky, some dedicated students pour hours of work into making things run again. Given that price tag, my initial reaction to any statement that says that "something about UD should look very different from how it is now" is caution.
@amir-zeldes:
@nschneid I agree that unlike coordination and inability to extract the modification structure are problems with the current copula analysis, but they are also a problem with the rest of lexico-centrism:
Totally agree, I'm just a little more comfortable in principle conflating NP/PP modifiers than I am conflating NP/PP AND clause modifiers. :)
Perhaps there's a way to indicate some of these distinctions as features or deprel refinements somehow, without changing the basic tree, such that conversions into a format using functional heads, or NP/PP/clause constituents, etc., would become deterministic?
@jnivre
The alternative analysis is to follow frameworks like LFG and treat copula constructions as essentially biclausal structures:
nsubj(is, she) xcomp(is, nice)
This would mean that we lose the parallelism with copula-less languages in the basic dependencies, but we can still capture it in the enhanced dependencies, where all languages (in virtue of how xcomp works) would have:
nsubj(nice, she)
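To make the step from the basic to the enhanced analysis concrete, here is a minimal sketch of how the extra subject edge could be derived from nsubj plus xcomp. It assumes a toy representation that is not part of any UD tool: each sentence is a list of (relation, head, dependent) triples, with word forms standing in for token IDs, and the function name `enhanced_xcomp_subjects` is made up for this illustration. It only handles subject control; real enhanced graphs also have to deal with object control and other cases.

```python
def enhanced_xcomp_subjects(edges):
    """Add an nsubj edge from each xcomp dependent to its governor's subject.

    edges: list of (relation, head, dependent) triples for one sentence.
    Simplification: only subject control is modelled here.
    """
    # Map each head to its nominal subject, if it has one.
    subjects = {head: dep for rel, head, dep in edges if rel == "nsubj"}
    added = [
        ("nsubj", dep, subjects[head])
        for rel, head, dep in edges
        if rel == "xcomp" and head in subjects
    ]
    return edges + added


# Basic dependencies under the copula-as-head analysis of "she is nice".
basic = [("nsubj", "is", "she"), ("xcomp", "is", "nice")]
enhanced = enhanced_xcomp_subjects(basic)
# enhanced now also contains ("nsubj", "nice", "she")
```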
But wouldn't this mean (as others have already argued) that we just shift the problem of choosing whether to use cop vs. e.g. obl to the problem of choosing xcomp vs. obl?
While it has advantages, it seems to me a little like shoving the problem under the carpet (i.e. enhanced dependencies) and not really solving it...
@nschneid
It breaks the usual restrictions on what kinds of dependencies go with what parts of speech. Consider the EWT sentence "I hope that the US army got an enormous amount of information from her relatives , because otherwise this move was a bad , bad tradeoff." There, "because" ends up as a mark dependent of "tradeoff", and "otherwise" as an advmod dependent of "tradeoff". Is it as simple as saying all mark and advmod dependents of nouns are clausal rather than nominal modifiers? No, because advmod can be a PP modifier, for example (advmod(talk, right) in "right after the talk").
It breaks parallelism of coordinate structures: "He added that around 1,100 cartoons were submitted by participants from more than 60 countries and that more than 200 are on show": ccomp(added, submitted/VERB), ccomp(added, show/NOUN).
advmod rather than UD copulas. There is indeed a typed relation advmod:emph which addresses exactly such cases, but is not standardised. It would probably deserve more publicity (and maybe a renaming from "emphasis" to "focus", like advmod:foc or something similar?). An entirely new relation like emphasis is probably not unthinkable, but maybe too much.
advmod:emph(behind,exactly) to show this. But then the question is: is it possible to say something like behind [exactly the house]?
ccomp, but they really use different strategies (verb vs. predicative nominal).
Finally, just out of curiosity, has it already been proposed to conflate cop and aux into a single, overarching relation with possible specifications, like xxx:cop or xxx:pass?
I am happy to hear that things are not as simple as I thought. :)
@amir-zeldes
Granted. I was not suggesting that all treebanks must have enhanced dependencies. I was just pointing out that there may be ways to mitigate the loss of this parallelism.
If we make the change to treating "be" as a predicate, then I think we should use obl in both cases:
cookies are on the table: nsubj(are, cookies) obl(are, table)
there are cookies on the table: nsubj(are, cookies) obl(are, table) expl(are, there)
In fact, this alternation could conceivably be used as a test for the xcomp-obl distinction:
cookies are on the table / there are cookies on the table
cookies are nice / *there are cookies nice
This is obviously a crucial point, and it is related to something that I think we need to clarify anyway. I don't think lexico-centrism, or prioritizing content words over function words, is quite the right characterisation for the UD philosophy. I would rather describe it as prioritizing predicates, arguments and modifiers over grammatical markers. To a large extent, this is equivalent to putting content words above function words in the trees, but in the case of pronouns, for example, it is not. Pronouns are closed class items, but they typically occur in argument positions and are therefore treated in the same way as nouns and other words that play similar roles. So promoting copula verbs to heads would amount to treating them as predicates and not grammatical markers. The evidence for this move may be debated, but it wouldn't by itself necessarily entail reversing other relations that involve closed class items.
Granted again. This is what I once called "Ginter's razor", the less famous cousin of "Manning's law": Annotation changes should not be multiplied beyond necessity.
@jnivre the formulation using predicates and arguments is helpful, though in this case it just sharpens the central question, which is whether "(be) nice" is the predicate, or "be"...
For the locatives, if we use obl for all of these cases, we are essentially treating them as non-copula constructions (since there is no xcomp), so we would still need to distinguish two kinds of structures, and would not be free of the need to draw a line within each language. It also means that we don't have an automatic decision for 'copula near synonyms' like "get", "become", "appear", etc., since they would only take xcomp in a subset of structures (should "Kim appeared in the kitchen" be xcomp even when "Kim is in the kitchen" has obl?)
@nschneid I'm much more in favor of using FEATS or MISC to add these distinctions if needed, while the core dependencies stay stable. More generally, if there's a structure I don't like, I can always write a script to change it internally for some application, but I can only rely on that script if the fresh data from the UD repos remains stable. IMO it's much better to know that the data on GitHub changes mainly in three ways:
Big changes not in these categories always make me nervous...
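As an illustration of the kind of application-internal script mentioned above, here is a minimal sketch that rewrites the current cop analysis into a copula-as-head analysis with xcomp. Everything in it is an assumption made for the example: the token representation (dicts with "id", "head", "deprel"), the function name `promote_copula`, and above all the set `MOVE_WITH_COPULA`. Deciding which dependents belong to the clause rather than to the nominal is exactly the hard part discussed in this thread (advmod is deliberately left out), so the fixed set below is a placeholder, not a recommendation.

```python
import copy

# Hypothetical choice of relations that follow the copula when it is promoted.
MOVE_WITH_COPULA = {"nsubj", "csubj", "expl", "aux", "mark", "punct"}


def promote_copula(tokens):
    """Return a copy of the sentence with each cop dependent promoted to head.

    tokens: list of dicts with integer "id" and "head" (0 = root) and "deprel".
    """
    out = copy.deepcopy(tokens)
    by_id = {t["id"]: t for t in out}
    for cop in [t for t in out if t["deprel"] == "cop"]:
        pred = by_id[cop["head"]]
        # The copula takes over the old predicate's attachment.
        cop["head"], cop["deprel"] = pred["head"], pred["deprel"]
        # Selected dependents of the old predicate move to the copula
        # (subtypes such as nsubj:pass are matched on their base relation).
        for t in out:
            base = t["deprel"].split(":")[0]
            if t is not cop and t["head"] == pred["id"] and base in MOVE_WITH_COPULA:
                t["head"] = cop["id"]
        # The old predicate becomes an xcomp of the copula.
        pred["head"], pred["deprel"] = cop["id"], "xcomp"
    return out


# "She is nice" in the current UD analysis: root(nice), nsubj(nice, She), cop(nice, is).
sentence = [
    {"id": 1, "form": "She", "head": 3, "deprel": "nsubj"},
    {"id": 2, "form": "is", "head": 3, "deprel": "cop"},
    {"id": 3, "form": "nice", "head": 0, "deprel": "root"},
]
promoted = promote_copula(sentence)
```

The function works on a deep copy, so the data as distributed stays untouched and the conversion can be re-run whenever the upstream treebank is updated.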
To @sylvainkahane's point about passives: Do I understand correctly that the objection is that passives and copulas can be difficult to distinguish? For languages where passives/copulas are marked by function words we already make the distinction (aux vs. cop). Is the concern mainly about other languages where the marking is morphological, and if so, wouldn't this already be captured in the morphological features?
If the point is that the line is blurry between them and so the structure should be similar—it seems like it could be explained by reanalysis/grammaticalization, which sometimes could involve a change in headedness.
For the locatives, if we use obl for all of these cases, we are essentially treating them as non-copula constructions (since there is no xcomp), so we would still need to distinguish two kinds of structures, and would not be free of the need to draw a line within each language. It also means that we don't have an automatic decision for 'copula near synonyms' like "get", "become", "appear", etc., since they would only take xcomp in a subset of structures (should "Kim appeared in the kitchen" be xcomp even when "Kim is in the kitchen" has obl?)
I don't see any discussion in current guidelines of xcomp for PP secondary predicates/depictives like "I declared the project on track" and "He entered the room in tears".
As I understand it, @jnivre's proposal would give us:
xcomp(declare, success/track/successful/be)
obl(was, track), xcomp(was, success/successful)
This is indeed a bit weird, especially considering that PPs are generally treated as similar to nominals. If "The project was on track" is obl I might instead expect "The project was a success" to be obj.
@nschneid :
For languages where passives/copulas are marked by function words we already make the distinction (aux vs. cop).
True, the distinction has to be made already now. But it is "only" between two relation types, both of them functional. The rest of the tree looks the same. On the other hand, if we move to treating copulae as predicates, we will end up with different heads.
aux:pass(built, was)
cop(built, was); then xcomp(was, built)
xcomp(declared, built)
I don't know if this proposal already exists or how well it fits with all of the above, but, if one of the main problems with copulae is that a clausal "non-verbal" predication would look as if it had two subjects, couldn't it be admissible to introduce an obligatory subrelation nsubj:cop to single out the copular subject? It is already used in some treebanks. Everything else about the actual treatment of copulae seems fine to me. Maybe the :cop subrelation might be extended to other elements which modify the copula as a whole and not only the clausal predication (e.g. adverbs)?
Relation subtypes are never obligatory :). But yes, some of them are strongly encouraged where applicable (nsubj:pass, acl:relcl and others).
But if I understand it correctly, nsubj:cop is used for the subject of the clause whose predicate is non-verbal (with or without copula). I wouldn't say that it "modifies the copula".
I know, but this might be the first, right? :grimacing: I thought such an "obligatory subtype" might be cleaner and clearer than, say, a new relation copsubj.
I mixed two things: I mean, if you have the sentence "The problem surely is that subrelations are not obligatory", then "surely" could also receive the :cop subrelation to show that it is not modifying "obligatory", but the whole clause which is part of the copula. So I actually meant "all elements pertaining to the whole copular construction and not only to the copulated clause".
The description of the existing nsubj:cop (existing only for Finnish, UD v1) is very general and just speaks of the nominal subject of a copular clause, and is paralleled by csubj:cop.
More broadly speaking, I think that there is nothing wrong with the actual representation of copulae. The "problem" of dependencies is just that we cannot distinguish between a phrase and the head of that phrase in terms of dependencies, so probably the only way is to make it explicit on the relation. As for a validation rule, maybe a double *subj might be accepted in the presence of a cop or of no verb at all? Unfortunately, the actual solution with copula promotion and ccomp treatment has, in my opinion, something inherently flawed and can lead to awkward structures... so anything else is better!
I think that this is probably a separate issue and has been discussed before, definitely around the time of the v2 guidelines. Issue #657 seems relevant.
For English, both EWT and GUM have many examples of copulas tagged as VERB. Are there certain constructions (e.g. existential constructions) where VERB is correct, or are these all errors? If the latter, would a validation rule be possible/appropriate, given that (at least for English) "be" is never ambiguous between VERB and AUX?
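A check along the lines asked about here could be sketched as follows. This is not the official UD validator; it is a small stand-alone illustration that reads CoNLL-U text (ten tab-separated columns per token, with LEMMA in column 3, UPOS in column 4 and DEPREL in column 8) and lists every token with lemma "be" tagged VERB, together with its relation, so that existential heads and likely errors can be reviewed separately. The function name `be_tagged_verb` and the sample sentence are made up for the example.

```python
def be_tagged_verb(conllu_text):
    """Return (sentence_index, token_id, form, deprel) for 'be' tokens tagged VERB."""
    hits = []
    sent_index = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if in_sentence:
                sent_index += 1
                in_sentence = False
            continue
        in_sentence = True
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        tok_id, form, lemma, upos, deprel = cols[0], cols[1], cols[2], cols[3], cols[7]
        if lemma == "be" and upos == "VERB":
            hits.append((sent_index, int(tok_id), form, deprel))
    return hits


# Made-up sample: an existential clause with "be" as VERB root.
rows = [
    ["1", "There", "there", "PRON", "_", "_", "2", "expl", "_", "_"],
    ["2", "is", "be", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "food", "food", "NOUN", "_", "_", "2", "nsubj", "_", "_"],
    ["4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
]
sample = "# text = There is food.\n" + "\n".join("\t".join(r) for r in rows) + "\n\n"
hits = be_tagged_verb(sample)
```

Reporting the relation alongside each hit keeps the check usable whether or not existential "be" as VERB is ultimately considered correct: hits with an expl dependent can be filtered out or kept depending on the outcome of the discussion.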