UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Comparative constructions #104

Closed martinpopel closed 9 years ago

martinpopel commented 9 years ago

I don't see any guidelines (with examples) on annotating comparative constructions. See http://ufal.mff.cuni.cz/project/depling13/proceedings/pdf/W13-3721.pdf for a possible (pre-UD) solution.

spyysalo commented 9 years ago

Not sure if @mcdm et al. DepLing'13 is current, but guidelines for this should likely go to http://universaldependencies.github.io/docs/u/overview/specific-syntax.html .

mavela commented 9 years ago

I don't know about other languages, but for Finnish we have the comparatives described here, if it helps (2.11 and 2.12): http://tucs.fi/publications/attachment.php?fname=tHaverinen_Katri13a.full.pdf

spyysalo commented 9 years ago

@mavela : thanks! Those sections have been converted into part of the UD-Fi material on compar and comparator, but the status of these types in UD remains open (https://github.com/universaldependencies/docs/issues/73#issuecomment-60207017)

manning commented 9 years ago

I wrote up a treatment of comparatives here:

http://universaldependencies.github.io/docs/u/overview/specific-syntax.html#comparatives

based on what was in the DepLing 2013 paper, but updated towards UD, and making the discussion a fraction less English-specific, while not actually including any examples from other languages. :(

I presume it should be taken as tentative at this moment, and open for discussion and expansion.

I suspect that comparatives are infrequent enough that we don't want dedicated relations for comparatives in UD (relations for just one construction seem unfortunate). Besides, we're not meant to be able to change the relations in UD now for at least 12 months, Joakim says. :) However, it seems to me like the Finnish TDT analysis could be mapped onto the analysis here in a language-specific way by having compar as a specialization of advcl, and comparator as a specialization of mark.
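As a rough illustration of the language-specific mapping suggested above, the following minimal sketch maps the two TDT labels onto UD relations with subtypes. The dictionary and function names are hypothetical and only the two label pairs come from this thread; the `relation:subtype` form is the usual UD way of writing language-specific subtypes.

```python
# Hypothetical mapping of TDT comparative labels to UD labels with
# language-specific subtypes, as floated in this thread (not a released table).
TDT_TO_UD = {
    "compar": "advcl:compar",
    "comparator": "mark:comparator",
}


def map_relation(tdt_label: str) -> str:
    """Return the UD label for a TDT label; unknown labels pass through unchanged."""
    return TDT_TO_UD.get(tdt_label, tdt_label)


def universal_part(ud_label: str) -> str:
    """Drop the language-specific subtype, e.g. 'advcl:compar' -> 'advcl'."""
    return ud_label.split(":", 1)[0]
```

A consumer that only understands the universal relations could call `universal_part(map_relation(label))` and still see plain `advcl` and `mark`.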

spyysalo commented 9 years ago

Thank you for the detailed proposal!

I'll defer to the rest of the UD Finnish team regarding the mapping of TDT compar and comparator: @fginter , @jmnybl , @mavela , @ammiss , how do you feel about advcl:compar and mark:comparator?

spyysalo commented 9 years ago

From a quick check with @mavela, comparator -> mark:comparator looks unproblematic, and the comparator dependencies in TDT comparative structures appear to match up with how mark is used in analogous constructions in http://universaldependencies.github.io/docs/u/overview/specific-syntax.html#comparatives . However, compar is somewhat more challenging, and the TDT structures don't fully line up with those proposed for UD. (More to follow.)

jnivre commented 9 years ago

I fully support Chris's proposal on this point. For languages where comparative constructions are sufficiently different from other adverbial clauses, language-specific subtypes may be used. (We may consider doing this for Persian, for example.) I am looking forward to hearing what the problems are with "compar" in Finnish.

Incidentally, the difficulty of deciding whether a bare nominal modifier should still be regarded as an elliptic clause or simply as a noun phrase is reflected in the discussion about which pronoun form to use. I think this is a classic case in prescriptive grammar for both English and Swedish, where grammarians would traditionally advocate "taller than I", with the argument that it is elliptic for "taller than I am", while popular usage tends to prefer "taller than me", suggesting that "than" works as a preposition here. Moreover, there are examples like "Peter is taller than his brother", which seems to be compatible with both a reflexive reading of "his" (taller than his own brother) and a general anaphoric reading (taller than someone else's brother). According to binding theory, if "than his brother" is a clause, the reflexive reading should not be possible.

dan-zeman commented 9 years ago

Interesting. In Czech, the reflexive pronoun is a distinct word, so you would have either Petr je vyšší než jeho bratr (irreflexive) or *Petr je vyšší než svůj bratr (reflexive). Since you cannot say the latter (even if you actually mean Peter's brother), we could say that the comparative part is an elliptic clause in Czech.

(But of course, this does not necessarily mean that it is a clause also in English. One could argue that this is one of the many syntactic differences between the two languages.)

jmnybl commented 9 years ago

In TDT, both the yhtä hyvä kuin Y (as good as Y) and the parempi kuin Y (better than Y) structures are annotated so that the adjective (hyvä, parempi) is the head of the compar dependency rather than the first adverb (yhtä, as), which is quite commonly absent. Since it is also possible to drop the adverb from the positive-form comparison (e.g. kova kuin kivi is used in place of yhtä kova kuin kivi), we annotated the compulsory part as the head and the optional adverb as its dependent.

So if we map the TDT compar onto a subtype of advcl, our analysis still does not fully line up with the UD definition. It does not feel optimal to make the part which in most cases is not present to be the primary head.

spyysalo commented 9 years ago

@jmnybl : thank you! I believe this is the current TDT structure

image

jnivre commented 9 years ago

There seems to be a difference even in English in that it is easier to view X as the head in the "as X as Y" construction than in the "more X than Y" construction. Whereas the first "as" seems optional, the "more" seems obligatory. However, the fact that "more" can also be realized morphologically does suggest that it should perhaps be treated as a function word, thus attaching the "compar/advcl" to its head instead. This would be compatible with our general treatment of function word modifiers, it would capture the essence of Chris's proposal, and it would facilitate the conversion of the Finnish treebank. These are obvious advantages. What are the disadvantages?

spyysalo commented 9 years ago

+1 for X as the head of compar/advcl instead of as/more in as X as Y and more X than Y.

spyysalo commented 9 years ago

Also, wouldn't X as the head improve parallelism between some English constructions as well, e.g.

image

image

vs.

image

image

jnivre commented 9 years ago

On the surface, yes, but you could argue that it attaches to -er (not hard-) in the second case, which makes the first pair more parallel. So it has to be coupled with the argument that "more" vs. "-er" is the type of "function-word-morphology-alternation" that we capture by never attaching anything to the function word.

spyysalo commented 9 years ago

Isn't "never attach anything to the function word" violated in these analyses?

image

image

jnivre commented 9 years ago

We don't really have a principle that says "never attach anything to the function word". The principle is rather "whenever reasonable, attach to the content word head instead of to its function word dependent". But we have already defined a number of legitimate exceptions, one of which is adverbial modifiers (things like "almost every linguist", where "almost" attaches as "advmod" to "every"). So the question is whether this is another legitimate exception or not.

spyysalo commented 9 years ago

Fair enough, I'll accept a weaker position: the minor variant that Turku is proposing (content word as head, see above) has the advantage of not requiring the addition of a further exception to the list of cases where things attach to the function word.

(I'm afraid I don't seem to be able to decide between legitimate and other exceptions; could some of the factors going into the decision maybe be documented along with the rule and its exceptions?)

manning commented 9 years ago

Sorry, I've been off busy with other stuff for a while....

I could be persuaded to take X as the head of the dependency to the comparative clause. It is certainly the minimal change that would make things consistent with Finnish. :)

If I'm honest, my reasons for being sympathetic to this analysis are that it would make things such as dependency conversion of Penn Treebank trees easier, and I suspect it would make parsing easier. Thus this was the way @mcdm and I had it in traditional SD....

Why we went with the opposite analysis is that @ngiordani and Tim Dozat felt fairly strongly that this was syntactically/semantically wrong, since (at least in English) the comparative clause is licensed by either "more" or the "-er" morpheme. This suggested extending the same analysis to "as", but, as @jnivre notes, it's not really so clear for "as", since you easily get "that path is sharp as a razor's edge" (though to my mind that does still sound kind of elliptical for "as sharp as ..."). At any rate, you don't get "#She is intelligent than me". Well, you don't in standard English, but I googled around and it seems like you do get that sometimes in Indian English: "Mahesh is Intelligent than Charan" (http://www.cinejosh.com/telugu-news-gossip/30145/mahesh-is-intelligent-than-charan.html); "So don't worry if you think that you have a girl-friend, who is intelligent than you." (http://www.forum.chillzee.in/lifestyle/fun-menu/3372-just-for-fun-an-intelligent-girl).

Nevertheless, I do basically think this argument is a sound point. It is the point of view of Huddleston and Pullum (2002), which discusses our choice of head as the "comparative governor" of the comparative phrase (p.1104). As they note and as discussed on the page http://universaldependencies.github.io/docs/u/overview/specific-syntax.html#comparatives, this governor may not even be a modifier of the head X but a modifier of a modifier of the head, as in constructions like "This may be a more serious problem than you think." If we make X the head, we do lose this link between the comparative governor and the comparative dependent.

But I do think there are arguments both ways, the other side being discussed above, and to the extent that the so-called comparative governor is often optional, it is clearly less compelling to take it as the head of the comparative dependent.

More discussion/comments and then make a decision?

jnivre commented 9 years ago

I think it is clear that the more/-er element is the head, and I don’t think we should give this up just to facilitate conversion (whether for English or for Finnish). However, the fact that there is an alternation between function words and morphology here, even in English, suggests that we could treat it in a way that is analogous to what we do with “right down the street”, where we attach “right” to “street” because it modifies “down the street”. You could argue that we have a similar structure here:

((more difficult) than you think)

And since “difficult” is the head of “more difficult”, it should also be the head of the larger phrase.

However, this implies that in “a more difficult problem than you think”, we attach “than” to “difficult” (not to “problem”), because there “more” modifies “difficult” and not “problem”.
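The promotion described here can be sketched as a small tree rewrite. This is only an illustration under stated assumptions: the `Token` class, the function name, and the starting analysis (with the clause attached to "more") are made up for the example, and the rule is simplified to "an advcl hanging off an advmod of an adjective or adverb moves up to that adjective or adverb".

```python
from dataclasses import dataclass


@dataclass
class Token:
    id: int       # 1-based position in the sentence
    form: str
    upos: str     # coarse part-of-speech tag
    head: int     # id of the governing token, 0 for the root
    deprel: str


def promote_comparative(tokens):
    """Reattach comparative clauses from a degree modifier to its content head."""
    by_id = {t.id: t for t in tokens}
    for t in tokens:
        if t.deprel != "advcl" or t.head == 0:
            continue
        governor = by_id[t.head]
        # Only promote when the governor is itself an advmod of something.
        if governor.deprel != "advmod" or governor.head == 0:
            continue
        content_head = by_id[governor.head]
        # Promote only to an adjective or adverb, never further up to a noun.
        if content_head.upos in {"ADJ", "ADV"}:
            t.head = content_head.id
    return tokens


# "a more difficult problem than you think", starting from the analysis
# in which the clause headed by "think" depends on "more".
sentence = [
    Token(1, "a", "DET", 4, "det"),
    Token(2, "more", "ADV", 3, "advmod"),
    Token(3, "difficult", "ADJ", 4, "amod"),
    Token(4, "problem", "NOUN", 0, "root"),
    Token(5, "than", "SCONJ", 7, "mark"),
    Token(6, "you", "PRON", 7, "nsubj"),
    Token(7, "think", "VERB", 2, "advcl"),
]
promote_comparative(sentence)
```

After the call, "think" depends on "difficult" (token 3) rather than on "more" or "problem", which is the attachment argued for in this comment.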

ngiordani commented 9 years ago

Let me see if I understand Joakim's point:

@jnivre, you're saying that the 'than/as...' phrase is dependent on 'more', but since 'more' is a functional element (as evidenced by the alternation with a bound morpheme), it makes sense to attach it to the head of 'more'. This is parallel to what we do in 'right down the street'. Sound right?

I think that's reasonable. One question, though: what would we do with

Wheat raises blood sugar even more than sugar.

jnivre commented 9 years ago

Yes, that is exactly my point. For the new example, we should do the obvious thing and attach "than" to "more". This can be motivated in two different ways. Either we view "more than sugar" as elliptic for "more (rapidly) than sugar" (or something like that), in which case the attachment to "more" is a case of "function word promotion by head elision" (see general principles in the guidelines). Or we view "more" as a content word meaning "to a higher extent", in which case this case is parallel to "faster than sugar".

spyysalo commented 9 years ago

More discussion/comments and then make a decision?

I think both alternatives have been fairly presented and would welcome a decision. How to proceed?

jnivre commented 9 years ago

If you ask me, my proposal reconciles the differences between the two proposals. Both end up attaching "than" to "fun" in "more fun than I expected", but with different theoretical motivations. :)

The only potential discrepancy is in cases like "a more difficult problem than I expected". Here the promotion is from "more" to "difficult" (the content word head), not to "problem" (the head of the noun phrase). What does TDT do in this case?

jmnybl commented 9 years ago

In TDT "difficult" is the head also in cases where it modifies a noun.

spyysalo commented 9 years ago

I'm personally happy to support @jnivre's motivation of the content-head proposal. (Would it perhaps work to first decide between the "as/more-head" and content-head alternatives and discuss possible remaining variants afterward?)

jnivre commented 9 years ago

Great! Then we just need to know that the Stanford/Ohio group is okay with attaching to the content word in analogy with what we do for modifiers of prepositions.

dan-zeman commented 9 years ago

Note to myself: In PDT, we treat "more" and "less" as any other adverbial modifier of the adjective (so it is not a function word) and we allow it to have dependents if necessary. It is extremely difficult to find examples of comparative constructions using them, because comparatives are mostly morphological in Czech. But I found one [cmpr9406_005.a.gz]:

tatra byla méně kvalitní vůz než jiné vozy lit. "tatra was less good car than other cars"

Adv(kvalitní, méně) AuxC(méně, než) ExD(než, vozy)

The ExD relation says that this is an elliptical construction and that a verb is missing but anticipated.

jnivre commented 9 years ago

If my understanding of the example is correct, this is in line with the original proposal from Stanford, but it could be changed to the new proposal by reattaching to the content head (in this case an adjective).

dan-zeman commented 9 years ago

Well, the transformation could be done, yes. We could also say that "less" is the content word. In fact it is itself a comparative form of the adverb málo = "little" (so we actually still have morphological gradation, though irregular in the case of málo).

Since it is so rare, I am not going to argue against the proposed solution. I just wanted to save the example for the time when I will be writing the corresponding description of the Czech data.

ngiordani commented 9 years ago

Hi everyone,

I'm sorry, I failed to report the discussion we had at Stanford a few days ago -- we actually agree that Joakim's proposal works across languages. We're happy to adopt it for English! But we do think that the comparative clause should attach to 'difficult', not 'problem'.

I think we're ready to close this issue, right?

spyysalo commented 9 years ago

@ngiordani : Great, happy that we agree on this!

I think we're ready to close this issue, right?

One bit remains: the docs at http://universaldependencies.github.io/docs/u/overview/specific-syntax.html#comparatives should be updated to reflect the decision. (Can you at Stanford take care of this?)

spyysalo commented 9 years ago

I'll draft an update of the docs.

spyysalo commented 9 years ago

I made a minimal update and revised the examples to content-head form. Cross-check would be appreciated. Also, we should probably add @jnivre's motivation of the content-head alternative to the docs.

ngiordani commented 9 years ago

Thanks, Sampo! I can finish the documentation later today.

N.

spyysalo commented 9 years ago

@ngiordani : are you likely to modify this part of the documentation anytime soon? I thought I might add some examples from Finnish if not.

ngiordani commented 9 years ago

Sorry @spyysalo, dropped the ball on this! It's done now.

spyysalo commented 9 years ago

Great, thanks, I'll close this and add the Finnish examples to the revised docs.

ngiordani commented 9 years ago

Hm, reopening because I just thought of something else... shouldn't the dependent clause now be acl? Thoughts, @spyysalo, @manning, @mcdm, @jnivre, @tdozat?

ngiordani commented 9 years ago

(Note: what I meant is, should it be acl when the head is a noun. As in "more flour than necessary".)

mcdm commented 9 years ago

Do you mean that in "more sausages than you bought last week", we would get "acl", but in "more important than you thought last week" it would be "advcl"?

ngiordani commented 9 years ago

Yeah, that's what I mean. Since acl is supposed to modify nominals...
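The distinction being proposed at this point in the thread can be written down as a one-line choice on the part of speech of the head. This is a hedged sketch only: the function name and the set of tags counted as nominal are assumptions made for illustration, not part of any guideline text.

```python
# Assumption: these coarse tags count as "nominal" heads for the purpose
# of choosing between acl and advcl.
NOMINAL_UPOS = {"NOUN", "PROPN", "PRON"}


def comparative_clause_relation(head_upos: str) -> str:
    """Pick acl for nominal heads and advcl otherwise."""
    return "acl" if head_upos in NOMINAL_UPOS else "advcl"
```

Under this proposal, the clause in "more sausages than you bought last week" (head "sausages", a noun) would get `acl`, while the clause in "more important than you thought last week" (head "important", an adjective) would get `advcl`.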

spyysalo commented 9 years ago

(reopening in tracker)

spyysalo commented 9 years ago

@ngiordani @mcdm : are there still open questions here or can this be closed?

ngiordani commented 9 years ago

Well, I'm assuming we'll differentiate between acl and advcl in comparatives. So far no one's complained. Maybe @manning or @jnivre can give a nod here?

jnivre commented 9 years ago

Nod

J

manning commented 9 years ago

I'm okay with this. But I do just want to raise a possible alternative analysis, now that I've thought about it for a little.

It's not clear that the 'more sausages' case is parallel to 'more important'. Because in the former case, there is no bound morpheme comparative alternative, and I doubt there are languages that don't have a word for "more" or "large amount" in this case. So, should we possibly in this case have the dependent comparative clause be a dependent of "more", so that it would still (consistently) be an advcl? It would be like seeing "more" as elliptical for "more numerous sausages".

ngiordani commented 9 years ago

Actually, I think this new proposal makes a lot of sense. @jnivre, are you on board?

jnivre commented 9 years ago

Sure. If I understand correctly then, the comparative clause always attaches as advcl to an adjective or adverb, never to a noun, and it attaches to "more" if there is no explicit adjective or adverb. Thus:

more difficult than you think: advcl(difficult, think)

harder than you think: advcl(harder, think)

more rapidly than you think: advcl(rapidly, think)

a more difficult problem than you think: advcl(difficult, think)

more problems than you think: advcl(more, think)

Is that correct?

manning commented 9 years ago

Yes correct. I'll put that in the documentation. And call it closed.
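The rule agreed on above can be checked against the five examples with a short sketch. The function name is hypothetical, and the relation labels given to the comparative marker itself ("advmod", "amod", "root") are assumptions made for the example; only the resulting attachment sites come from the thread.

```python
def attachment_site(marker, marker_deprel, head, head_upos):
    """Return the word that the comparative clause attaches to as advcl.

    marker        -- the comparative word ("more", "harder", ...)
    marker_deprel -- assumed relation of the marker to its own head
    head          -- the marker's head word, or None if irrelevant
    head_upos     -- coarse part-of-speech tag of that head
    """
    # "more" modifying an explicit adjective or adverb: promote to that word.
    if marker_deprel == "advmod" and head_upos in {"ADJ", "ADV"}:
        return head
    # Otherwise the marker is itself the adjective/adverb, or no explicit
    # adjective/adverb exists, so the clause stays on the marker.
    return marker


examples = [
    ("more", "advmod", "difficult", "ADJ"),   # more difficult than you think
    ("harder", "root", None, None),           # harder than you think
    ("more", "advmod", "rapidly", "ADV"),     # more rapidly than you think
    ("more", "advmod", "difficult", "ADJ"),   # a more difficult problem than you think
    ("more", "amod", "problems", "NOUN"),     # more problems than you think
]
sites = [attachment_site(*example) for example in examples]
```

Here `sites` comes out as `["difficult", "harder", "rapidly", "difficult", "more"]`, matching the five `advcl` attachments listed above; the clause never lands on the noun.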

dan-zeman commented 9 years ago

Thanks for the comprehensive documentation (http://universaldependencies.github.io/docs/u/overview/specific-syntax.html#comparatives) and sorry if it turns out that I did not read it attentively enough but I could not find a solution (or analogy) to this:

[1] Home prices have more than doubled in the past decade.

Attaching doubled to more does not seem quite right to me, because doubled contains both the action modified by quantity/degree, and the base quantity for the comparison. A paraphrase easier to analyze would be

[2] Home prices have increased more than twice in the past decade.

where I guess we would want advmod(increased, more) advmod(more, twice) [or advcl???] mark(twice, than)

In order to make the two above examples somewhat parallel, I am inclined to analyze the former as advmod(doubled, more) mark(more, than)

with the assumption that the quantity compared to has been elided (although it actually has been incorporated into the verb).

The actual example from the data that led me to think about this was a bit different:

[3] more than thirty-years-lasting experience (cs: více než třicetileté zkušenosti)

(It was in Czech and thirty-years-lasting is one word, and it is an adjective.) I am not sure whether [3] will have the same solution as [1] though. If we paraphrase it as

[4] older than thirty-years-lasting experience

then we have amod(experience, older/more) advcl(older/more, thirty-years-lasting) mark(thirty-years-lasting, than)

What do people think?