UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

"double the price" etc. #510

Open nschneid opened 4 months ago

nschneid commented 4 months ago

I think in "received double the price", "double" is obj, and "the price" is a modifier of some kind, perhaps nmod:npmod is the best option. My reasoning is that you can drop "the price" and reconstruct it contextually with no change in meaning, but if you drop "double" you get a totally different reading:

  • We received double the price
  • We received double (=of the price)
  • We received the price (totally different reading)

Interrogative test:

  • What did you receive?
  • Double the price
  • Double (same meaning)
  • The price (not the same meaning)
  • That (antecedent: "double the price" not "the price")

Originally posted by @amir-zeldes in https://github.com/UniversalDependencies/docs/issues/717#issuecomment-1961487785

nschneid commented 4 months ago

My strong intuition is that "double" is a dependent within the nominal, not the head. It might be called a predeterminer, or just an adverb that precedes determiners. "I paid double" feels like ellipsis, along the lines of "I bought some."

I still don't really understand the PTB's stance on these; for "quite" and "half" (#412) they say PDT but for "double" and "triple" they say RB. "one-half" is also RB (cf. #162).

image

Looking at the bracketing guidelines (p. 200), we see pre-determiner QPs:

image

nschneid commented 4 months ago

Re: @amir-zeldes's tests above, I think there's an idiomaticity thing where it's odd to say "received the price" (unqualified) where you're using the price metonymically for the amount of money transferred, and you would instead say "received the asking price" or "received the price that I was hoping for" or similar. "two times the price" and "double the price" are other kinds of qualifiers that work.

amir-zeldes commented 4 months ago

I still don't really understand the PTB's stance on these; for "quite" and "half" (https://github.com/UniversalDependencies/UD_English-EWT/issues/412) they say PDT but for "double" and "triple" they say RB

Isn't that the same as my suggestion above?

it's odd to say "received the price" (unqualified) where you're using the price metonymically for the amount of money transferred

I think it's fine to say "I paid the price", and in any case "asking price" and "price that I..." are just adjuncts, so they shouldn't alter the analysis. "Double the price" has the same denotation as "double" but not the same denotation as "the price", so I think it supports head status pretty strongly. FWIW, the equivalent construction in many languages shows clear modification status:

I'm sure there are also languages with the opposite pattern, but it's not odd to be in this pattern IMO.

"two times the price" and "double the price" are other kinds of qualifiers that work.

I'm not sure if "two times" needs to be relevant to "double", but I think if anything "two times the price" shows that it is not a PDT, since it has its own nummod, which would be odd on any existing recognized PDT.

nschneid commented 4 months ago

"two times the price": Yes, it would be weird to say that a PDT has a numeric modifier, because most determiners don't have numeric modifiers. I think my problem with the PTB tag analysis is that I don't believe predeterminer is a POS category; I think it's a function (a special kind of det relation). To me, "all" is a determiner (CGEL would say "determinative") that can serve in predeterminer function, "quite" is an adverb that can serve in predeterminer function, "times" is a noun that can serve in predeterminer function (with or without its own numeric modifier), and so on.

"paid double the price": The meanings of "pay" and "price" are tightly coupled in a semantic frame, and we get into scalar implicature issues when asking whether "pay double the price" entails "pay the price", so let's use examples with count nouns. Some of these entailments hold straightforwardly, others require context, and one is not entailed:

These strike me as three different varieties of quantity modification, which fits the QP approach in PTB (technically PTB doesn't specify heads but I don't imagine the tree above is supposed to be read as an NP headed by a QP).

nschneid commented 4 months ago

[UPDATED] Also interesting to note that "twenty" and "half" are more flexible than "double" and "twice" in permitting "of" (with appropriate semantics):

All of them but "twenty" allow "the number of" to be inserted:

amir-zeldes commented 4 months ago

All of this is fine, but there is a fundamental difference between numbers and double, in that "I ordered the bagels" can mean the same as "twenty bagels", but "I ordered double the bagels" has to be distinct from "the bagels" in context (because it's only interpretable by comparison to the base amount of bagels). I think "double" basically has an argument here, the thing that is doubled, which is a separate referring expression, and also a separate nested NP. This is different from nummods, which do not license a separate referring expression, and are part of the same NP without an additional head.

nschneid commented 4 months ago

Semantically I agree with you that "double the bagels" is closer to "twenty of the bagels" than to "twenty bagels", because two different sets of bagels are involved. I don't see that as a criterion for syntactic headedness though. "Another bagel" and "fewer bagels" also evoke multiple sets. By contrast, "a lot of bagels" and "a doozy of a bagel" only have one referent despite there being two NPs. Whether "of" is there or not is crucial for determining the structure—it doesn't get to be implicit like in Hebrew. :)

Structurally, English really doesn't like putting random non-prepositionally-marked NPs as post-head modifiers of nouns (outside of names and dates, time adjuncts are the main exception, but there the modifiers, not the head, are semantically restricted). Whereas there is a clear pattern of quantity premodification of common nouns, including several kinds of quantity/measurement predeterminers. See CGEL's section on Predeterminer Modifiers (the relevant subcategory being Multipliers, p. 434).

alexhsu-nlp commented 4 months ago

@amir-zeldes @nschneid

To add a very small and probably digressive footnote: for me, in Japanese "値段の2倍" and "2倍の値段" do not sound like exact equivalents. They have a slight difference in emphasis, which may not be important, but they also seem to differ syntactically. Compare the following (I am not even a fluent speaker, so please treat these judgments with caution, although Google search results support my intuitions):

⭕あの価格より2倍の値段 ❌あの価格より値段の2倍値段のあの価格より2倍価格の2倍より高い (lit. "more expensive than double of the price") ❓ 2倍の価格より高い

The semantic difference seems even greater in Mandarin Chinese (sorry that even as a native I do not have enough knowledge to judge whether there is a syntactical usage difference).

Also interesting to note that "twenty" and "half" are more flexible than "double" and "twice" in permitting "of" (with appropriate semantics):

  • I ordered twenty/half (of) the bagels.
  • *I ordered double/twice of the bagels.

All of them allow "the number of" to be inserted:

  • I ordered twenty/half/double/twice the number of bagels.

For me "I ordered twenty the number of bagels" does not make sense; I can only imagine the usage of apposition with a comma added right after "twenty" (although it still sounds a bit forced, and the meaning is distorted).


For this PTB image, I note that they will identify "half" and "some" before the ellipsis to be NNs. Does it mean that PTB also identifies the "double" that participates in the ellipsis formation to be an NN?

nschneid commented 4 months ago

Wait did I write that "twenty the bagels" and "twenty the number of bagels" were OK? I must have been distracted when I wrote that haha. Updated the post above. Thanks for pointing it out @alexhsu-nlp!

amir-zeldes commented 4 months ago

do not sound like exact equivalents

Thanks for the examples - that makes sense, and in general I would expect based on a 'principle of no synonymy' that if both orders are possible in a language, there would be some minimal difference at least. But I think in cases where the thing counted is governed via a possessive construction, "double" as a possessor must be the head, even if semantically it is construed as non-referential. For English I think the case is doubly (ha!) strong, because it is both syntactically non-omittable and semantically referential.

Does it mean that PTB also identifies the "double" that participates in the ellipsis formation to be an NN?

It's hard to be sure, but yes, I read that to mean that "double" could also be tagged NN if it governed an "of" PP. This is easy to agree to in lexicalized contexts like double meaning Doppelgänger:

But it's a bit less clear for things like:

nschneid commented 4 months ago

The PTB excerpt shows "twice/RB the amount" and "double/RB the amount" alongside "one-half/RB" the amount. It's odd to me that they distinguish that from "half/PDT the time", but again, I think the PDT category is suspect to begin with. I wouldn't mind calling all of those adverbs. "Half" and "double" can also be nouns, of course, including in "12 is the double of 6" (Merriam-Webster). M-W has "double the number" as an adjective for some reason.

nschneid commented 4 months ago

Lori says that "amount" is definitely the head in "double the amount", though in other languages it might be the reverse. Inserting "of" would be weird, though after some other predeterminers it can be inserted and changes the structure: "all the books" vs. "all of the books". She raised the analogy of fractions in Chinese, where the numerator is the head, as opposed to English where the denominator is the head ("two thirds").

amir-zeldes commented 4 months ago

Lori says that "amount" is definitely the head in "double the amount"

Did she share what the argument for that is? So far I have "double" as the head on account of omittability and heading a distinct referring expression ("double" heads "double the amount", "amount" heads "the amount"). Note that either is pronominalizable separately: "give me double that" where "that=the amount" vs. "give me that" where "that=double the amount". I can't think of a movement test that applies, but omission and pronominalization seem like relevant syntactic tests.
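To make the omission test concrete, here is a minimal Python sketch of the two analyses under discussion for "We received double the price". The trees are illustrative only (not actual EWT annotations), the helper names `object_head` and `omit` are hypothetical, and the `det:predet` label on "double" in the second tree is an assumption made for the sake of comparison.

```python
# Token format: (id, form, head, deprel); head 0 marks the root.
# Analysis A: "double" heads the object, "the price" modifies it.
HEAD_DOUBLE = [
    (1, "We", 2, "nsubj"),
    (2, "received", 0, "root"),
    (3, "double", 2, "obj"),
    (4, "the", 5, "det"),
    (5, "price", 3, "nmod:npmod"),
]

# Analysis B: "price" heads the object, "double" is a predeterminer.
# The det:predet label here is an assumption for illustration.
HEAD_PRICE = [
    (1, "We", 2, "nsubj"),
    (2, "received", 0, "root"),
    (3, "double", 5, "det:predet"),
    (4, "the", 5, "det"),
    (5, "price", 2, "obj"),
]


def object_head(tokens):
    """Return the form of the token attached as obj."""
    return next(form for _tid, form, _head, rel in tokens if rel == "obj")


def omit(tokens, form):
    """Remove the subtree rooted at the token with the given form."""
    root_id = next(tid for tid, f, _head, _rel in tokens if f == form)
    removed = {root_id}
    changed = True
    while changed:  # keep collecting descendants until nothing new is found
        changed = False
        for tid, _f, head, _rel in tokens:
            if head in removed and tid not in removed:
                removed.add(tid)
                changed = True
    return " ".join(f for tid, f, _head, _rel in tokens if tid not in removed)


print(object_head(HEAD_DOUBLE), "|", omit(HEAD_DOUBLE, "price"))
print(object_head(HEAD_PRICE), "|", omit(HEAD_PRICE, "double"))
```

Each tree lets you drop only its dependent subtree, so the two analyses predict different reduced sentences ("We received double" vs. "We received the price"). The code does not decide which reduction preserves the meaning; that is the point under debate.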

nschneid commented 4 months ago

Don't those arguments hold for "both" as a predeterminer?

amir-zeldes commented 4 months ago

I don't think they hold for "all". Oh wait, this says "both"... Not that I feel strongly about both/PDT, but I'd say one notable difference is the plural agreement on the noun, which would be atypical of government:

That could be attributed to semantics, but "double" has no such restriction. I assume we agree that in "both of the children", the syntactic head is "both"?

As for pronominalization, I don't think you can do the same thing as with double:

Additionally, I don't think it introduces two referring expressions - there is only a set of two children. It's not that "both the children" implies another superset of children, out of which two received an additional referring expression.

Oh, and omission is different too: if you omit "double" you get a different denotation. This is not true for both:

nschneid commented 4 months ago

For the comparison with "both" we need to consider "double" with count plurals (set-items):

(I think what's going on here is that in "double the books", there's a metonymy of a set for an amount—strictly speaking "the books" is short for "the amount of books". It is usually the books—it would be a stretch to say "double those books" meaning "twice as many books as those". And "books" has to be plural.)

With respect to mass nouns, "double" is closer to "all":

"That" is pretty vague and can work for an amount, so "I spent all/double that" is fine.

In terms of syntactic omissibility of the quantified thing, "both" is a clearer analogy than "all" (I think it sounds old-fashioned in some contexts to shorten "all of them" to just "all" but I don't quite know why).

Certainly the semantics of "double" has something interesting going on in terms of expressing a new set whose size is twice that of an original set. But that's a problem for compositional semanticists. :)

I'm not saying "double" is 100% equivalent to predeterminer "both"/"all" in all syntactic respects, but it is pretty darn close. Whereas I can't think of a remotely similar case where it is well-established that we have a post-modifying NP that is not a temporal/size/extent modifier.

nschneid commented 4 months ago

Oh, and I bet there are arguments that can be made based on blocking other determiners. If "double" was the head of "the books" you might get things like "the/all double the books".

alexhsu-nlp commented 4 months ago

For the comparison with "both" we need to consider "double" with count plurals (set-items):

  • I bought double the books.
  • *I bought double those/them. (pronominalization)
  • *I bought double the book. (agreement)

For me,

To add, "all" and "both" both seem to evoke the same set of entities for me, rather than a different set as with "half" and "double" or a specific integer.

By the way, how do you feel about the following sentence (which sounds awkward to me):

Compare with the following online: source

Also consider their ellipsis forms (seems okay providing contexts):

alexhsu-nlp commented 4 months ago

She raised the analogy of fractions in Chinese, where the numerator is the head, as opposed to English where the denominator is the head ("two thirds").

I would also like to see an argument if there is one searchable :)

image

I randomly picked a sentence here, and indeed the numerator (一, id=5 directly governing 分) is the head of the fraction (三分之一, "one third") in the current scheme.

However, I am slightly surprised that "分之" has been further analyzed; unless I am reading Classical Chinese or deliberately paying attention, I almost always treat it as a whole, inseparable, idiom-like chunk similar to the "差不多" in the same sentence; going by intuition alone, I would definitely separate either both or neither.

P.S. I asked another Mandarin native just now, and he said that he prefers "三分|之|一" if a word of 2 characters has to remain, and the division of it into 4 distinct characters is perfectly natural. Aha, maybe I am a fake Chinese.

Current update of votes (including me):

amir-zeldes commented 4 months ago

I bought double the book. (agreement) ... "the books" is short for "the amount of books"

Doesn't that support my argument that there is no agreement requirement, unlike "both" and in accordance with what government would lead you to expect? You can say "double the amount" (morphosyntactically singular), so there is no grammatical agreement constraint. "Double the book" only doesn't work for purely semantic reasons, not due to morphological (dis-)agreement.

With respect to mass nouns, "double" is closer to "all" ... money

No, I don't think so - for the reasons I listed above: "double the money" does not allow you to omit "double" with the same meaning, but "all the money" = "the money". The pronominalization argument holds here too, at least for "double that" (that=the money)

I can't think of a remotely similar case where it is well-established that we have a post-modifying NP that is not a temporal/size/extent modifier

Wouldn't it be the same issue for "three times the price", etc.? Would we want the noun "time" to be considered a determiner then? I don't think it's something special about "double" really, it's just a construction that specifies a multiplicative quantity and has an optional argument of the thing being multiplied. In a sense that can be seen as a kind of extent modifier, and I suppose I would also choose obl:npmod for it (if we annotate double as ADJ)

*double those

I don't think it's common, but it is attested, e.g.:

差不多, 三分之一

Thanks for sharing those! I am in no way a Chinese native speaker, but it seems to me the question here is how 'etymologizing' we want to be. We can analyze the internal structure of 差不多 and then I think it's pretty clearly three units, but there is no ability to change any part of it, so it's a fixed expression. For 三分之一, I think an etymological interpretation is probably 三分|之|一, but because the first number can change and 分之 is frozen, I can see how synchronically people might choose a different segmentation. This tension comes up in English and other UD languages as well.

alexhsu-nlp commented 4 months ago

But I think in cases where the thing counted is governed via a possessive construction, "double" as a possessor must be the head, even if semantically it is construed as non-referential.

Actually, this is the first time I have heard of treating "double of its price" as a possessive construction like "a photo of my friends" (if I understand it correctly). Most introductory generative syntax books seem to avoid deep discussion of quantifiers. My poor linguistic knowledge :(

alexhsu-nlp commented 4 months ago

We can analyze the internal structure of 差不多 and then I think it's pretty clearly three units, but there is no ability to change any part of it, so it's a fixed expression.

As an adverb, yes it is fixed, although colloquially, many attach a second 有 ending up "差不多三分之一的美国人护照". If it is used as a predicate then adjunct-like components can be added (e.g., 差不太多).

amir-zeldes commented 4 months ago

If it is used as a predicate then adjunct-like components can be added (e.g., 差不太多)

Interesting, I hadn't seen that yet - I think this is a pretty good argument to analyze the internal structure. We have similar rare modifiers of otherwise fixed expressions in English which we've talked about, for example "due to", which rarely has variants like "due in (large) part to". I guess you have to live with exceptions or decide to analyze them throughout...

nschneid commented 4 months ago

I can't think of a remotely similar case where it is well-established that we have a post-modifying NP that is not a temporal/size/extent modifier

Wouldn't it be the same issue for "three times the price", etc.? Would we want the noun "time" to be considered a determiner then?

Yes, the multiplicative construction "X times the N" looks very similar to the multiplicative "double/triple the N". Have we discussed it? My hunch would be to treat "times" as in predeterminer position as well (and tag it as a noun). EWT seems to call it nmod:npmod with "times" as the dependent, but arguably that could be det:predet as it's the slot before the main determiner. "X times Y" also occurs with comparative adjectives/adverbs, e.g. "3 times higher", which EWT has as obl:npmod. In both cases, the multiplier phrase is the modifier unless separated from the quantified-thing by a preposition.

At a high level I am expressing skepticism about saying multiplicative NPs have a completely different structure from all other NPs, even though they are semantically weird, when predeterminer position is attested more broadly ("all", "both", etc.).

We agree that "many a N" is a predeterminer construction right? Doesn't that have similar syntax-semantics incongruities to the multiplicatives? (*I ate many a cookie, and it was delicious.)

amir-zeldes commented 4 months ago

We agree that "many a N" is a predeterminer construction right

Yes, I don't have any issues with that

My hunch would be to treat "times" as in predeterminer position as well (and tag it as a noun)

I think that would be a much bigger deviation from current practices than assuming that "double" governs the multiplicandum. Would you really want an inflected plural noun to be deprel det:predet in this one specific construction, and also tag it as NOUN at the same time? It would also be unique as a determiner that can have nummod. That seems worse to me.
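The nummod objection can be stated as a simple check over a tree. Below is a minimal Python sketch, assuming two hypothetical analyses of "two times the price": one where "times" is labelled det:predet (the option being questioned) and one where it attaches as nmod:npmod (the approach described for EWT above). The trees and the helper name `det_with_nummod` are illustrative assumptions, not actual treebank data or an existing validator.

```python
# Token format: (id, form, upos, head, deprel); head 0 marks the root.
# Hypothetical: "times" fills the predeterminer slot as det:predet.
TIMES_AS_PREDET = [
    (1, "two", "NUM", 2, "nummod"),
    (2, "times", "NOUN", 4, "det:predet"),
    (3, "the", "DET", 4, "det"),
    (4, "price", "NOUN", 0, "root"),
]

# Alternative: "times" attaches to "price" as nmod:npmod.
TIMES_AS_NPMOD = [
    (1, "two", "NUM", 2, "nummod"),
    (2, "times", "NOUN", 4, "nmod:npmod"),
    (3, "the", "DET", 4, "det"),
    (4, "price", "NOUN", 0, "root"),
]


def det_with_nummod(tokens):
    """Return forms of tokens labelled det or det:* that govern a nummod."""
    det_ids = {
        tid for tid, _form, _upos, _head, rel in tokens
        if rel.split(":")[0] == "det"
    }
    flagged = {
        head for _tid, _form, _upos, head, rel in tokens
        if rel == "nummod" and head in det_ids
    }
    return [form for tid, form, _upos, _head, _rel in tokens if tid in flagged]


print(det_with_nummod(TIMES_AS_PREDET))
print(det_with_nummod(TIMES_AS_NPMOD))
```

Only the det:predet version produces a det-labelled token with its own nummod, which is the configuration I'd consider odd for any recognized determiner.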

"X times Y" also occurs with comparative adjectives/adverbs, e.g. "3 times higher", which EWT has as obl:npmod

I agree with that too, that's fine (you can omit "3 times" and get the same sense, you can't omit "higher", as expected for a modifier)

multiplicative NPs have a completely different structure from all other NPs

I don't find it so extreme really, it's just a relational structure with zero preposition instead of "of", so :npmod. We have a variety of NPs with "of-less" modifiers where "of" would be conceivable, for example years in dates: "October 2018" is the same as "October of 2018", and I don't know what to call that relationship without "of", but I still think "October" is the head (it's a kind of month, not year, and there are two referring expressions in there; only the year can be omitted while retaining the denotation in context). We've already established that "of" is possible with "double" ("the 1st number is double of the 2nd"), so I don't see it as sui generis that there is an abbreviated "of"-like relation here too.

nschneid commented 4 months ago

"the 1st number is double of the 2nd"

That's actually ungrammatical for me. If somebody wrote that in a draft I'd think it was an error. Not saying nobody says it, but I think you'll find a lot of native speakers don't like it. Lori didn't, and CGEL doesn't—p. 434 below.

image

But even if "double of" were well-established, I don't see how that clarifies anything in the predeterminer use, because "all" can head a partitive, and the structure is different if "of" is present vs. absent. We say "all" is the head in "all of the books", but not "all the books".

The structure of dates is idiosyncratic and disputed so I don't think that should be a factor here.

If you object to a NOUN attaching as det:predet, then I can live with the current approach for "times", which is attaching as nmod:npmod—but that's only because of the quirk that UD mixes category and function information; spiritually I think it's the same predeterminer slot, and it can be filled by certain nouns ("times"), adverbs ("twice", "quite"), adjectives ("many"), as well as determiners ("all", "both"). Crucially, though, "times" is not the head of the quantified noun.

Finally, I don't understand your pronominalization and agreement arguments attempting to distinguish "double" from "all" as a predeterminer. They look the same to me—

Pronominalization:

Agreement:

Logically speaking, the singular "problem" might be expected to be compatible with "double" (if you are construing "double" as a function that multiplies what comes next by 2). But this fails: just like "all" and "both", the construction requires a plural (with countable items as opposed to a holistic-quantity noun like "amount").

alexhsu-nlp commented 4 months ago

  • The 1st number is double of the 2nd

After a quick search, I tend to believe this example actually comes from a made-up question for secondary schools, high schools, or some social exams. There are hints that this example is from India (is it possible that there is a dialectal difference?). Furthermore, most of the easily found examples with the same structure come from simple math questions.

In examples found online, the appearance of "of" usually accompanies the nominalization of "double", signaled by a further determiner, making it "a/the double of the/PossessivePron ...".

amir-zeldes commented 4 months ago

even if "double of" were well-established, I don't see how that clarifies anything in the predeterminer use, because "all" can head a partitive, and the structure is different if "of" is present vs. absent. We say "all" is the head in "all of the books", but not "all the books".

I agree, this is definitely not a deciding factor in how to analyze the structure without "of", I just meant it gives an intuition that a case where "double" is the head is not so strange to consider.

The structure of dates is idiosyncratic and disputed so I don't think that should be a factor here.

I hadn't realized this is disputed - does anyone think "June 2018" is headed by "2018"? If so I'd be curious to know why, it doesn't look controversial to me. The only reason I brought it up is because you said letting "double" be a head would mean that multiplicative NPs would have a completely different structure from all other NPs, and these dates are another case with a similar unmarked modifier. I agree dates are idiosyncratic, but so is this "double" thing, so maybe they share the same idiosyncrasy?

I think it's the same predeterminer slot, and it can be filled by certain nouns ("times"), adverbs ("twice", "quite"), adjectives ("many"), as well as determiners ("all", "both")

I think the morphosyntax does matter here, otherwise you could say that genitive 's possessors also occupy the determiner slot and should be labeled det, because:

But the "Kim's" phrase has a very different internal structure and expansion possibilities, just like "times" can take nummod etc. I don't think a lot of people would see NPs like that as candidates for a determiner.

I don't understand your pronominalization and agreement arguments

The latter must be a misunderstanding, I'm not making an agreement argument - I thought you were making one for this being the same as "all", since you noted this example:

I just wanted to point out that "double" doesn't interact with morphological agreement (actually if it did, that would be a problem for a government analysis).

The pronominalization argument does hold for me, as one of the basic syntactic tests (omission, pronominalization, interrogation, movement). Since "double" cannot be omitted, and the whole phrase can be pronominalized with the same meaning, but pronominalizing just the thing doubled does not work, double passes those head tests. The only test I don't have is movement, which I don't think applies here (to either "double" or "price"). The word "all" is different, since it does not pass the pronominalization test:

There is no distinct pronominal reading for "all the money" or "the money" inside "all the money". This is not true about "double" - (Context: I have $20 and they originally asked for $20, but now they want $40):

I just don't know any other good tests for headedness except for omission, pronominalization/interrogation (these are usually the same) and movement, and double passes the first two. If you have a good idea for movement or other tests please add those!

nschneid commented 4 months ago

Dates: I could see an argument for flat. Not trying to argue for that but I just don't see "June 2018", which realizes a very specific pattern for a specific kind of entity, as a good analogy for determining the structure of "double the books".

There is no distinct pronominal reading for "all the money" or "the money" inside "all the money".

Yes, but isn't this for semantic reasons—because "all" expresses totality (100%, so there's only one set involved), whereas "double" or "twice" or "two times" expresses 200% and "half" expresses 50%? (I assume we're putting "half the cheese/books" in the same category as "double the cheese/books".)

For the omission test we can consider both directions—

Omission of the quantifier: If you are including set equivalence as part of this test then that distinguishes the items that express totality as in the pronominalization test. Again, I'm not sure that the syntactic test should be so stringent in terms of semantic equivalence.

Omission of the quantified item: Considering "all", "both", and "double", all of them can be used without the quantified item under certain circumstances. I think it's ellipsis, along the lines of ellipsis with a nummod or percentage. The availability of ellipsis differs somewhat, apparently influenced by both syntax and semantics. But given the right conditions, ellipsis is very natural with quantifiers generally:

Turning to contexts that support multiplier semantics:

"I bought both in Apple" is an ellipsis where "both" is promoted to head. "I bought double in Apple" looks awfully similar to me—I wouldn't want to say that it's not ellipsis, or it's a different kind of ellipsis.

Incidentally, if I say "that is a double/triple threat", "double" and "triple" are acting as amods right? Which supports a generalization that they are premodifiers (though this usage has different semantics—multiple parts/aspects of something).