UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

"a few", "a little" #170

Closed nschneid closed 3 years ago

nschneid commented 3 years ago

I would like to discuss the analysis of "a few" (and later "a little").

Typically it modifies a plural noun, in which case the current analysis almost across the board in EWT and GUM is to attach "a" and "few" separately to the head noun, with det and amod respectively.
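For concreteness, the current analysis can be written out as a dependency table. The snippet below is a minimal sketch, not taken from EWT or GUM: the sentence is illustrative, the rows use whitespace-separated CoNLL-U-style columns (real CoNLL-U files are tab-separated), and the helper name `relations` is hypothetical.

```python
# Illustrative sentence, not an actual EWT/GUM tree.
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
CURRENT_ANALYSIS = """
1 I I PRON _ _ 2 nsubj _ _
2 bought buy VERB _ _ 0 root _ _
3 a a DET _ _ 5 det _ _
4 few few ADJ _ _ 5 amod _ _
5 books book NOUN _ _ 2 obj _ _
"""


def relations(rows):
    """Return (deprel, head_form, dependent_form) triples for one sentence."""
    tokens = [line.split() for line in rows.strip().splitlines()]
    forms = {tok[0]: tok[1] for tok in tokens}
    forms["0"] = "ROOT"  # artificial root node
    return [(tok[7], forms[tok[6]], tok[1]) for tok in tokens]


for deprel, head, dep in relations(CURRENT_ANALYSIS):
    print(f"{deprel}({head}, {dep})")
```

Here both "a" and "few" point at "books", so "a few" does not form a subtree of its own.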

I'm not sure this is the best analysis, however: consider that "a few" can stand alone for the NP, similar to "some" or "several":

"A few" can also be coordinated with quantifiers/quantities:

It can be followed by an "of"-PP:

And it can modify things other than plural nouns:

The "a" cannot be omitted or replaced with "the" and receive the same interpretation, nor can a possessive + few receive that interpretation:

Taken together, the expression functions a lot like determinative "many".

"A little" behaves very similarly except for mass rather than plural nouns:

Would it be better to analyze these as fixed(a, few), fixed(a, little)?

Here are the exceptions to the typical analysis in the corpus (including a couple of errors):

Note that there are other quantificational expressions with "a" + NOUN—a couple, a bit, a lot, a bunch. These already form a constituent with the article attaching to the noun as det.

Internal modification: some of these expressions allow intensification, but these may be lexicalized expressions as well. The ADJ ones seem less flexible than some of the NOUN ones:

a + ADJ

a + NOUN

nschneid commented 3 years ago

@amir-zeldes any opinions?

amir-zeldes commented 3 years ago

Well, I agree that from a historical perspective the current analysis is a bit simplistic, but I think it is mainly motivated by a "no better solution" situation: in many ways, it's like "a lot of" but without the "of", so you can't do a normal nmod with "few" as the head. For "a few of the apples" I think the analysis should parallel "a lot of".

The fact that "a few" can stand by itself is true, but that doesn't mean it has to have the same analysis. Semantically it is pretty similar to "many", as you say, or also "some" as a quantity expression.

I think I could see doing "a <-det few" instead of attaching to the noun, but I would be opposed to fixed since there is definitely the possibility of modification, here's a random example:

And in general, I would say it's relatively clear "few" outranks "a" as a possible head, so it's not really 'structureless'. The only real question is 'who does the "a" belong to'.

The main reasons I would hesitate to change this are:

So I guess it's mainly a question of whether that would make the analysis much better, and I'm not super sure about that. I guess it would be nmod:npmod to the noun, right? Or what were you thinking for the function of "a few" with respect to the plural noun?

nschneid commented 3 years ago

On the view that it's been grammaticalized as a compound determiner, it could be det with respect to the noun.

But mainly I just want it to be a constituent. Even with rare internal modification (not sure if "a scant few" is a frozen expression or not), consider:

But article+adjective with an elided noun is not possible in general:

So I think this is a good argument for making a+few into a constituent. As is, "a" is attached to "few" only when there is no head noun, so the benefit of changing it would be consistency.

nschneid commented 3 years ago

It occurs to me that this is similar to article+number expressions like "a hundred", which also permit plural noun ellipsis: I bought a hundred (books).

GUM treats "a hundred" as a constituent (det(hundred, a), with "hundred" attaching as nummod). So treating "a few" and "a little" as constituents would be similar to that.

amir-zeldes commented 3 years ago

Yes, "a hundred" is pretty natural to treat that way since we need to deal with "one hundred" as well, which has a similar structure. It also doesn't create any friction in terms of labeling 100, since that's still nummod. But that brings me back to my earlier questions: what do you want the deprel of "few" to be?

My main hesitation in touching this (aside from it being work) is that I find nmod:npmod on "few" less elegant than the current amod, and det on amod feels kind of wrong. With the current analysis it's not exactly perfect, but I get what it's saying and the labels are rather tame looking.

nschneid commented 3 years ago

I agree det modifying an adjective feels kind of wrong, but that's what we are currently forced to do if the head noun is omitted. The fixed analysis makes the most sense to me—i.e. capturing that "a few" and "a little" are two syntactically very special multiword expressions combining what was historically a determiner and an adjective. ("A lot" and friends are semantically special but not as much syntactically because "lot" is historically a noun.)

amir-zeldes commented 3 years ago

det on an adjective with no noun is the normal way of handling 'promotion' (or more traditionally, nominalization of the adjective), so I don't find it bad. It's the same as "the poor", or "the following". My concern was having something with deprel amod having the det, which this would introduce in this construction.

I think I feel pretty strongly against fixed for a number of reasons:

As I said above, I think if anything nmod:npmod makes the most sense if we want "a few" as a constituent, since it would be analyzed as a full modifier NP indicating extent without a preposition. But I don't think that's really right, and honestly this just makes me want to stick to amod and leave "a" as det to the noun in one of those "UD just likes fountainy graphs" moments (there are quite a few of these murky fountains, like "not only" (not doesn't modify only in EWT), or "to" attaching to a predicate noun/adj in things like "to be good"). It's not 100% right, but it makes life a bit simpler and I've just come to accept it in the name of lexicocentrism I guess.

nschneid commented 3 years ago

det on an adjective with no noun is the normal way of handling 'promotion' (or more traditionally, nominalization of the adjective), so I don't find it bad. It's the same as "the poor", or "the following". My concern was having something with deprel amod having the det, which this would introduce in this construction.

I think I feel pretty strongly against fixed for a number of reasons:

  • Whatever we think "scant" is, the construction is modifiable, so you'd be introducing potentially quite a few "fixed" expressions to an already long list

Here's CGEL (p. 392):

[image: excerpt from CGEL, p. 392]

Really "a few" and "a little" are special constructions that permit very limited internal modification. Maybe UD needs an almost-fixed relation. ;)

  • If fixed is really only used for things where headedness is meaningless, I don't think this is such a case: I'm pretty sure "few" outranks "a"

I don't have that intuition. I think they act in concert.

  • It obscures the fact that "a" is actually fulfilling a fairly normal role here, and erases the parallelism with "few" without an article. Currently we have "few dogs" and "a few dogs" being structurally very similar (both have an adjectival modification indicating they are few).

But if you think the indefinite article is acting as usual then it's very odd that it's modifying a plural noun, yes? You can't say "a several dogs" or "a many dogs". "A few" is simply a syntactically special expression.

amir-zeldes commented 3 years ago

I agree it's a special construction, but there are lots of special constructions and we only have very few labels... The same kind of a+plural appears in a number of places, for example "a great many" etc., so it's not quite unique.

In any case, it sounds like we agree that it's not quite or only almost fixed, so I think we should take fixed off the table. If that makes sense, then my question remains, what would the deprel be? I think if we change it, few should be the head and "a" should be a dependent as it usually is.

nschneid commented 3 years ago

flat?

The flat relation is one of three relations for multiword expressions (MWEs) in UD (the other two being fixed and compound). It is used for exocentric (headless) semi-fixed MWEs like names (Hillary Rodham Clinton) and dates (24 December). It contrasts with fixed, which applies to completely fixed grammaticized (function word-like) MWEs (like in spite of), and with compound, which applies to endocentric (headed) MWEs (like apple pie).

I would say the definition fits because we are talking about a headless semi-fixed MWE. It is more grammaticized than names but there's nothing saying that grammaticized expressions cannot be flat, only that the completely frozen ones are fixed.

"a good/great many": these are also MWEs acting as quantifiers with special morphosyntax. a many books, a very/considerable many books, *a great several books. CGEL p. 394:

[image: excerpt from CGEL, p. 394]

amir-zeldes commented 3 years ago

I'd be against flat for the same reasons as fixed. It's not really similar to any of the current uses of flat in English, and I really don't see a reason not to treat "a" as a determiner here. "a" generally doesn't have dependents and typically modifies both nouns and nominalized adjectives, so it fits perfectly as det. If I wanted "a few" to be a constituent (i.e. dependency chain), I think it would be most parallel to the extent modifiers found with spatio-temporals and other quantities. Compare a possible:

To existing analyses like:

I'm still not sure it's a huge improvement over what we have now, but the idea of "a" heading a flat expression seems wrong when it's basically just a determiner and looks exactly the same as in the independent "a few" functioning as an argument (I assume you wouldn't make that be flat either way, right?)

nschneid commented 3 years ago

So I guess the crux of the question is whether it is a nominalized adjective like "the poor" or "the British". I don't see it that way—yes you could say "the many as opposed to the few" just like "the rich as opposed to the poor", but this strikes me as an entirely different usage from "a few" as in "I bought a few". And note that while you can say "the many as opposed to the few" you cannot say "a many books", so there is something special about "a few" and "a great many".

Can the normal nominalized adjective construction be used with an indefinite article? Here are results with indefinite article + ADJ other than "few" or "little", and they look like annotation errors (attachment errors or lexemes with distinct nominal senses that I'd tag as NOUN, e.g. "a contemporary" = 'a person who is around at the same time as someone'), not a productive construction where an adjective is coerced to a nominal.

nschneid commented 3 years ago

I'm still not sure it's a huge improvement over what we have now, but the idea of "a" heading a flat expression seems wrong when it's basically just a determiner and looks exactly the same as in the independent "a few" functioning as an argument (I assume you wouldn't make that be flat either way, right?)

"I bought a few"—I'd want that to look similar to "I bought many". This could be achieved with flat + an ExtPos feature. Remember that flat is really asserting that there is no head syntactically—"a" and "few" are on equal footing—there's only a head in the tree for data structure convenience.

amir-zeldes commented 3 years ago

I'd want that to look similar to "I bought many"

Yes, but I'd like "I bought a few" to look similar to "I bought few"...

nschneid commented 3 years ago

Yeah that would be achieved too with flat and ExtPos=ADJ.

amir-zeldes commented 3 years ago

I'm not sure I understood you- are you saying "a few" by itself should also be flat? If so, what about "the few" in "the few I know"? And what would you do about things like:

I think keeping "a" as det to something makes the most sense given the flexibility and potential internal modifications of few.

nschneid commented 3 years ago

Oooh fun. "A long few days"—hadn't thought about this. I think it's yet another construction ("a" + adjective quality modifier(s) + quantifier): "a tough several days", "an unprecedented very long 8 days". So I would say that in that case "few" is acting as a normal quantifier-adjective, and this construction licenses a special use of "a" but I'm not sure whether to say the quantity adjective itself licenses "a". I.e. it seems like a semiproductive construction where "a" is special but not part of a frozen expression, so UD won't have the capacity to capture it and det(days, a) is probably the best we can do.

Note that "a long few days" and "a few long days" parse differently for me, even if they work out to meaning similar things in practice. (cf. a long many days vs. *a many long days)

Another few weeks, the next few weeks, etc.—I think here "few" is just a regular quantifier adjective. (cf. another several weeks, the next several weeks)

nschneid commented 3 years ago

I'm not sure I understood you- are you saying "a few" by itself should also be flat? If so, what about "the few" in "the few I know"?

"The few I know" fits the definite-article + adjective-coercion-to-nominal construction, so I think that would be det just like "the rich".

Yes, flat + ExtPos=ADJ for "a few" whether it is by itself or not. The test is: can it be grammatically replaced by "many"? If so, analyze it as a syntactic equivalent that just happens to have two words without internal structure.

amir-zeldes commented 3 years ago

I think "a long few days" and "a few days" is the same "few". I would consider all of the above constructions to contain the article with its normal label of det, and I think it would be the most intuitive and reliable for annotators as well.

sylvainkahane commented 3 years ago

If "few" works as an adjective in "my very few books", it appears that "a few" works as a determiner in "a few books" because it blocks up the position and prevents any other determiner from appearing. So I think that "a few" must be det with ExtPos=DET.

About the internal structure of "a few" you showed that some adjectives are possible so it is possible not to treat "a few" as a fixed expression and to analyse "a" as a det of "few". It also seems that "few" in "a few" acts as a NOUN in this case. Maybe I am influenced by French where we have the ADV/NOUN "peu" which acts now as an adverb ("il lit très peu" 'he reads very little') but was a noun and appears in many fixed expressions with a determiner (il lit un peu 'he reads a little'; le peu que j'en sais 'the little I know', etc.).

I agree with @nschneid that "a long few days" might be a different construction, where "few days" is a unit and not "a long few".

nschneid commented 3 years ago

If "few" works as an adjective in "my very few books", it appears that "a few" works as a determiner in "a few books" because it blocks up the position and prevents any other determiner from appearing. So I think that "a few" must be det with ExtPos=DET.

Oh good point. So "I bought a few books" has ExtPos=DET. What about "I bought a few"—ExtPos=ADJ or DET? Note that "many" is always tagged as ADJ, whereas "some" is always tagged as DET. Both can appear without a head noun: I bought many/some. I guess treating "a few" as DET across the board, like "some", would make sense (the many books / the a few books / the some books).

Re: "few" acting as a noun...I agree it can do this in some contexts (e.g. with a definite article), but in "a few" it is hard for me to say it is more noun-like than "some" or "many" apart from "a" looking superficially like an article. And a DET MWE with an internal DET feels a bit awkward to me. Seems like the more neutral solution is to say, this is a weird expression that doesn't generalize to words other than "a" + "few/little" + occasionally an internal modifier, and flat is a way to achieve that.

And from a native speaker intuition perspective, I am having trouble conceptualizing "few" and "little" as nouns (as opposed to "lot" or "bit"), except when coerced with a definite article.

amir-zeldes commented 3 years ago

Let me recap my arguments against flat/fixed:

I agree that there are interesting and subtle differences between the various constructions, but I think an average user would expect these to look similar:

  • I know a few
  • a few that I know
  • the few that I know
  • few that I know (would say that)
  • our last few remaining problems

Across these related constructions, "few" can be combined with most normal determiner options, suggesting that the "super-schema" of what they have in common basically calls for one of the standard English determiners (incl. zero). I think that making "few" be a child of the determiner in only some of these is asking for annotator disagreements and parsing errors, and I don't see any real benefit.

If the goal is to have "a few" as a phrase, then I think it should follow the normal determiner as child + det option, which would maintain the status quo for independent "a few" and keep parallelism to the other constructions with "few".

nschneid commented 3 years ago

Let me lay out my argument for the different constructions before getting to whether it is practical or not for annotators.

I think the key syntactic tests for MWE status are:

I would advocate the MWE analysis (with flat) only for cases that pass the first test and fail the second, showing that the expression is particular to "a" + "few"/"little" and that together they function like "some". For example:

  1. I bought a few books.
    • MWE: I bought some books; *I bought a several books
    • obj(bought, books), flat(a/DET, few/ADJ), a+few: ExtPos=DET, det(books, a)
  2. I bought a few.
    • MWE: I bought some; *I bought a several
    • obj(bought, few), flat(a/DET, few/ADJ), a+few: ExtPos=DET
  3. I bought few/many/several books. — not MWE: obj(bought, books), amod(books, few/ADJ)
  4. I bought the few/many/several books (that I wanted). — not MWE: obj(bought, books), amod(books, few/ADJ), det(books, the)
  5. I stayed a long few/many/several days. — not MWE: obj(stayed, days), det(days, a), amod(days, long), amod(days, few)
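Case 1 under the flat proposal can be spelled out as token rows. This is a rough sketch under stated assumptions: the row layout and the helper name `mwe_spans` are hypothetical, and placing ExtPos=DET on "a" (the tree head of the flat expression) is my reading of "a+few: ExtPos=DET", not something the list above specifies.

```python
# (id, form, upos, feats, head, deprel) rows for "I bought a few books".
FLAT_PROPOSAL = [
    (1, "I", "PRON", "_", 2, "nsubj"),
    (2, "bought", "VERB", "_", 0, "root"),
    (3, "a", "DET", "ExtPos=DET", 5, "det"),  # ExtPos placement is an assumption
    (4, "few", "ADJ", "_", 3, "flat"),
    (5, "books", "NOUN", "_", 2, "obj"),
]


def mwe_spans(rows):
    """Map each flat head's form to the forms of its flat dependents."""
    by_id = {row[0]: row for row in rows}
    spans = {}
    for _id, form, _upos, _feats, head, deprel in rows:
        if deprel == "flat":
            spans.setdefault(by_id[head][1], []).append(form)
    return spans


print(mwe_spans(FLAT_PROPOSAL))  # {'a': ['few']}
```

Under this encoding "a few" is a single subtree, and the only edge into the noun is det(books, a).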

So while on the surface these are similar:

  • a few that I know
  • the few that I know

the above tests distinguish them (cases 2 & 4).

Now, we have the question of whether it is practical to have UD annotators remember to use an anomalous deprel of "a" in the first two cases (flat rather than det). I can see the point that it requires a lot of nuance. While I think flat is more of a "proper" MWE treatment, I think it is OK to fudge a bit on the internal structure deprels to make them look more like the typical uses of those deprels, so I could envision the following compromise that does away with flat:

  1. I bought a few books. obj(bought, books), det(few/ADJ, a/DET), few: ExtPos=DET, det(books, few) [whole expression considered a determiner to address @sylvainkahane's point]
  2. I bought a few. obj(bought, few), det(few/ADJ, a/DET), few: ExtPos=DET
  3. I bought few/many/several books. obj(bought, books), amod(books, few/ADJ)
  4. I bought the few/many/several books (that I wanted). — not MWE: obj(bought, books), amod(books, few/ADJ), det(books, the)
  5. I stayed a long few/many/several days. — not MWE: obj(stayed, days), det(days, a), amod(days, long), amod(days, few)

Basically the principle would be, "a" modifies "few" rather than the head noun apart from case 5, with an intervening adjective that doesn't modify "few". We already do this if there is no head noun, so the change would make things more consistent.
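The principle can be sketched as a small attachment rule. This only illustrates the compromise as stated, under simplifying assumptions: the phrase is a list of (form, UPOS) pairs that starts with "a", the quantifier is "few" or "little", and the function name `head_of_a` is hypothetical.

```python
QUANTIFIERS = {"few", "little"}


def head_of_a(phrase):
    """Pick the det head of a phrase-initial "a" under the compromise.

    `phrase` is a list of (form, upos) pairs beginning with the article.
    """
    forms = [form.lower() for form, _ in phrase]
    assert forms[0] == "a", "sketch only handles phrase-initial 'a'"
    nouns = [form for form, upos in phrase if upos == "NOUN"]
    # "a" directly followed by the quantifier: attach to the quantifier,
    # whether or not a head noun follows (cases 1 and 2).
    if len(forms) > 1 and forms[1] in QUANTIFIERS:
        return forms[1]
    # An adjective intervenes ("a long few days"): fall back to the
    # head noun, as in case 5.
    if nouns:
        return nouns[-1]
    # No noun at all: attach to the last word (assumption of this sketch).
    return forms[-1]


print(head_of_a([("a", "DET"), ("few", "ADJ"), ("books", "NOUN")]))  # few
print(head_of_a([("a", "DET"), ("few", "ADJ")]))  # few
print(head_of_a([("a", "DET"), ("long", "ADJ"), ("few", "ADJ"), ("days", "NOUN")]))  # days
```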

sylvainkahane commented 3 years ago

I understand the reluctance of @amir-zeldes to use fixed for "a few" because the internal structure is quite clear and the expression is not completely frozen, seeing that some ADJ can modify "few". But according to UD choices, I think that "a few" must be considered as a fixed expression. The main argument is the fact that "few" and "a few" don't have exactly the same distribution: "few" works as an amod, while "a few" works as a det.

(Note that in SUD we decided not to use the relation fixed, and to indicate the internal syntactic structure of idioms and use features PhraseType=Idiom and ExtPos on the head of the idiom and InIdiom=Yes for the words inside the idiom. It means that we will use the relation det(few, a) and add PhraseType=Idiom and ExtPos=DET on "few".)

@nschneid I don't think flat is a possible solution. If we consider "a few" as a fixed expression, the relation must be fixed, if not, the relation must be det(few, a).

amir-zeldes commented 3 years ago

I agree with @sylvainkahane that there is internal structure, but I think that rules out not only flat, but also fixed, since UD stipulates that fixed goes left to right from the first token. If it is not completely frozen, as you say above, then it is also not fixed. The fact that "few" and "a few" are not distributed the same is true of many items with and without an article, but still, when the article is there I think it is the regular determiner, and the NP "a few" has a certain function then, which can be non-identical to the function of "few".

Then about @nschneid's suggestion: I would feel uncomfortable doing a chain of dets, which is otherwise unattested in English. I understand the appeal of treating it like a complex determiner, but I think here we run up against UD's token-centric conventions, which also mandate that "hours" in "dance two hours" is not advmod, since "hours" in itself is not an adverb. I think since we are dealing with a phrasal modifier of nouns, it should be nmod:npmod, and this also avoids the awkward det chain. It's also not unusual for an nmod subtype to take up a specifier position, just like possessive nmod:poss can mark a genitive possessive or a possessive article (my, your etc.)

Finally, specifically for 5 above, if we want a phrasal "a few" analysis, then I don't think "a" should modify "days" (though currently it would). I think although semantically it is the days that are long, syntactically we have:

det(few, a) amod(few, long) nmod:npmod(days,few)
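To make the "chain of dets" concern concrete, here is a hedged sketch that checks an edge list for a det dependent that itself has a det dependent. The edge lists transcribe the compromise proposal for "I bought a few books" and the nmod:npmod alternative. The tuple layout and the name `has_det_chain` are assumptions of this sketch, and words are identified by their forms, which only works when no form repeats.

```python
# Edges are (deprel, head, dependent) triples for "I bought a few books".
COMPROMISE = [
    ("obj", "bought", "books"),
    ("det", "few", "a"),
    ("det", "books", "few"),
]
NPMOD = [
    ("obj", "bought", "books"),
    ("det", "few", "a"),
    ("nmod:npmod", "books", "few"),
]


def has_det_chain(edges):
    """True if some word attached as det has a det dependent of its own."""
    det_dependents = {dep for rel, _, dep in edges if rel == "det"}
    return any(rel == "det" and head in det_dependents for rel, head, _ in edges)


print(has_det_chain(COMPROMISE))  # True
print(has_det_chain(NPMOD))       # False
```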

nschneid commented 3 years ago

Then about @nschneid's suggestion: I would feel uncomfortable doing a chain of dets, which is otherwise unattested in English. I understand the appeal of treating it like a complex determiner, but I think here we run up against UD's token-centric conventions, which also mandate that "hours" in "dance two hours" is not advmod, since "hours" in itself is not an adverb. I think since we are dealing with a phrasal modifier of nouns, it should be nmod:npmod, and this also avoids the awkward det chain. It's also not unusual for an nmod subtype to take up a specifier position, just like possessive nmod:poss can mark a genitive possessive or a possessive article (my, your etc.)

I see your point but nmod:npmod indicates we are analyzing "a few" as a nominal expression, which seems wrong when it's in specifier position ("a few books"). To me it is bare "a few" ("I bought a few") that is coerced from a compound determiner into a nominal, not that it is always a nominal.

Finally, specifically for 5 above, if we want a phrasal "a few" analysis, then I don't think "a" should modify "days" (though currently it would). I think although semantically it is the days that are long, syntactically we have:

det(few, a) amod(few, long) nmod:npmod(days,few)

But this makes "a long few" into a constituent, which I don't see any evidence for.

amir-zeldes commented 3 years ago

I think it is a nominal expression, that's why it can take "a" in the first place, no? Compound determiners generally still have an underlying part of speech, and I think it's the same "a few" as always (so either a noun, or an adjective nominalized into a noun, but either way a nominal).

But this makes "a long few" into a constituent, which I don't see any evidence for.

Some examples with an adjective but without a noun:

I feel really put off by the look of det <- det, especially with it being unparalleled elsewhere in English dependency analyses, and probably being typologically very rare across UD. In fact, I'm starting to think that if it is so unclear then maybe the best solution is actually the current one, which is possibly a bit naive but at least easy to explain and consistent across constructions.

nschneid commented 3 years ago

"a select/specialized/fair/rare few"—these are all internal modification of "a few" (which is why it's arguably not fixed). For me the modifiers allowed here are limited to explaining how few or particularized the amount is, not arbitrary properties of the thing being quantified.

The "bad few" example is ungrammatical for me. Perhaps a different dialect.

amir-zeldes commented 3 years ago

the modifiers allowed here are limited to explaining how few or particularized the amount is, not arbitrary properties

I think those are semantic distinctions and not syntactic ones. Once the NN is dropped, the only thing left for modification is "few" either way. I'm also not really sure that's true of some of these, for example being a "select" few is a property of the thing counted, not of the count itself. But if you want clearer examples of semantically 'non-quantitative' modifiers, like "bad", those aren't too hard to find:

nschneid commented 3 years ago

Interesting. This is a sense of "few" that at least some dictionaries categorize as a noun. I think it prototypically means a minority of people. And I notice the adjective is often evaluative; maybe this is part of the construction's core meaning (singling a small group of people out as aberrant).

Could it refer to a small quantity in general? I'm not sure:

?Though I enjoy eating berries, occasionally the experience is marred by a sour few.

??You can have all the sweet berries; I'll just have a sour few. (Better: a few sour ones)

??While the treatment used to require upwards of 12 visits, now it can be completed in only a short few.

Anyway I'm becoming convinced that there's a continuum of grammaticalization at work here, with several interrelated and nuanced constructions, which is why categorization is tricky.

amir-zeldes commented 3 years ago

Yes, the more I look into examples the more I realize how productive this construction is, and it's hard to find clear boundaries between classes. For me this all speaks for just leaving it alone, and definitely staying away from fixed solutions.

BTW there are plenty of examples of non-humans with adjectives, for example:

nschneid commented 3 years ago

OK but all of these seem to me like quantity-modifying adjectives rather than property adjectives. "a good few (budgies)" doesn't mean a few budgies that are good. One could have a good few bad budgies.

I'm polling people on social media to see how they feel about the grammaticality of some of these. And getting mixed responses. Clearly it's complicated!

amir-zeldes commented 3 years ago

Hm, I see, well if you want 1. the few construction, 2. for the noun to be missing, 3. for there to be an adjective modifying few, 4. for that adjective to be semantically non-quantitative and 5. for the entire phrase to stand for a non-human... that's just going to be very rare by virtue of the chain rule - but I'm not sure if that's important for the syntax.

If it is, then wading through corpus examples I can offer these cases, some are more borderline than others:

But TBH your example "a good few bad budgies" suggests to me that adjectives belonging syntactically to the 'lexical' noun should appear between it and "few", so if "long" were a syntactic modifier of "days", we should get "a few long days", not "a long few days". Semantics aside, I think the difference in position probably corresponds to a syntactic subordination difference.

That all being said, I'm fine with always keeping everything as a dependent of the lexical noun if available, and of "few" if not, which is the status quo.

nschneid commented 3 years ago

Yeah I'm hearing various levels of discomfort about some of these...a lot of "I definitely wouldn't say that but maybe someone would". So there's probably no fixed set of boundaries that all speakers would agree on, just some more prototypical and less prototypical cases.

Even if I had a clear linguistic analysis of the full range of constructions, it probably wouldn't be obvious how to map them to UD in practice. So let's just stick with the status quo for now.