amir-zeldes / gum

Repository for the Georgetown University Multilayer Corpus (GUM)
https://gucorpling.org/gum/

Lemmas of pluralized years #169

Closed: nschneid closed this 11 months ago

nschneid commented 11 months ago

https://universal.grew.fr/?custom=653e77a1ac7ea - I take it the lemma of "1970s" (NOUN) should be "1970", since the "s" suffix is reflected in Number=Plur? That's what EWT does.

nschneid commented 11 months ago

Arguably also these: https://universal.grew.fr/?custom=653e7834a2739

amir-zeldes commented 11 months ago

I have always considered things like 1970s to be pluralia tantrum, with themselves as the lemma. Of course we could also have different instances of the year 1970 in different timelines, and then 1970s could be the regular plural of 1970.

But the normal use of the 1970s is not multiple instances of that year - it's the set 1970, 1971, 1972...

As such, I think plural is right (also due to agreement), but it's not a form of the lemma 1970 and is distinct from its plural, which can occur in the parallel universes scenario or other metaphorical ones.

nschneid commented 11 months ago

Huh...I always thought of "the 1970s" as a set of 10 distinct "1970-decade-years", so there's a bit of metonymy but it does refer to a set of multiple distinct elements—unlike "scissors" or "pants". I suppose you might alternatively construe "the 1970s" as a continuous range of time, such that the part-whole relationship would be like scissors. Since it's a productive pattern I would lean toward the regular plural interpretation rather than having to posit a bunch of distinct lemmas, though.

amir-zeldes commented 11 months ago

It's definitely productive, but I don't think that needs to be an argument that it has the same lemma. 1970 has a possible plural 1970s (parallel universes), but "the 70s" is its own thing IMO. It's pretty idiosyncratic and unpredictable too, since I think "the 1900s" does not mean all years from 1900 -- 1999, and other time periods don't do this ("the 1000s" is not all centuries from 1000-2000). I think the "70s" etc. deserve a separate lemma, and as you noted above there are other pluralia tantum that take the word form as the lemma as well.

nschneid commented 11 months ago

> I think "the 1900s" does not mean all years from 1900 -- 1999

In most contexts I would definitely interpret it as the entire century! @aryamanarora pointed out that Wiktionary has entries for pluralized years, and they give two senses: https://en.wiktionary.org/wiki/1900s

When the turn of the millennium happened, people were discussing what to call the 2000-2009 decade, e.g. "the aughts", I assume precisely because "the 2000s" would not be specific enough.

nschneid commented 11 months ago

(Side note: This is making me think there's a paper to be written about the strategies different languages use to name decades, centuries, and millennia. Orthographic as well as grammatical—e.g. in France I saw Roman numerals with ordinal suffixes for centuries. Joakim and I were chatting about this in a museum in Istanbul and he pointed out some differences between English and Swedish.)

nschneid commented 11 months ago

After some discussion on the NERT Slack, it seems that a similar construction applies to multiples of 10 that are not years: "The temperature will be in the mid-80s", "I'm in my 20s". I suspect this is tricky because it's a sort of inflectional-derivational hybrid: the "-s" morpheme usually just indicates plurality, so "20s" literally would just be several instances of "20", but in this extended meaning the base is interpreted as (or coerced to) the collection of 10 values. It is grammatically plural (inflectional), and the meaning of 10-numbers-counting-from-the-multiple-of-ten is clearly motivated by plurality, but one has to learn this pattern beyond learning the regular plural.
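To make the "collection of 10 values" reading concrete, here is a minimal Python sketch. The helper name `decade_values` and its regex are hypothetical illustrations, not anything from GUM, EWT, or Grew: a form like "20s" or "1970s" is taken as the ten values starting at the base multiple of ten. It only models this decade-style reading and deliberately ignores others, such as "the 1900s" meaning a whole century.

```python
import re

# Hypothetical pattern: digits ending in 0, followed by a plural "s" ("20s", "1970s").
_PLURAL_TENS = re.compile(r"^(\d*0)s$")

def decade_values(form: str) -> range:
    """Interpret a pluralized multiple of ten as the ten values it starts.

    Assumption: only the decade-style reading is modelled; other senses
    (e.g. "the 1900s" as a century) are out of scope.
    """
    m = _PLURAL_TENS.match(form)
    if m is None:
        raise ValueError(f"not a pluralized multiple of ten: {form!r}")
    base = int(m.group(1))
    return range(base, base + 10)

print(decade_values("1970s"))  # range(1970, 1980)
print(decade_values("20s"))    # range(20, 30), as in "I'm in my 20s"
```

The point of the sketch is that the meaning is not "several instances of 20" (what a regular plural would give) but a derived range, which is why the pattern has to be learned separately.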

Apparently there was a debate within Wiktionary, and to avoid a proliferation of pluralized number entries, they instituted an arbitrary cutoff: https://en.wiktionary.org/w/index.php?title=Wiktionary:Requests_for_deletion&oldid=47517182#Decades

Anyway, the concept of lemmas is always a bit fuzzy. Since there is a clear semantic divergence from the usual plural I am willing to go with the GUM policy and include the "s" in the lemma. UniversalDependencies/UD_English-EWT#335 was about removing apostrophes from the lemmas of these plurals but at the time I may not have noticed that EWT already diverged from GUM in lacking the "s" in the lemma. (This goes back many years, e.g. this token.)
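As a rough illustration of the two conventions being compared, the following Python sketch assumes a simple regex for numeric decade plurals (optionally with apostrophes, as in "'70s" or "80's"). `decade_lemma` follows the GUM-style policy described above: keep the "s" and drop apostrophes, per the EWT#335 change. `ewt_style_lemma` mimics the EWT behaviour noted in this thread of dropping the "s" as well. Both function names and the regex are hypothetical and are not the actual GUM or EWT tooling.

```python
import re
from typing import Optional

# Hypothetical pattern for numeric decade plurals: "1970s", "70s", "'70s", "80's".
_NUMERIC_DECADE = re.compile(r"^'?(\d*0)'?s$")

def decade_lemma(form: str) -> Optional[str]:
    """GUM-style lemma (assumed): keep the plural "s", drop any apostrophes."""
    m = _NUMERIC_DECADE.match(form)
    if m is None:
        return None  # not a numeric decade plural; leave to other rules
    return m.group(1) + "s"

def ewt_style_lemma(form: str) -> Optional[str]:
    """For contrast: the EWT convention discussed above, which drops the "s" too."""
    m = _NUMERIC_DECADE.match(form)
    return m.group(1) if m else None

for form in ["1970s", "'70s", "80's"]:
    print(form, decade_lemma(form), ewt_style_lemma(form))
# 1970s 1970s 1970
# '70s 70s 70
# 80's 80s 80
```

Under the GUM policy the lemma differs from the form only by removing apostrophes, so the plural "s" (and the decade meaning it signals) stays visible in the lemma while Number=Plur still records grammatical plurality.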

nschneid commented 11 months ago

It looks like in GUM, there are a few tokens of spelled-out words where the plural ending was inappropriately removed from the lemma: "twenties", "fifties", and "sixties" https://universal.grew.fr/?custom=654081b1e1f0d
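A check for the spelled-out cases like these could look roughly like the Python sketch below. It assumes plain CoNLL-U input (10 tab-separated columns per token line, with FORM and LEMMA as the 2nd and 3rd columns) and a hand-written list of decade words. The rule that such a lemma should equal the lowercased form is an assumption matching the policy above, and the function name is hypothetical. This is not how the GUM errors were actually found (that was a Grew query).

```python
from typing import List, Tuple

# Hypothetical word list: spelled-out decade plurals whose lemma should keep the "-ies".
DECADE_WORDS = {"twenties", "thirties", "forties", "fifties",
                "sixties", "seventies", "eighties", "nineties"}

def find_bad_decade_lemmas(conllu: str) -> List[Tuple[str, str, str]]:
    """Return (token id, form, lemma) for decade words whose lemma differs from the form."""
    problems = []
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence breaks and comment lines
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        tok_id, form, lemma = cols[0], cols[1], cols[2]
        if form.lower() in DECADE_WORDS and lemma != form.lower():
            problems.append((tok_id, form, lemma))
    return problems

sample = "\n".join([
    "# text = in the sixties",
    "1\tin\tin\tADP\tIN\t_\t3\tcase\t_\t_",
    "2\tthe\tthe\tDET\tDT\t_\t3\tdet\t_\t_",
    "3\tsixties\tsixty\tNOUN\tNNS\tNumber=Plur\t0\troot\t_\t_",
])
print(find_bad_decade_lemmas(sample))  # [('3', 'sixties', 'sixty')]
```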

aryamanarora commented 11 months ago

> I have always considered things like 1970s to be pluralia tantrum, with themselves as the lemma. Of course we could also have different instances of the year 1970 in different timelines, and then 1970s could be the regular plural of 1970.

"pluralia tantrum" is an apt typo 😆

nschneid commented 11 months ago

Linguist term of art for a group of screaming children

amir-zeldes commented 11 months ago

Thanks for catching those, will fix.