UniversalDependencies / UD_English-GUM


Lemma: peoples vs people #53

Open AngledLuffa opened 2 years ago

AngledLuffa commented 2 years ago

In the following context, peoples becomes people:

# sent_id = GUM_speech_albania-2
6       the     the     DET     DT      Definite=Def|PronType=Art       7       det     7:det   Entity=(6-person-new-cf1-2-coref
7       peoples people  NOUN    NNS     Number=Plur     5       obj     5:obj|12:nsubj:xsubj|14:nsubj:xsubj     _
8       of      of      ADP     IN      _       10      case    10:case _
9       the     the     DET     DT      Definite=Def|PronType=Art       10      det     10:det  Entity=(7-place-new-cf7-2-coref
10      world   world   NOUN    NN      Number=Sing     7       nmod    7:nmod:of       Entity=7)6)

This is pretty similar to "people of the ..."

# sent_id = GUM_voyage_chatham-11
1       The     the     DET     DT      Definite=Def|PronType=Art       3       det     3:det   Discourse=context-background:18->21:1|Entity=(24-person-new-cf1-3-coref-Moriori
2       original        original        ADJ     JJ      Degree=Pos      3       amod    3:amod  _
3       people  person  NOUN    NNS     Number=Plur     9       nsubj   9:nsubj _
4       of      of      ADP     IN      _       6       case    6:case  _
5       the     the     DET     DT      Definite=Def|PronType=Art       6       det     6:det   Entity=(1-place-giv:act-cf2*-2-coref-Chatham_Islands
6       islands island  NOUN    NNS     Number=Plur     3       nmod    3:nmod:of       Entity=1)24)

In the first case, it's multiple groups, each made up of multiple persons; in the second, it's a single group made up of multiple persons. I think either the first case should also have the lemma "person", or the second case should have the lemma "people". Otherwise it doesn't quite feel consistent.
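One way to spot this kind of inconsistency across a whole treebank is to tabulate which lemmas each form of "people" receives. A minimal sketch, assuming standard tab-separated CoNLL-U input; `lemma_table` is a hypothetical helper, not part of any UD tooling:

```python
from collections import Counter, defaultdict


def lemma_table(conllu_text, forms=("people", "peoples")):
    """Count the lemmas assigned to each (lowercased form, XPOS) pair of interest."""
    table = defaultdict(Counter)
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # sentence breaks and comment lines (sent_id, text, ...)
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (e.g. 3-4) and empty nodes (e.g. 5.1)
        form, lemma, xpos = cols[1].lower(), cols[2], cols[4]
        if form in forms:
            table[(form, xpos)][lemma] += 1
    return table


# The two tokens quoted above, re-joined with the tabs that CoNLL-U uses.
sample = "\n".join([
    "# sent_id = GUM_speech_albania-2",
    "\t".join(["7", "peoples", "people", "NOUN", "NNS", "Number=Plur",
               "5", "obj", "5:obj|12:nsubj:xsubj|14:nsubj:xsubj", "_"]),
    "",
    "# sent_id = GUM_voyage_chatham-11",
    "\t".join(["3", "people", "person", "NOUN", "NNS", "Number=Plur",
               "9", "nsubj", "9:nsubj", "_"]),
])

for (form, xpos), lemmas in sorted(lemma_table(sample).items()):
    print(form, xpos, dict(lemmas))
# people NNS {'person': 1}
# peoples NNS {'people': 1}
```

Run over a full treebank, any (form, XPOS) row with more than one lemma is a candidate inconsistency worth a closer look.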

nschneid commented 2 years ago

"People" has an ethnic or national group sense as well as a 'persons' sense. I think "the original people of the islands" is ambiguous: it could refer to the individuals (persons) who originally inhabited the islands, in which case it is plural, or to a group, in which case it is singular. Does verb agreement resolve this?

AngledLuffa commented 2 years ago

Ah, good point. In this case it is clearly a plural noun, judging by the verb agreement in the sentence.
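The agreement check can be automated for subjects, though it will not always give an answer. A rough sketch, assuming each sentence is stored as a dict from token ID to a few CoNLL-U columns; `parse_feats` and `subject_number` are hypothetical helpers, and the example tokens are copied from the EWT sentences quoted later in this thread:

```python
def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'Number=Plur|Person=3' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def subject_number(tokens, subj_id):
    """Return the Number value shown by the finite verb agreeing with subj_id, or None.

    tokens maps token ID -> {"form", "head", "deprel", "feats"}.
    """
    subj = tokens[subj_id]
    if not subj["deprel"].startswith("nsubj"):
        return None
    head_id = subj["head"]
    # The finite form is either the head itself or an aux/cop dependent of the head.
    candidate_ids = [head_id] + [
        i for i, tok in tokens.items()
        if tok["head"] == head_id and tok["deprel"] in ("aux", "aux:pass", "cop")
    ]
    for i in candidate_ids:
        tok = tokens.get(i)
        if tok is None:
            continue  # head falls outside the excerpt we were given
        feats = parse_feats(tok["feats"])
        if feats.get("VerbForm") == "Fin" and "Number" in feats:
            return feats["Number"]
    return None  # no agreement visible (e.g. a modal like "will")


# "when people single out ...": the finite verb is marked Number=Plur.
single_out = {
    37: {"form": "people", "head": 38, "deprel": "nsubj", "feats": "Number=Plur"},
    38: {"form": "single", "head": 34, "deprel": "csubj",
         "feats": "Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin"},
}
# "the Venezuelan people will ensure ...": the modal carries no Number feature.
will_ensure = {
    12: {"form": "people", "head": 14, "deprel": "nsubj", "feats": "Number=Plur"},
    13: {"form": "will", "head": 14, "deprel": "aux", "feats": "VerbForm=Fin"},
    14: {"form": "ensure", "head": 7, "deprel": "conj", "feats": "VerbForm=Inf"},
}
print(subject_number(single_out, 37))   # Plur
print(subject_number(will_ensure, 12))  # None
```

The `None` case matters: after a modal, or when "people" is not a subject at all, agreement cannot settle the singular/plural question, so those tokens still need a human decision.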

One issue that arises in EWT is that "people" always has the lemma "people", even in the case of multiple persons.

nschneid commented 2 years ago

This was always an issue with WordNet-based lemmatizers that didn't have morphological subtypes of nouns. But we have number information so I don't see why we couldn't lemmatize people/NNS to person.
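Using the number information, the suggested rule is easy to state per token. A minimal sketch of one possible policy, applied to a single tab-separated CoNLL-U line; `relemmatize_people` is hypothetical, and it assumes the collective "a people" sense would be tagged NN while plural-only "people" is tagged NNS:

```python
def relemmatize_people(line):
    """Apply a hypothetical 'people' lemma policy to one CoNLL-U line.

    people/NNS  -> person  (plural-only 'people', i.e. several persons)
    people/NN   -> people  (singular collective noun, 'a people')
    peoples/NNS -> people  (plural of that collective noun)
    Every other line is returned unchanged.
    """
    if not line or line.startswith("#"):
        return line  # sentence breaks and comments pass through
    cols = line.split("\t")
    if len(cols) != 10:
        return line  # not a well-formed token line; leave it alone
    form, xpos = cols[1].lower(), cols[4]
    if form == "people":
        if xpos == "NNS":
            cols[2] = "person"
        elif xpos == "NN":
            cols[2] = "people"
    elif form == "peoples" and xpos == "NNS":
        cols[2] = "people"
    return "\t".join(cols)


# A token with the old EWT-style lemma "people" (illustrative, not copied from the corpus).
before = "\t".join(["37", "people", "people", "NOUN", "NNS", "Number=Plur",
                    "38", "nsubj", "38:nsubj", "_"])
print(relemmatize_people(before).split("\t")[2])  # person
```

Keying on XPOS keeps the rule simple; checking the UD `Number` feature instead would work the same way for these tokens.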

AngledLuffa commented 2 years ago

So, update EWT (and CoreNLP)?

AngledLuffa commented 2 years ago

Alright, I submitted another PR for EWT that changes most of the "people" lemmas to "person".

nschneid commented 2 years ago

So CGEL (p. 345) says there are two senses of "people": one that is plural-only, and one that is singular and pluralized as "peoples".

Semantically, I feel like "the American people" is closer to the second sense than to a plural of "person", because it is talking about Americans as a national body. But I suppose plural agreement ("the American people were...") indicates it should be interpreted as the first. Note, though, that CGEL is not claiming that the first sense of "people" is a plural of "person": they say "person being an ordinary noun with both singular and plural forms. Persons is then in competition with people1 [which is more common]".

So I guess the CGEL point of view is that "people" should never be lemmatized to "person". But in practice, "people" is most often used in place of "persons". Will users of our corpus thus expect "person" as the lemma? And if so, what is the right criterion for cases like "the American people"?

nschneid commented 2 years ago

I think https://twitter.com/complingy/status/1550730255433928704 gives us a good argument on whether "the American people" is more like "those American people" or "this American people". "The American and German people" would most likely not refer to "a people" (an established social unit) but rather to an amalgamation of Americans and Germans. So this is the plural-only "people", not the singular one, and by analogy "the American people" should not be considered singular "people", even though the members of a nationality are being referred to generically, in a way that makes it hard to substitute a transparent plural like "citizens". (Maybe this is a formula/construction: "the DemonymAdj people" used in political oratory.)
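If "the DemonymAdj people" is a recurring construction, it may be easier to pull out its instances for review than to decide each one from scratch. A rough sketch, assuming tokens are stored as a dict keyed by token ID; `demonym_people_candidates` is hypothetical, and it uses capitalization of an ADJ dependent as a crude stand-in for a real demonym list. The example tokens are copied from the EWT sentences quoted later in this thread:

```python
def demonym_people_candidates(tokens):
    """Return IDs of 'people' tokens matching roughly 'the <Capitalized ADJ> people'.

    tokens maps token ID -> {"form", "upos", "head", "deprel"}.
    Capitalization is a crude, hypothetical stand-in for a real demonym list.
    """
    hits = []
    for i, tok in tokens.items():
        if tok["form"].lower() != "people":
            continue
        deps = [d for d in tokens.values() if d["head"] == i]
        has_the = any(d["deprel"] == "det" and d["form"].lower() == "the" for d in deps)
        has_demonym = any(
            d["deprel"] == "amod" and d["upos"] == "ADJ" and d["form"][:1].isupper()
            for d in deps
        )
        if has_the and has_demonym:
            hits.append(i)
    return hits


venezuelan = {
    10: {"form": "the", "upos": "DET", "head": 12, "deprel": "det"},
    11: {"form": "Venezuelan", "upos": "ADJ", "head": 12, "deprel": "amod"},
    12: {"form": "people", "upos": "NOUN", "head": 14, "deprel": "nsubj"},
}
his_own = {
    13: {"form": "his", "upos": "PRON", "head": 15, "deprel": "nmod:poss"},
    14: {"form": "own", "upos": "ADJ", "head": 15, "deprel": "amod"},
    15: {"form": "people", "upos": "NOUN", "head": 12, "deprel": "obj"},
}
print(demonym_people_candidates(venezuelan))  # [12]
print(demonym_people_candidates(his_own))     # []
```

In a coordination like "the American and German people", UD attaches "German" as a `conj` of "American", so "American" is still the `amod` of "people" and the phrase would be flagged too.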

AngledLuffa commented 2 years ago

How does this argument affect the "people" PR I filed? For example...

# sent_id = weblog-blogspot.com_dakbangla_20041028153019_ENG_20041028_153019-0009
14      the     the     DET     DT      Definite=Def|PronType=Art       15      det     15:det  _
15      people  person  NOUN    NNS     Number=Plur     10      nmod    10:nmod:for     _
16      of      of      ADP     IN      _       17      case    17:case _
17      Pakistan        Pakistan        PROPN   NNP     Number=Sing     15      nmod    15:nmod:of      SpaceAfter=No
# sent_id = weblog-blogspot.com_rigorousintuition_20050518101500_ENG_20050518_101500-0027
9       and     and     CCONJ   CC      _       14      cc      14:cc   _
10      the     the     DET     DT      Definite=Def|PronType=Art       12      det     12:det  _
11      Venezuelan      Venezuelan      ADJ     JJ      Degree=Pos      12      amod    12:amod _
12      people  person  NOUN    NNS     Number=Plur     14      nsubj   14:nsubj        _
13      will    will    AUX     MD      VerbForm=Fin    14      aux     14:aux  _
14      ensure  ensure  VERB    VB      VerbForm=Inf    7       conj    7:conj:and      _
# sent_id = weblog-blogspot.com_rigorousintuition_20060511134300_ENG_20060511_134300-0067
1       "       "       PUNCT   ``      _       26      punct   26:punct        SpaceAfter=No
2       The     the     DET     DT      Definite=Def|PronType=Art       4       det     4:det   _
3       black   black   ADJ     JJ      Degree=Pos      4       amod    4:amod  _
4       race    race    NOUN    NN      Number=Sing     7       nsubj   7:nsubj _
5       is      be      AUX     VBZ     Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   7       cop     7:cop   _
6       the     the     DET     DT      Definite=Def|PronType=Art       7       det     7:det   _
7       people  person  NOUN    NNS     Number=Plur     26      ccomp   15:obl|26:ccomp _
8       through through ADP     IN      _       9       case    9:case  _
...

That last one, btw, yikes... sometimes people wonder how deep learning models wind up racist.

A two for one:

# sent_id = weblog-blogspot.com_rigorousintuition_20060511134300_ENG_20060511_134300-0294
32      it      it      PRON    PRP     Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs  34      expl    34:expl _
33      does    do      AUX     VBZ     Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   34      aux     34:aux  _
34      bother  bother  VERB    VB      VerbForm=Inf    0       root    0:root  _
35      me      I       PRON    PRP     Case=Acc|Number=Sing|Person=1|PronType=Prs      34      obj     34:obj  _
36      when    when    SCONJ   WRB     PronType=Int    38      mark    38:mark _
37      people  person  NOUN    NNS     Number=Plur     38      nsubj   38:nsubj        _
38      single  single  VERB    VBP     Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin   34      csubj   34:csubj        _
39      out     out     ADP     RP      _       38      compound:prt    38:compound:prt _
40      a       a       DET     DT      Definite=Ind|PronType=Art       42      det     42:det  _
41      specific        specific        ADJ     JJ      Degree=Pos      42      amod    42:amod _
42      group   group   NOUN    NN      Number=Sing     38      obj     38:obj  _
43      of      of      ADP     IN      _       44      case    44:case _
44      people  person  NOUN    NNS     Number=Plur     42      nmod    42:nmod:of      _
45      to      to      PART    TO      _       46      mark    46:mark _
46      pin     pin     VERB    VB      VerbForm=Inf    42      acl     42:acl:to       _
47      the     the     DET     DT      Definite=Def|PronType=Art       48      det     48:det  _
48      blame   blame   NOUN    NN      Number=Sing     46      obj     46:obj  _
49      on      on      ADP     IN      _       46      obl     46:obl  SpaceAfter=No
# sent_id = weblog-blogspot.com_alaindewitt_20060924104100_ENG_20060924_104100-0020
12      butcher butcher VERB    VB      VerbForm=Inf    5       conj    5:conj:and      _
13      his     he      PRON    PRP$    Gender=Masc|Number=Sing|Person=3|Poss=Yes|PronType=Prs  15      nmod:poss       15:nmod:poss    _
14      own     own     ADJ     JJ      Degree=Pos      15      amod    15:amod _
15      people  person  NOUN    NNS     Number=Plur     12      obj     12:obj  _

So should each of these examples keep the lemma "person"?