Open AngledLuffa opened 2 years ago
"People" has an ethnic or national group sense as well as a 'persons' sense. I think "the original people of the islands" is ambiguous: it could refer to the individuals (persons) who originally inhabited the islands, in which case it is plural, or it could refer to a group, in which case it is singular. Does verb agreement resolve this?
Ah, good point, in this case it is clearly a plural noun based on the verb in the sentence.
One issue that arises in EWT is that "people" always has the lemma "people", even in the case of multiple persons.
This was always an issue with WordNet-based lemmatizers that didn't have morphological subtypes of nouns. But we have number information so I don't see why we couldn't lemmatize people/NNS to person.
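To make the proposal concrete, here is a minimal sketch of such a relemmatization pass. It assumes standard tab-separated CoNLL-U (FORM in column 2, LEMMA in column 3, XPOS in column 5); the function name `relemmatize_people` is hypothetical, and this is not the script behind any actual EWT or CoreNLP change.

```python
def relemmatize_people(conllu_text: str) -> str:
    """Set the lemma of "people" tokens tagged NNS to "person".

    Assumes tab-separated CoNLL-U. Everything else, including
    "peoples" and any "people" tagged NN, is left untouched.
    """
    out = []
    for line in conllu_text.split("\n"):
        cols = line.split("\t")
        # Token lines have exactly 10 columns; comments and blank lines do not.
        if len(cols) == 10 and not line.startswith("#"):
            form, xpos = cols[1], cols[4]
            if form.lower() == "people" and xpos == "NNS":
                cols[2] = "person"
            line = "\t".join(cols)
        out.append(line)
    return "\n".join(out)
```

Keying on the XPOS tag rather than on the word form alone is what separates this from a lemmatizer with no number information: "peoples" never matches, and a singular-tagged "people" keeps its lemma.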
So, update EWT (and CoreNLP)?
Alright, I submitted another PR for EWT which changes the lemma of most "people" tokens to "person".
So CGEL (p. 345) says there are two senses of "people", one of which is plural-only and one of which is singular, pluralized as "peoples":
Semantically, I feel like "the American people" is closer to the second sense than to a plural of "person", because it is talking about Americans as a national body, but I suppose plural agreement ("the American people were...") indicates it should be interpreted as the first. But note that CGEL is not claiming that the first sense of "people" is a plural of "person": they say "person being an ordinary noun with both singular and plural forms. Persons is then in competition with people1 [which is more common]".
So I guess the CGEL point of view is that "people" should never be lemmatized to "person". But in practice, "people" is most often used in place of "persons". Will users of our corpus thus expect "person" as the lemma? And if so, what is the right criterion for cases like "the American people"?
I think we have a good argument from https://twitter.com/complingy/status/1550730255433928704 regarding whether "the American people" is more like "those American people" or "this American people": "the American and German people" would most likely not refer to "a people" (an established social unit) but rather to an amalgamation of Americans and Germans. So this is the plural-only "people", not the singular, and by analogy "the American people" should not be considered singular "people", even though the members of a nationality are being referred to generically and in a way that makes it hard to substitute a transparent plural like "citizens". (Maybe this is a formula/construction: "the DemonymAdj people" used in political oratory.)
How does this argument affect the "people" PR I filed? For example...
# sent_id = weblog-blogspot.com_dakbangla_20041028153019_ENG_20041028_153019-0009
14 the the DET DT Definite=Def|PronType=Art 15 det 15:det _
15 people person NOUN NNS Number=Plur 10 nmod 10:nmod:for _
16 of of ADP IN _ 17 case 17:case _
17 Pakistan Pakistan PROPN NNP Number=Sing 15 nmod 15:nmod:of SpaceAfter=No
# sent_id = weblog-blogspot.com_rigorousintuition_20050518101500_ENG_20050518_101500-0027
9 and and CCONJ CC _ 14 cc 14:cc _
10 the the DET DT Definite=Def|PronType=Art 12 det 12:det _
11 Venezuelan Venezuelan ADJ JJ Degree=Pos 12 amod 12:amod _
12 people person NOUN NNS Number=Plur 14 nsubj 14:nsubj _
13 will will AUX MD VerbForm=Fin 14 aux 14:aux _
14 ensure ensure VERB VB VerbForm=Inf 7 conj 7:conj:and _
# sent_id = weblog-blogspot.com_rigorousintuition_20060511134300_ENG_20060511_134300-0067
1 " " PUNCT `` _ 26 punct 26:punct SpaceAfter=No
2 The the DET DT Definite=Def|PronType=Art 4 det 4:det _
3 black black ADJ JJ Degree=Pos 4 amod 4:amod _
4 race race NOUN NN Number=Sing 7 nsubj 7:nsubj _
5 is be AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 7 cop 7:cop _
6 the the DET DT Definite=Def|PronType=Art 7 det 7:det _
7 people person NOUN NNS Number=Plur 26 ccomp 15:obl|26:ccomp _
8 through through ADP IN _ 9 case 9:case _
...
That last one, btw, yikes... sometimes people wonder how deep learning models wind up racist.
A two for one:
# sent_id = weblog-blogspot.com_rigorousintuition_20060511134300_ENG_20060511_134300-0294
32 it it PRON PRP Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs 34 expl 34:expl _
33 does do AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 34 aux 34:aux _
34 bother bother VERB VB VerbForm=Inf 0 root 0:root _
35 me I PRON PRP Case=Acc|Number=Sing|Person=1|PronType=Prs 34 obj 34:obj _
36 when when SCONJ WRB PronType=Int 38 mark 38:mark _
37 people person NOUN NNS Number=Plur 38 nsubj 38:nsubj _
38 single single VERB VBP Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin 34 csubj 34:csubj _
39 out out ADP RP _ 38 compound:prt 38:compound:prt _
40 a a DET DT Definite=Ind|PronType=Art 42 det 42:det _
41 specific specific ADJ JJ Degree=Pos 42 amod 42:amod _
42 group group NOUN NN Number=Sing 38 obj 38:obj _
43 of of ADP IN _ 44 case 44:case _
44 people person NOUN NNS Number=Plur 42 nmod 42:nmod:of _
45 to to PART TO _ 46 mark 46:mark _
46 pin pin VERB VB VerbForm=Inf 42 acl 42:acl:to _
47 the the DET DT Definite=Def|PronType=Art 48 det 48:det _
48 blame blame NOUN NN Number=Sing 46 obj 46:obj _
49 on on ADP IN _ 46 obl 46:obl SpaceAfter=No
# sent_id = weblog-blogspot.com_alaindewitt_20060924104100_ENG_20060924_104100-0020
12 butcher butcher VERB VB VerbForm=Inf 5 conj 5:conj:and _
13 his he PRON PRP$ Gender=Masc|Number=Sing|Person=3|Poss=Yes|PronType=Prs 15 nmod:poss 15:nmod:poss _
14 own own ADJ JJ Degree=Pos 15 amod 15:amod _
15 people person NOUN NNS Number=Plur 12 obj 12:obj _
Maybe each of these examples stays with the lemma "person"?
In the following context, "peoples" becomes "people":
This is pretty similar to "people of the ..."
In the second case, it's a single group made up of multiple persons, and in the first case, it's multiple groups made of multiple persons. I think either the first case should have a lemma of "person" as well, or the second case should have a lemma of "people". It doesn't quite feel consistent otherwise.
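One way to check this kind of consistency across the treebank is to tally which lemma each people/peoples token currently carries. The following is a rough sketch, assuming tab-separated CoNLL-U input; `tally_people_lemmas` is a hypothetical helper name, not an existing tool.

```python
from collections import Counter


def tally_people_lemmas(conllu_text: str) -> Counter:
    """Count (form, XPOS, lemma) triples for "people" and "peoples" tokens."""
    counts = Counter()
    for line in conllu_text.split("\n"):
        cols = line.split("\t")
        # Skip comments, blank lines, and anything that is not a 10-column token line.
        if len(cols) != 10 or line.startswith("#"):
            continue
        form = cols[1].lower()
        if form in ("people", "peoples"):
            counts[(form, cols[4], cols[2])] += 1
    return counts
```

If the resulting table shows, say, both ("people", "NNS", "person") and ("peoples", "NNS", "people"), that is exactly the split described above, and the individual sentences behind each row can then be reviewed by hand.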