Closed nschneid closed 1 month ago
This is due to pos=NNP and the like, see amir-zeldes/gum#186 for discussion
Also some ADVs in GUM and one in EWT.
Looked into this regarding PUD. There are 4 cases.
Weirdest is
# newdoc id = w04006
# sent_id = w04006023
# text = Historian David Crouch suggests that Stephen abandoned from the challenge around this time to focus on other issues.
1 Historian historian ADJ NN Number=Sing 2 amod 2:amod _
so in this one, the job has become an ADJ! I don't like that. I found multiple examples of "author" in a similar role in PTB:
(NP-SBJ (NN baseball) (NN author) (NNP Lawrence) (NNP Ritter) )
(NP (NN Author) (NNP Dashiell) (NNP Hammett) )
(NP (NN author) (NNP William) (NNP Buckley) )))
There is however a similar example in EWT with ADJ:
# sent_id = newsgroup-groups.google.com_humanities.lit.authors.shakespeare_0018a7697318f71f_ENG_20031006_163200-0023
# text = Historian John Stow dies: April 6, 1605 Sat/Wed.
1 Historian historian ADJ JJ Degree=Pos 2 amod 2:amod _
then again, also from EWT
# sent_id = weblog-blogspot.com_dakbangla_20050311135387_ENG_20050311_135387-0169
# text = A report by the Center for Disease Control of interviews with AMI employees (as well as detailed interviews by author Leonard Cole) supports the
21 author author NOUN NN Number=Sing 22 compound 22:compound _
22 Leonard Leonard PROPN NNP Number=Sing 19 nmod 19:nmod:by _
23 Cole Cole PROPN NNP Number=Sing 22 flat 22:flat SpaceAfter=No
# sent_id = reviews-127252-0002
# newpar id = reviews-127252-p0002
# text = I've had writer friends describe horror stories with their printers.
4 writer writer NOUN NN Number=Sing 5 compound 5:compound _
5 friends friend NOUN NNS Number=Plur 3 obj 3:obj|6:nsubj:xsubj _
so my interpretation is that someone's profession as a title should be a NOUN, not an ADJ
Others are
# newdoc id = n01031
# sent_id = n01031005
# text = Researchers have been investigating potential for male hormonal contraceptives for around 20 years.
7 male male ADJ NN Number=Sing 9 amod 9:amod _
this follows "male cats" from EWT, which is tagged ADJ with Degree=Pos
# sent_id = n01050014
# text = It's possible to have normal hemoglobin levels, but to have low iron stores overall, says Canadian Blood Services (CBS).
19 Canadian Canadian ADJ NNP Number=Sing 21 amod 21:amod _
20 Blood Blood PROPN NNP Number=Sing 21 compound 21:compound _
21 Services Services PROPN NNPS Number=Plur 18 nsubj 18:nsubj _
similar to "Canadian Immigration Lawyers", also Degree=Pos
and then
# sent_id = w01045003
# text = After the discovery of America by Christopher Columbus in 1492, the Spanish term Antillas applied to the lands
13 Spanish Spanish ADJ NNP Number=Sing 14 amod 14:amod _
14 term term NOUN NN Number=Sing 16 nsubj 16:nsubj _
15 Antillas Antillas PROPN NNP Number=Sing 14 appos 14:appos _
This one I'm a little unclear on. Is this not a case of "Spanish" being used as a noun? I think this should also be tagged NOUN as opposed to ADJ. Compare to this other example from PUD
# sent_id = w05006058
# text = On the other hand, external history contains references to the history of Spanish speakers
14 Spanish Spanish PROPN NNP Number=Sing 15 compound 15:compound _
15 speakers speaker NOUN NNS Number=Plur 12 nmod 12:nmod:of SpaceAfter=No
but maybe "Spanish term" becomes an ADJ usage?
Incidentally, what is the genesis of the tags in PUD? Is it kosher to change the XPOS when they are wrong? (male_NN contraceptives)
"historian" as ADJ is an error. I suspect a tagger assigned it based on the -ian ending, which can appear on adjectives.
"Spanish" seems correct as PROPN when naming the language and as ADJ when used as a property ('pertaining to Spain'). Geopolitical, ethnic, and religious identities often give rise to proper adjectives.
I would go with:
Spain: PROPN
Spaniard: PROPN
Spanish: PROPN if denoting the language, ADJ otherwise
Canada: PROPN
Canadian: PROPN if denoting a person from Canada, ADJ otherwise
French: PROPN for the language and "the French", ADJ otherwise
Frenchman, Francophone: PROPN
(There are frameworks where an NP can be derived from an adjective head in the syntax, so even "the French" would be an adjective, but that seems like a stretch for UD.)
"Are you male or female?" (no article) suggests "male" and "female" can be adjectives.
No idea about PUD but in EWT we do fix xpos errors.
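For cases like male_NN where ADJ is the right UPOS and only the XPOS and features are off, the fix could be sketched roughly as below. This is only an illustration: `fix_adjective_xpos` is a made-up helper, not part of any treebank tooling, and the choice of JJ as the target XPOS is an assumption based on how the EWT examples with Degree=Pos look. It would not be right for "historian", where the UPOS itself should change.

```python
NOUN_XPOS = {"NN", "NNS", "NNP", "NNPS"}


def fix_adjective_xpos(token_line):
    """Hypothetical helper: give an ADJ with a noun XPOS adjective-style tags.

    Expects one tab-separated CoNLL-U token line (10 columns).
    Lines that are not ADJ with a noun XPOS are returned unchanged.
    """
    cols = token_line.split("\t")
    if len(cols) != 10:
        raise ValueError("expected 10 tab-separated CoNLL-U columns")
    upos, xpos, feats = cols[3], cols[4], cols[5]
    if upos != "ADJ" or xpos not in NOUN_XPOS:
        return token_line
    # Parse FEATS, drop the stray Number feature, add Degree=Pos.
    feat_map = {}
    if feats != "_":
        feat_map = dict(pair.split("=", 1) for pair in feats.split("|"))
    feat_map.pop("Number", None)
    feat_map.setdefault("Degree", "Pos")
    cols[4] = "JJ"  # assumed target XPOS for a plain adjective
    cols[5] = "|".join(
        f"{key}={value}"
        for key, value in sorted(feat_map.items(), key=lambda kv: kv[0].lower())
    )
    return "\t".join(cols)


before = "\t".join(["7", "male", "male", "ADJ", "NN", "Number=Sing", "9", "amod", "9:amod", "_"])
after = fix_adjective_xpos(before)
print(after.replace("\t", " "))
# prints: 7 male male ADJ JJ Degree=Pos 9 amod 9:amod _
```

Whether each flagged token should be fixed this way or retagged as NOUN/PROPN still needs a human decision per case.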
SGTM. I'll merge that change then, since that PR and your recommendations match. I'll also submit a PR for EWT's "historian" token
Hmm, suddenly I'm less convinced about "Spanish term Antillas" in PUD upon trying to rearrange the dependencies to match retagging. It really feels like "term" or maybe "Antillas" wants to be the head. Here's the current parsing
12 the the DET DT Definite=Def|PronType=Art 14 det 14:det _
13 Spanish Spanish ADJ NNP Number=Sing 14 amod 14:amod _
14 term term NOUN NN Number=Sing 16 nsubj 16:nsubj _
15 Antillas Antillas PROPN NNP Number=Sing 14 appos 14:appos _
16 applied apply VERB VBD Mood=Ind|Tense=Past|VerbForm=Fin 0 root 0:root _
17 to to ADP IN _ 19 case 19:case _
18 the the DET DT Definite=Def|PronType=Art 19 det 19:det _
19 lands land NOUN NNS Number=Plur 16 obl 16:obl:to SpaceAfter=No
So "term" is the head. I suppose we could make the dependency an nmod from "Spanish" to "term" and keep "term" the head of that phrase
appos is correct: "the Spanish term" and "Antillas" are two full noun phrases that have the same referent and can be swapped.
Within "the Spanish term", "term" is correct as the head. If "Spanish" is tagged as PROPN then it should attach as compound. I don't know if GUM or EWT has a precedent for a language name as attributive modifier ("the French language", "a German word" etc.). Usually language names are nominal heads.
I didn't do an exhaustive search over languages, but I didn't find any other examples. I can make it a compound edit; but I suppose that means we need to be happy with Spanish_PROPN. Does that sound right?
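To make the proposed edit concrete, here is a rough sketch of what retagging token 13 would involve: UPOS becomes PROPN, the amod relation becomes compound, and the matching enhanced dependency is updated too. `retag_as_propn_compound` is a hypothetical helper written for this illustration, not an existing script in either repo.

```python
def retag_as_propn_compound(token_line):
    """Hypothetical helper: retag a token as PROPN and turn amod into compound.

    Expects one tab-separated CoNLL-U token line (10 columns). The head is left
    untouched; only UPOS, DEPREL and the matching enhanced dependency change.
    """
    cols = token_line.split("\t")
    if len(cols) != 10:
        raise ValueError("expected 10 tab-separated CoNLL-U columns")
    head = cols[6]
    cols[3] = "PROPN"
    if cols[7] == "amod":
        cols[7] = "compound"
        if cols[8] != "_":
            # Rewrite only the enhanced edge that mirrors the basic amod edge.
            deps = [
                f"{head}:compound" if dep == f"{head}:amod" else dep
                for dep in cols[8].split("|")
            ]
            cols[8] = "|".join(deps)
    return "\t".join(cols)


current = "\t".join(["13", "Spanish", "Spanish", "ADJ", "NNP", "Number=Sing", "14", "amod", "14:amod", "_"])
proposed = retag_as_propn_compound(current)
print(proposed.replace("\t", " "))
# prints: 13 Spanish Spanish PROPN NNP Number=Sing 14 compound 14:compound _
```

The result has the same shape as the existing "Spanish speakers" annotation in PUD (PROPN, NNP, compound), and "term" stays the head with "Antillas" as its appos.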
This example from the PTB guidelines suggests that languages are always PROPN even if attributive:
Browsing the comparison of English treebanks, I noticed something odd: a handful of ADJs in EWT, and lots in GUM and GENTLE, have Number=Sing. LinES, PUD, ParTUT as well. I suppose this arose from a bug at some point in a pipeline.
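For anyone wanting to reproduce the count, a minimal sketch of the check follows. It assumes plain tab-separated CoNLL-U text and uses no external library; `find_adj_with_number` and `parse_feats` are names made up here. The two sample tokens are the "Historian" lines from PUD and EWT quoted in this thread.

```python
def parse_feats(feats):
    """Turn a FEATS string such as 'Number=Sing' into a dict ('_' means empty)."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def find_adj_with_number(conllu_text):
    """Return (sent_id, token_id, form, xpos) for every ADJ carrying a Number feature."""
    hits = []
    sent_id = None
    prefix = "# sent_id = "
    for line in conllu_text.splitlines():
        if line.startswith(prefix):
            sent_id = line[len(prefix):].strip()
            continue
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # skip anything that is not a 10-column token line
        token_id, form, _lemma, upos, xpos, feats = cols[:6]
        if upos == "ADJ" and "Number" in parse_feats(feats):
            hits.append((sent_id, token_id, form, xpos))
    return hits


SAMPLE = "\n".join([
    "# sent_id = w04006023",
    "\t".join(["1", "Historian", "historian", "ADJ", "NN", "Number=Sing", "2", "amod", "2:amod", "_"]),
    "",
    "# sent_id = newsgroup-groups.google.com_humanities.lit.authors.shakespeare_0018a7697318f71f_ENG_20031006_163200-0023",
    "\t".join(["1", "Historian", "historian", "ADJ", "JJ", "Degree=Pos", "2", "amod", "2:amod", "_"]),
])

print(find_adj_with_number(SAMPLE))
# prints: [('w04006023', '1', 'Historian', 'NN')]
```

Only the PUD token is reported, since the EWT one has Degree=Pos rather than Number. Reporting the XPOS alongside each hit makes it easier to see whether the token was a mistagged noun or an adjective with stray features.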