UniversalDependencies / UD_English-GUM

Tagging errors for X/PROPN #32

Closed muchang closed 3 years ago

muchang commented 3 years ago

In the following sentences, "Formica fusca" is the name of an ant species (https://en.wikipedia.org/wiki/Formica_fusca), so both tokens should be tagged PROPN rather than X.

# sent_id = GUM_interview_ants-3
# s_type = frag
# text = Formica fusca, from file.
1   Formica Formica X   FW  _   2   compound    2:compound  Discourse=background:3->39|Entity=(animal-6
2   fusca   fusca   X   FW  _   0   root    0:root  Entity=animal-6)|SpaceAfter=No
3   ,   ,   PUNCT   ,   _   5   punct   5:punct _
4   from    from    ADP IN  _   5   case    5:case  _
5   file    file    NOUN    NN  Number=Sing 2   nmod    2:nmod:from Entity=(abstract-7)|SpaceAfter=No
6   .   .   PUNCT   .   _   2   punct   2:punct _

# sent_id = GUM_interview_ants-8
# s_type = decl
# text = The team used Formica fusca, an ant species that can form thousand-strong colonies.
1   The the DET DT  Definite=Def|PronType=Art   2   det 2:det   Discourse=background:17->24|Entity=(person-18
2   team    team    NOUN    NN  Number=Sing 3   nsubj   3:nsubj Entity=person-18)
3   used    use VERB    VBD Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin   0   root    0:root  _
4   Formica Formica X   FW  _   5   compound    5:compound  Entity=(animal-22
5   fusca   fusca   X   FW  _   3   obj 3:obj   Entity=animal-22)|SpaceAfter=No
6   ,   ,   PUNCT   ,   _   9   punct   9:punct _
7   an  a   DET DT  Definite=Ind|PronType=Art   9   det 9:det   Entity=(animal-6
8   ant ant NOUN    NN  Number=Sing 9   compound    9:compound  _
9   species species NOUN    NN  Number=Sing 5   appos   5:appos|12:nsubj    _
10  that    that    PRON    WDT PronType=Rel    12  nsubj   9:ref   Discourse=elaboration:18->17
11  can can AUX MD  VerbForm=Fin    12  aux 12:aux  _
12  form    form    VERB    VB  VerbForm=Inf    9   acl:relcl   9:acl:relcl _
13  thousand-strong thousand-strong ADJ JJ  Degree=Pos  14  amod    14:amod Entity=(organization-23
14  colonies    colony  NOUN    NNS Number=Plur 12  obj 12:obj  Entity=animal-6)organization-23)|SpaceAfter=No
15  .   .   PUNCT   .   _   3   punct   3:punct _

# sent_id = GUM_interview_ants-14
# s_type = decl
# text = In the wild, Formica fusca can encounter similar chemicals in aphids and dead ants.
1   In  in  ADP IN  _   3   case    3:case  Discourse=background:30->24
2   the the DET DT  Definite=Def|PronType=Art   3   det 3:det   Entity=(place-41
3   wild    wild    NOUN    NN  Number=Sing 8   obl 8:obl:in    Entity=place-41)|SpaceAfter=No
4   ,   ,   PUNCT   ,   _   3   punct   3:punct _
5   Formica Formica X   FW  _   6   compound    6:compound  Entity=(animal-6
6   fusca   fusca   X   FW  _   8   nsubj   8:nsubj Entity=animal-6)
7   can can AUX MD  VerbForm=Fin    8   aux 8:aux   _
8   encounter   encounter   VERB    VB  VerbForm=Inf    0   root    0:root  _
9   similar similar ADJ JJ  Degree=Pos  10  amod    10:amod Bridge=substance-35<substance-42|Entity=(substance-42
10  chemicals   chemical    NOUN    NNS Number=Plur 8   obj 8:obj   Entity=substance-42)
11  in  in  ADP IN  _   12  case    12:case _
12  aphids  aphid   NOUN    NNS Number=Plur 10  nmod    10:nmod:in  Entity=(animal-43)
13  and and CCONJ   CC  _   15  cc  15:cc   _
14  dead    dead    ADJ JJ  Degree=Pos  15  amod    15:amod Entity=(animal-44
15  ants    ant NOUN    NNS Number=Plur 12  conj    10:nmod:in|12:conj:and  Entity=animal-44)|SpaceAfter=No
16  .   .   PUNCT   .   _   8   punct   8:punct _

# sent_id = GUM_interview_ants-35
# s_type = decl
# speaker = NickBos
# text = We collected wild colonies of Formica fusca by searching through old tree-trunks in old logging sites in southern Finland.
1   We  we  PRON    PRP Case=Nom|Number=Plur|Person=1|PronType=Prs  2   nsubj   2:nsubj Discourse=joint:66->46|Entity=(person-18)
2   collected   collect VERB    VBD Mood=Ind|Number=Plur|Person=1|Tense=Past|VerbForm=Fin   0   root    0:root  _
3   wild    wild    ADJ JJ  Degree=Pos  4   amod    4:amod  Entity=(animal-102
4   colonies    colony  NOUN    NNS Number=Plur 2   obj 2:obj   _
5   of  of  ADP IN  _   7   case    7:case  _
6   Formica Formica X   FW  _   7   compound    7:compound  Entity=(animal-6
7   fusca   fusca   X   FW  _   4   nmod    4:nmod:of   Entity=animal-102)animal-6)
8   by  by  SCONJ   IN  _   9   mark    9:mark  Discourse=means:67->66
9   searching   search  VERB    VBG VerbForm=Ger    2   advcl   2:advcl:by  _
10  through through ADP IN  _   12  case    12:case _
11  old old ADJ JJ  Degree=Pos  12  amod    12:amod Entity=(plant-103
12  tree-trunks tree-trunk  NOUN    NNS Number=Plur 9   obl 9:obl:through   _
13  in  in  ADP IN  _   16  case    16:case _
14  old old ADJ JJ  Degree=Pos  16  amod    16:amod Entity=(place-104
15  logging logging NOUN    NN  Number=Sing 16  compound    16:compound _
16  sites   site    NOUN    NNS Number=Plur 12  nmod    12:nmod:in  _
17  in  in  ADP IN  _   19  case    19:case _
18  southern    southern    ADJ JJ  Degree=Pos  19  amod    19:amod Entity=(place-105-Southern_Finland_Province
19  Finland Finland PROPN   NNP Number=Sing 16  nmod    16:nmod:in  Entity=plant-103)place-104)place-105-Southern_Finland_Province)|SpaceAfter=No
20  .   .   PUNCT   .   _   2   punct   2:punct _

Commit: d38df82
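
For anyone who wants to list every token carrying this X/FW combination, here is a minimal Python sketch. It assumes standard CoNLL-U input, that is, ten tab-separated columns per token line (the tabs show up as runs of spaces in this thread). The function name `find_tokens` and the inline sample are illustrative only and not part of any GUM or UD tooling.

```python
# Sample: the first sentence quoted above, with tabs restored (assumption:
# the original file uses the standard tab-separated CoNLL-U layout).
SAMPLE = "\n".join([
    "# sent_id = GUM_interview_ants-3",
    "# s_type = frag",
    "# text = Formica fusca, from file.",
    "1\tFormica\tFormica\tX\tFW\t_\t2\tcompound\t2:compound\tDiscourse=background:3->39|Entity=(animal-6",
    "2\tfusca\tfusca\tX\tFW\t_\t0\troot\t0:root\tEntity=animal-6)|SpaceAfter=No",
    "3\t,\t,\tPUNCT\t,\t_\t5\tpunct\t5:punct\t_",
    "4\tfrom\tfrom\tADP\tIN\t_\t5\tcase\t5:case\t_",
    "5\tfile\tfile\tNOUN\tNN\tNumber=Sing\t2\tnmod\t2:nmod:from\tEntity=(abstract-7)|SpaceAfter=No",
    "6\t.\t.\tPUNCT\t.\t_\t2\tpunct\t2:punct\t_",
    "",
])


def find_tokens(conllu_text, upos="X", xpos="FW"):
    """Return (sent_id, token_id, form) for tokens with the given UPOS and XPOS."""
    hits = []
    sent_id = None
    for line in conllu_text.splitlines():
        if line.startswith("# sent_id = "):
            sent_id = line[len("# sent_id = "):]
            continue
        # Skip blank sentence separators and other comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        # Column 4 is UPOS and column 5 is XPOS (1-based).
        if cols[3] == upos and cols[4] == xpos:
            hits.append((sent_id, cols[0], cols[1]))
    return hits


hits = find_tokens(SAMPLE)
for sent_id, token_id, form in hits:
    print(sent_id, token_id, form)
```

Running the same function over the full `.conllu` file (read as UTF-8 text) would list all X/FW tokens, which makes it easier to check whether the Latin species names are tagged consistently.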

amir-zeldes commented 3 years ago

Hmm, I don't think so. It's debatable whether the FW tag is correct here: it's mainly motivated by the use of Latin, and probably also by the fact that the Latin appears in italics in the original data (so the annotator probably felt it was being highlighted as foreign). But even if it weren't FW, I think it would be NOUN, since names of animal species are still common nouns. For example, "hawk" or "dog" would just be NN, so I think 'canis' should also be a noun, no?

Additionally, in defense of FW here, note that an English proper name would capitalize both words (the mixed capitalization is also confusing from an English perspective), and that there is a certain amount of Latin syntax here: formica fusca means "dark ant" in Latin, with feminine agreement between the noun and the adjective.

muchang commented 3 years ago

I agree with you. In particular, since "Formica fusca" consists of Latin words, it makes sense to tag them as X/FW.