UniversalDependencies / UD_English-GUM

Tagging errors for SYM/PUNCT #34

Closed muchang closed 3 years ago

muchang commented 3 years ago

In GUM, most "/" tokens are tagged as SYM, while the following three are tagged as PUNCT.

# sent_id = GUM_whow_languages-35
# s_type = imp
# text = Grab a book / novel and translate it to your own language.
1   Grab    grab    VERB    VB  Mood=Imp|Person=2|VerbForm=Fin  0   root    0:root  Discourse=elaboration:58->57
2   a   a   DET DT  Definite=Ind|PronType=Art   3   det 3:det   Entity=(abstract-86
3   book    book    NOUN    NN  Number=Sing 1   obj 1:obj   _
4   /   /   PUNCT   SYM _   5   punct   5:punct _
5   novel   novel   NOUN    NN  Number=Sing 3   conj    1:obj|3:conj    Entity=abstract-86)
6   and and CCONJ   CC  _   7   cc  7:cc    Discourse=sequence:59->58
7   translate   translate   VERB    VB  Mood=Imp|Person=2|VerbForm=Fin  1   conj    1:conj:and  _
8   it  it  PRON    PRP Case=Acc|Gender=Neut|Number=Sing|Person=3|PronType=Prs  7   obj 7:obj   Entity=(abstract-86)
9   to  to  ADP IN  _   12  case    12:case _
10  your    your    PRON    PRP$    Person=2|Poss=Yes|PronType=Prs  12  nmod:poss   12:nmod:poss    Entity=(abstract-4(person-5)
11  own own ADJ JJ  Degree=Pos  12  amod    12:amod _
12  language    language    NOUN    NN  Number=Sing 7   obl 7:obl:to    Entity=abstract-4)|SpaceAfter=No
13  .   .   PUNCT   .   _   1   punct   1:punct _
# sent_id = GUM_whow_languages-39
# s_type = imp
# text = Write your own poem / novel / story with your own made up language.
1   Write   write   VERB    VB  Mood=Imp|Person=2|VerbForm=Fin  0   root    0:root  Discourse=joint:65->54
2   your    your    PRON    PRP$    Person=2|Poss=Yes|PronType=Prs  4   nmod:poss   4:nmod:poss Entity=(abstract-90(person-5)
3   own own ADJ JJ  Degree=Pos  4   amod    4:amod  _
4   poem    poem    NOUN    NN  Number=Sing 1   obj 1:obj   _
5   /   /   PUNCT   SYM _   6   punct   6:punct _
6   novel   novel   NOUN    NN  Number=Sing 4   conj    1:obj|4:conj    _
7   /   /   PUNCT   SYM _   8   punct   8:punct _
8   story   story   NOUN    NN  Number=Sing 4   conj    1:obj|4:conj    Entity=abstract-90)
9   with    with    ADP IN  _   14  case    14:case _
10  your    your    PRON    PRP$    Person=2|Poss=Yes|PronType=Prs  14  nmod:poss   14:nmod:poss    Entity=(abstract-4(person-5)
11  own own ADJ JJ  Degree=Pos  14  amod    14:amod _
12  made    make    VERB    VBN Tense=Past|VerbForm=Part    14  acl 14:acl  _
13  up  up  ADP RP  _   12  compound:prt    12:compound:prt _
14  language    language    NOUN    NN  Number=Sing 8   nmod    8:nmod:with Entity=abstract-4)|SpaceAfter=No
15  .   .   PUNCT   .   _   1   punct   1:punct _

Commit: d38df82
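For anyone who wants to reproduce this kind of check, here is a minimal sketch that counts the UPOS tags assigned to "/" and lists the occurrences tagged PUNCT. It is not the script used for this report; the function names `parse_conllu_tokens` and `slash_tags` are made up for illustration, and it assumes standard tab-separated CoNLL-U (falling back to whitespace splitting for pasted excerpts like the one above, which would break on forms containing spaces).

```python
from collections import Counter

def parse_conllu_tokens(text):
    """Yield (sent_id, fields) for each plain token line of a CoNLL-U string."""
    sent_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Remember the sentence ID so hits can be located later.
            if line.startswith("# sent_id"):
                sent_id = line.split("=", 1)[1].strip()
            continue
        # Real CoNLL-U is tab-separated; pasted excerpts may only have spaces.
        fields = line.split("\t") if "\t" in line else line.split()
        # Skip multiword-token ranges (1-2) and empty nodes (1.1).
        if len(fields) >= 8 and fields[0].isdigit():
            yield sent_id, fields

def slash_tags(text):
    """Return (Counter of UPOS for '/', list of (sent_id, token_id) tagged PUNCT)."""
    counts = Counter()
    punct_hits = []
    for sent_id, fields in parse_conllu_tokens(text):
        form, upos = fields[1], fields[3]
        if form == "/":
            counts[upos] += 1
            if upos == "PUNCT":
                punct_hits.append((sent_id, fields[0]))
    return counts, punct_hits

# Tiny sample built from two token lines of the first sentence above.
SAMPLE = (
    "# sent_id = GUM_whow_languages-35\n"
    "4\t/\t/\tPUNCT\tSYM\t_\t5\tpunct\t5:punct\t_\n"
    "5\tnovel\tnovel\tNOUN\tNN\tNumber=Sing\t3\tconj\t1:obj|3:conj\t_\n"
)

if __name__ == "__main__":
    counts, punct_hits = slash_tags(SAMPLE)
    print(counts)
    print(punct_hits)
```

Run over a full treebank file, the counter gives the SYM/PUNCT split and the hit list points at the sentences to review by hand.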

amir-zeldes commented 3 years ago

Definitely, these are errors. Thanks!

muchang commented 3 years ago

Thanks, Amir. However, I am not sure about the following case:

# sent_id = GUM_bio_jerome-2
# s_type = decl
# text = Jerome (/ dʒəˈroʊm /; Latin: Eusebius Sophronius Hieronymus; Greek: Εὐσέβιος Σωφρόνιος Ἱερώνυμος; c. 347 – 30 September 420) was a Latin Catholic priest, confessor, theologian, and historian, commonly known as Saint Jerome.
1   Jerome  Jerome  PROPN   NNP Number=Sing 30  nsubj   30:nsubj|32:nsubj|34:nsubj|37:nsubj Discourse=ROOT:2|Entity=(person-1-Jerome)
2   (   (   PUNCT   -LRB-   _   4   punct   4:punct Discourse=elaboration:3->2|SpaceAfter=No
3   /   /   PUNCT   SYM _   4   punct   4:punct _
4   dʒəˈroʊm    dʒəˈroʊm    PROPN   NNP Number=Sing 1   appos   1:appos Entity=(person-1-Jerome)
5   /   /   PUNCT   SYM _   9   punct   9:punct SpaceAfter=No
6   ;   ;   PUNCT   :   _   9   punct   9:punct _
7   Latin   Latin   PROPN   NNP Number=Sing 9   nmod    9:nmod  Discourse=preparation:4->5|Entity=(abstract-2-Latin)|SpaceAfter=No
8   :   :   PUNCT   :   _   7   punct   7:punct _
9   Eusebius    Eusebius    PROPN   NNP Number=Sing 1   appos   1:appos Discourse=joint:5->3|Entity=(person-1-Jerome
10  Sophronius  Sophronius  PROPN   NNP Number=Sing 9   flat    9:flat  _
11  Hieronymus  Hieronymus  PROPN   NNP Number=Sing 9   flat    9:flat  SpaceAfter=No
12  ;   ;   PUNCT   :   _   15  punct   15:punct    Entity=person-1-Jerome)
13  Greek   Greek   PROPN   NNP Number=Sing 15  nmod    15:nmod Discourse=preparation:6->7|Entity=(abstract-3-Greek_language)|SpaceAfter=No
14  :   :   PUNCT   :   _   13  punct   13:punct    _
15  Εὐσέβιος    Εὐσέβιος    PROPN   NNP Number=Sing 1   appos   1:appos Discourse=joint:7->3|Entity=(person-1-Jerome
16  Σωφρόνιος   Σωφρόνιος   PROPN   NNP Number=Sing 15  flat    15:flat _
17  Ἱερώνυμος   Ἱερώνυμος   PROPN   NNP Number=Sing 15  flat    15:flat Entity=person-1-Jerome)|SpaceAfter=No
18  ;   ;   PUNCT   :   _   20  punct   20:punct    _
19  c.  c.  ADV FW  Abbr=Yes    20  advmod  20:advmod   Discourse=joint:8->3
20  347 347 NUM CD  NumForm=Digit|NumType=Card  1   nmod:tmod   1:nmod:tmod Entity=(time-4)
21  –   -   SYM SYM _   22  case    22:case _
22  30  30  NUM CD  NumForm=Digit|NumType=Card  20  nmod    20:nmod:to  Entity=(time-5
23  September   September   PROPN   NNP Number=Sing 22  compound    22:compound Entity=(time-6
24  420 420 NUM CD  NumForm=Digit|NumType=Card  22  nmod:tmod   22:nmod:tmod    Entity=(time-7)time-5)time-6)|SpaceAfter=No
25  )   )   PUNCT   -RRB-   _   20  punct   20:punct    _
26  was be  AUX VBD Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin   30  cop 30:cop  Discourse=same-unit:9->2
27  a   a   DET DT  Definite=Ind|PronType=Art   30  det 30:det  Entity=(person-1-Jerome
28  Latin   Latin   ADJ JJ  Degree=Pos  30  amod    30:amod _
29  Catholic    Catholic    ADJ JJ  Degree=Pos  30  amod    30:amod _
30  priest  priest  NOUN    NN  Number=Sing 0   root    0:root  SpaceAfter=No
31  ,   ,   PUNCT   ,   _   32  punct   32:punct    _
32  confessor   confessor   NOUN    NN  Number=Sing 30  conj    30:conj:and SpaceAfter=No
33  ,   ,   PUNCT   ,   _   34  punct   34:punct    _
34  theologian  theologian  NOUN    NN  Number=Sing 30  conj    30:conj:and SpaceAfter=No
35  ,   ,   PUNCT   ,   _   37  punct   37:punct    _
36  and and CCONJ   CC  _   37  cc  37:cc   _
37  historian   historian   NOUN    NN  Number=Sing 30  conj    30:conj:and Entity=person-1-Jerome)|SpaceAfter=No
38  ,   ,   PUNCT   ,   _   40  punct   40:punct    _
39  commonly    commonly    ADV RB  Degree=Pos  40  advmod  40:advmod   Discourse=elaboration:10->2
40  known   know    VERB    VBN Tense=Past|VerbForm=Part    30  acl 30:acl  _
41  as  as  ADP IN  _   42  case    42:case _
42  Saint   Saint   PROPN   NNP Number=Sing 40  obl 40:obl:as   Entity=(person-1-Jerome
43  Jerome  Jerome  PROPN   NNP Number=Sing 42  flat    42:flat Entity=person-1-Jerome)|SpaceAfter=No
44  .   .   PUNCT   .   _   30  punct   30:punct    _

The "/" tokens in the sentence above seem as though they should be tagged SYM as well.

amir-zeldes commented 3 years ago

I think here it's just a graphic device delimiting a phonological transcription, so I would say it's PUNCT. This way its deprel can be punct. Otherwise, what would the deprel be? The UD guidelines suggest that SYM "can be substituted by normal words", which I don't think is the case here: https://universaldependencies.org/u/pos/SYM.html
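The point about the deprel can be turned into a simple consistency check: a token tagged PUNCT is expected to attach as `punct`, and a `punct` attachment is expected to carry the PUNCT tag. The sketch below flags tokens where the two disagree. It is an illustration only, the function name `upos_deprel_mismatches` is hypothetical, and the pairing rule is stated here as an assumption about how UD treats punctuation rather than a quotation from the guidelines.

```python
def upos_deprel_mismatches(token_rows):
    """Return tokens whose UPOS and DEPREL disagree about being punctuation.

    token_rows: iterable of CoNLL-U field lists (ID, FORM, LEMMA, UPOS, XPOS,
    FEATS, HEAD, DEPREL, DEPS, MISC).
    """
    mismatches = []
    for fields in token_rows:
        upos, deprel = fields[3], fields[7]
        tagged_punct = upos == "PUNCT"
        # Ignore any subtype after the colon when comparing the relation.
        attached_as_punct = deprel.split(":")[0] == "punct"
        if tagged_punct != attached_as_punct:
            mismatches.append((fields[0], fields[1], upos, deprel))
    return mismatches

ROWS = [
    # As annotated in GUM_bio_jerome-2: PUNCT with deprel punct, consistent.
    ["3", "/", "/", "PUNCT", "SYM", "_", "4", "punct", "4:punct", "_"],
    # The dash in the date range: SYM with deprel case, also consistent.
    ["21", "–", "-", "SYM", "SYM", "_", "22", "case", "22:case", "_"],
    # Hypothetical retagging to SYM while keeping punct: flagged.
    ["3", "/", "/", "SYM", "SYM", "_", "4", "punct", "4:punct", "_"],
]

if __name__ == "__main__":
    for hit in upos_deprel_mismatches(ROWS):
        print(hit)
```

Under this check, retagging the transcription slashes as SYM would also require picking a non-`punct` relation for them, which is the difficulty raised above.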

muchang commented 3 years ago

Yeah, I agree with you.