UniversalDependencies / UD_English-GUM

Question about Past Participle and ADJ #33

Closed: muchang closed this issue 3 years ago

muchang commented 3 years ago

In the following sentences, the past participle of a verb also functions as an adjective. How should such cases be tagged: VERB or ADJ?

Word: "known"

# sent_id = GUM_bio_moreau-36
# s_type = decl
# text = Moreau went on to work with many of the best known New Wave and avant-garde directors. [2]
1   Moreau  Moreau  PROPN   NNP Number=Sing 2   nsubj   2:nsubj|5:nsubj:xsubj   Discourse=sequence:101->84|Entity=(person-1-Jeanne_Moreau)
2   went    go  VERB    VBD Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin   0   root    0:root  _
3   on  on  ADP RP  _   2   compound:prt    2:compound:prt  _
4   to  to  PART    TO  _   5   mark    5:mark  _
5   work    work    VERB    VB  VerbForm=Inf    2   xcomp   2:xcomp _
6   with    with    ADP IN  _   7   case    7:case  _
7   many    many    ADJ JJ  Degree=Pos  5   obl 5:obl:with  Entity=(person-163
8   of  of  ADP IN  _   16  case    16:case _
9   the the DET DT  Definite=Def|PronType=Art   16  det 16:det  Entity=(person-164
10  best    well    ADV RBS Degree=Sup  11  advmod  11:advmod   _
11  known   know    VERB    VBN Tense=Past|VerbForm=Part    16  amod    16:amod _
12  New New ADJ NNP Degree=Pos  13  amod    13:amod Entity=(abstract-165-French_New_Wave
13  Wave    Wave    PROPN   NNP Number=Sing 16  compound    16:compound Entity=abstract-165-French_New_Wave)
14  and and CCONJ   CC  _   15  cc  15:cc   _
15  avant-garde avant-garde ADJ JJ  Degree=Pos  13  conj    13:conj:and|16:amod _
16  directors   director    NOUN    NNS Number=Plur 7   nmod    7:nmod:of   Entity=person-163)person-164)|SpaceAfter=No
17  .   .   PUNCT   .   _   2   punct   2:punct _
18  [   [   PUNCT   -LRB-   _   19  punct   19:punct    Discourse=evidence:102->101|SpaceAfter=No
19  2   2   NUM CD  NumForm=Digit|NumType=Card  2   dep 2:dep   Entity=(abstract-95)|SpaceAfter=No
20  ]   ]   PUNCT   -RRB-   _   19  punct   19:punct    _

Word: "frozen"

# sent_id = GUM_voyage_fortlee-53
# s_type = decl
# text = There are also many nail salons, frozen yogurt shops, coffee shops and gas stations.
1   There   there   PRON    EX  _   2   expl    2:expl  Discourse=joint:103->101
2   are be  VERB    VBP Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin   0   root    0:root  _
3   also    also    ADV RB  _   2   advmod  2:advmod    _
4   many    many    ADJ JJ  Degree=Pos  6   amod    6:amod  Entity=(place-167
5   nail    nail    NOUN    NN  Number=Sing 6   compound    6:compound  _
6   salons  salon   NOUN    NNS Number=Plur 2   nsubj   2:nsubj Entity=place-167)|SpaceAfter=No
7   ,   ,   PUNCT   ,   _   10  punct   10:punct    _
8   frozen  freeze  VERB    VBN Tense=Past|VerbForm=Part    10  amod    10:amod Entity=(place-168
9   yogurt  yogurt  NOUN    NN  Number=Sing 10  compound    10:compound _
10  shops   shop    NOUN    NNS Number=Plur 6   conj    2:nsubj|6:conj:and  Entity=place-168)|SpaceAfter=No
11  ,   ,   PUNCT   ,   _   13  punct   13:punct    _
12  coffee  coffee  NOUN    NN  Number=Sing 13  compound    13:compound Entity=(place-169
13  shops   shop    NOUN    NNS Number=Plur 6   conj    2:nsubj|6:conj:and  Entity=place-169)
14  and and CCONJ   CC  _   16  cc  16:cc   _
15  gas gas NOUN    NN  Number=Sing 16  compound    16:compound Entity=(place-170
16  stations    station NOUN    NNS Number=Plur 6   conj    2:nsubj|6:conj:and  Entity=place-170)|SpaceAfter=No
17  .   .   PUNCT   .   _   2   punct   2:punct _

Commit: d38df82
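
To list every token that raises this question, one option is to scan the CoNLL-U file for tokens whose XPOS is VBN and whose dependency relation is amod. The following is a minimal sketch, not part of any GUM tooling: the names `Token`, `read_tokens` and `attributive_participles` are hypothetical, and it assumes standard tab-separated CoNLL-U (the excerpts above are rendered with spaces instead of tabs).

```python
from typing import Iterator, List, NamedTuple


class Token(NamedTuple):
    """First eight CoNLL-U columns of one token line (hypothetical helper type)."""
    idx: str
    form: str
    lemma: str
    upos: str
    xpos: str
    feats: str
    head: str
    deprel: str


def read_tokens(conllu: str) -> Iterator[Token]:
    """Yield tokens from CoNLL-U text, skipping comment and blank lines."""
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # not a well-formed token line
        yield Token(*cols[:8])


def attributive_participles(conllu: str) -> List[Token]:
    """Return tokens tagged VBN (XPOS) that attach to their head as amod."""
    return [t for t in read_tokens(conllu) if t.xpos == "VBN" and t.deprel == "amod"]


# Three token lines copied from the sentences quoted in this thread.
SAMPLE = "\n".join(
    "\t".join(row)
    for row in [
        ("8", "frozen", "freeze", "VERB", "VBN", "Tense=Past|VerbForm=Part",
         "10", "amod", "10:amod", "Entity=(place-168"),
        ("9", "yogurt", "yogurt", "NOUN", "NN", "Number=Sing",
         "10", "compound", "10:compound", "_"),
        ("18", "known", "known", "ADJ", "JJ", "Degree=Pos",
         "19", "amod", "19:amod", "_"),
    ]
)

for tok in attributive_participles(SAMPLE):
    print(tok.form, tok.upos, tok.xpos, tok.deprel)
```

On this sample only "frozen" is reported, since "known" in the third row already carries JJ rather than VBN.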

amir-zeldes commented 3 years ago

These seem pretty transparent: the best known thing is the thing people know best, and frozen yogurt is literally frozen. The PTB tests for this are mixed: on the one hand adding a by-phrase is possible, on the other, it is a bit odd: best known (by people), yogurt frozen (by the store). But notice that at least for known, we get the expected adverbial modification as seen in a VP: "well known" <> "know well" (compare "very famous" but "?? well famous"). @nschneid any opinions on these?

nschneid commented 3 years ago

"frozen" feels like an ADJ. Not sure about "known".

muchang commented 3 years ago

For your reference, "known" is tagged as ADJ in the following sentence:

# sent_id = GUM_fiction_honour-3
# s_type = sub
# text = An empire whose costly mistakes would for many years to come echo into every corner of the known galaxy.
1   An  a   DET DT  Definite=Ind|PronType=Art   2   det 2:det   Discourse=background:3->2|Entity=(place-5
2   empire  empire  NOUN    NN  Number=Sing 0   root    0:root  _
3   whose   whose   PRON    WP$ Poss=Yes|PronType=Rel   5   nmod:poss   5:nmod:poss Discourse=elaboration:4->3|Entity=(abstract-6
4   costly  costly  ADJ JJ  Degree=Pos  5   amod    5:amod  _
5   mistakes    mistake NOUN    NNS Number=Plur 12  nsubj   12:nsubj    Entity=abstract-6)
6   would   would   AUX MD  VerbForm=Fin    12  aux 12:aux  _
7   for for ADP IN  _   9   case    9:case  _
8   many    many    ADJ JJ  Degree=Pos  9   amod    9:amod  Entity=(time-7
9   years   year    NOUN    NNS Number=Plur 12  obl 12:obl:for  Entity=time-7)
10  to  to  PART    TO  _   11  mark    11:mark _
11  come    come    VERB    VB  VerbForm=Inf    9   acl 9:acl:to    _
12  echo    echo    VERB    VB  VerbForm=Inf    2   acl:relcl   2:acl:relcl _
13  into    into    ADP IN  _   15  case    15:case _
14  every   every   DET DT  _   15  det 15:det  Entity=(place-8
15  corner  corner  NOUN    NN  Number=Sing 12  obl 12:obl:into _
16  of  of  ADP IN  _   19  case    19:case _
17  the the DET DT  Definite=Def|PronType=Art   19  det 19:det  Entity=(place-9
18  known   known   ADJ JJ  Degree=Pos  19  amod    19:amod _
19  galaxy  galaxy  NOUN    NN  Number=Sing 15  nmod    15:nmod:of  Entity=place-5)place-8)place-9)|SpaceAfter=No
20  .   .   PUNCT   .   _   2   punct   2:punct _
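
Mixed tagging of this kind can be surfaced across a whole file by tallying, for each word form used as amod, which UPOS tags it receives. The sketch below is an illustration under assumptions rather than an existing GUM script: `upos_by_form` and `mixed_forms` are hypothetical names, and the input is assumed to be standard tab-separated CoNLL-U.

```python
from collections import Counter, defaultdict
from typing import Dict


def upos_by_form(conllu: str, deprel: str = "amod") -> Dict[str, Counter]:
    """Count UPOS tags per lowercased form, restricted to one dependency relation."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip sentence-level comments and blank separators
        cols = line.split("\t")
        if len(cols) < 8 or cols[7] != deprel:
            continue
        counts[cols[1].lower()][cols[3]] += 1
    return counts


def mixed_forms(conllu: str) -> Dict[str, Dict[str, int]]:
    """Return only the forms that occur with more than one UPOS tag."""
    return {
        form: dict(tags)
        for form, tags in upos_by_form(conllu).items()
        if len(tags) > 1
    }


# Token lines copied from GUM_bio_moreau-36 and GUM_fiction_honour-3 above.
SAMPLE_ROWS = [
    ("11", "known", "know", "VERB", "VBN", "Tense=Past|VerbForm=Part",
     "16", "amod", "16:amod", "_"),
    ("4", "costly", "costly", "ADJ", "JJ", "Degree=Pos",
     "5", "amod", "5:amod", "_"),
    ("18", "known", "known", "ADJ", "JJ", "Degree=Pos",
     "19", "amod", "19:amod", "_"),
]
SAMPLE_CONLLU = "\n".join("\t".join(row) for row in SAMPLE_ROWS)

print(mixed_forms(SAMPLE_CONLLU))
```

A form showing up here is not necessarily an error, since distinct senses may legitimately receive distinct tags; the output is only a list of candidates to review by hand.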

amir-zeldes commented 3 years ago

This instance of "known" feels a little different to me, maybe because "the known galaxy" is like "the known world", which is a collocation, unlike "best known ... directors". What I mean is, it's no longer transparently "the galaxy that is known by everyone".

Looking at existing corpora, PTB has:

So the 'famous' sense is JJ there. Looking at OntoNotes, there are four more VERB and two ADJ:

VERB:

ADJ:

So if we disregard sense, VERB has the majority. If we accept multiple senses, maybe the current situation is OK, at least if we use the by-test from Santorini 1990:

arademaker commented 3 years ago

I am trying to find more information about the by-test. Is it described in https://www.nilsreiter.de/assets/2019-09-06-reflected-text-analysis/Penn-Treebank-Tagset.pdf?

amir-zeldes commented 3 years ago

Yes, it's mentioned in the PDF you linked: on p.16 for an actually realized by-phrase, and on p.17 for a hypothetically insertable by-phrase. It's also explicitly taught to GUM annotators, so I'm sure it's used as part of the argumentation for many edge cases in GUM.

muchang commented 3 years ago

I see. According to the by-test, the "known" in "known galaxy" should be a JJ.

amir-zeldes commented 3 years ago

OK, the other recently fixed issues should now be corrected in the UD repo dev branch as well.