UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

VERB+NNP in proper nouns should be ADJ+NNP #417

Closed rhdunn closed 10 months ago

rhdunn commented 12 months ago

The modifier in a proper noun phrase should be tagged as an adjective (ADJ+NNP), not a verb (VERB+NNP):

$ grep -U1 -P "VERB\tNNP" *.conllu
en_ewt-ud-dev.conllu-2  interim interim ADJ     JJ      Degree=Pos      4       amod    4:amod  _
en_ewt-ud-dev.conllu:3  Governing       Govern  VERB    NNP     VerbForm=Ger    4       amod    4:amod  _
en_ewt-ud-dev.conllu-4  Council Council PROPN   NNP     Number=Sing     5       nsubj   5:nsubj _
--
en_ewt-ud-dev.conllu-7  ,       ,       PUNCT   ,       _       9       punct   9:punct _
en_ewt-ud-dev.conllu:8  Applied Apply   VERB    NNP     Tense=Past|VerbForm=Part        9       amod    9:amod  _
en_ewt-ud-dev.conllu-9  Semantics       Semantics       PROPN   NNPS    Number=Plur     4       conj    3:obj|4:conj:and        SpaceAfter=No
--
en_ewt-ud-dev.conllu-3  BBC     BBC     PROPN   NNP     Number=Sing     6       compound        6:compound      _
en_ewt-ud-dev.conllu:4  Breaking        Break   VERB    NNP     VerbForm=Ger    5       amod    5:amod  _
en_ewt-ud-dev.conllu-5  News    News    PROPN   NNP     Number=Sing     6       compound        6:compound      _
--
en_ewt-ud-dev.conllu-5  BBC     BBC     PROPN   NNP     Number=Sing     8       compound        8:compound      _
en_ewt-ud-dev.conllu:6  Breaking        Break   VERB    NNP     VerbForm=Ger    7       amod    7:amod  _
en_ewt-ud-dev.conllu-7  News    News    PROPN   NNP     Number=Sing     8       compound        8:compound      _
--
en_ewt-ud-train.conllu-9        form    form    NOUN    NN      Number=Sing     1       nsubj:pass      1:nsubj:pass    _
en_ewt-ud-train.conllu:10       Deemed  Deem    VERB    NNP     Tense=Past|VerbForm=Part        11      amod    11:amod _
en_ewt-ud-train.conllu-11       ISDA    ISDA    PROPN   NNP     Number=Sing     9       appos   9:appos _
--
en_ewt-ud-train.conllu-13       Polykron        Polykron        PROPN   NNP     Number=Sing     11      conj    11:conj:and|15:compound _
en_ewt-ud-train.conllu:14       Deemed  Deem    VERB    NNP     Tense=Past|VerbForm=Part        15      amod    15:amod _
en_ewt-ud-train.conllu-15       ISDAs   ISDA    PROPN   NNPS    Number=Plur     8       nmod    8:nmod:of       _
--
en_ewt-ud-train.conllu-27       the     the     DET     DT      Definite=Def|PronType=Art       29      det     29:det  _
en_ewt-ud-train.conllu:28       Rolling Roll    VERB    NNP     VerbForm=Ger    29      amod    29:amod _
en_ewt-ud-train.conllu-29       Stones  Stone   PROPN   NNPS    Number=Plur     25      nmod    25:nmod:of      _
--
en_ewt-ud-train.conllu-118      "       "       PUNCT   ``      _       120     punct   120:punct       SpaceAfter=No
en_ewt-ud-train.conllu:119      Rolling Roll    VERB    NNP     VerbForm=Ger    120     amod    120:amod        _
en_ewt-ud-train.conllu-120      Stone   Stone   PROPN   NNP     Number=Sing     122     compound        122:compound    SpaceAfter=No
--
en_ewt-ud-train.conllu-50       "       "       PUNCT   ``      _       52      punct   52:punct        SpaceAfter=No
en_ewt-ud-train.conllu:51       Rolling Roll    VERB    NNP     VerbForm=Ger    52      amod    52:amod _
en_ewt-ud-train.conllu-52       Stone   Stone   PROPN   NNP     Number=Sing     54      compound        54:compound     SpaceAfter=No
--
en_ewt-ud-train.conllu-23       -       -       PUNCT   ,       _       22      punct   22:punct        SpaceAfter=No
en_ewt-ud-train.conllu:24       Rolling Roll    VERB    NNP     VerbForm=Ger    25      amod    25:amod _
en_ewt-ud-train.conllu-25       Stones  Stone   PROPN   NNPS    Number=Plur     22      conj    20:appos|22:conj        SpaceAfter=No
--
en_ewt-ud-train.conllu-12       the     the     DET     DT      Definite=Def|PronType=Art       14      det     14:det  _
en_ewt-ud-train.conllu:13       Rolling Roll    VERB    NNP     VerbForm=Ger    14      amod    14:amod _
en_ewt-ud-train.conllu-14       Stones  Stone   PROPN   NNPS    Number=Plur     10      conj    6:nmod:such_as|10:conj:and      _
--
en_ewt-ud-train.conllu-80       the     the     DET     DT      Definite=Def|PronType=Art       82      det     82:det  _
en_ewt-ud-train.conllu:81       Rolling Roll    VERB    NNP     VerbForm=Ger    82      amod    82:amod _
en_ewt-ud-train.conllu-82       Stones  Stone   PROPN   NNPS    Number=Plur     78      conj    76:appos|78:conj:and    SpaceAfter=No
--
en_ewt-ud-train.conllu-33       and     and     CCONJ   CC      _       35      cc      35:cc   _
en_ewt-ud-train.conllu:34       Rolling Roll    VERB    NNP     VerbForm=Ger    35      amod    35:amod _
en_ewt-ud-train.conllu-35       Stones  Stone   PROPN   NNPS    Number=Plur     32      conj    26:obl:to|32:conj:and   SpaceAfter=No
--
en_ewt-ud-train.conllu-11       and     and     CCONJ   CC      _       13      cc      13:cc   _
en_ewt-ud-train.conllu:12       baked   bake    VERB    NNP     Tense=Past|VerbForm=Part        13      amod    13:amod _
en_ewt-ud-train.conllu-13       snake   Snake   PROPN   NNP     Number=Sing     10      conj    7:nmod:like|10:conj:and SpaceAfter=No
--
en_ewt-ud-train.conllu-16       the     the     DET     DT      Definite=Def|PronType=Art       19      det     19:det  _
en_ewt-ud-train.conllu:17       Flying  Fly     VERB    NNP     VerbForm=Ger    18      amod    18:amod _
en_ewt-ud-train.conllu-18       Squirrels       Squirrel        PROPN   NNPS    Number=Plur     19      compound        19:compound     _
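
For anyone who prefers sentence identifiers to grep context lines, the same search can be sketched in Python. This is a minimal sketch, not part of the treebank tooling: the function name `find_upos_xpos` and the `example-1` sentence ID are made up for illustration, and it assumes standard 10-column tab-separated CoNLL-U with `# sent_id = ...` comment lines.

```python
def find_upos_xpos(lines, upos="VERB", xpos="NNP"):
    """Yield (sent_id, token_id, form, lemma) for tokens carrying both tags.

    Hypothetical helper: assumes 10-column tab-separated CoNLL-U input.
    """
    sent_id = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("# sent_id"):
            # Comment of the form "# sent_id = <identifier>"
            sent_id = line.partition("=")[2].strip()
            continue
        if not line or line.startswith("#"):
            continue  # blank sentence separator or other comment
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a token line
        # CoNLL-U columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        if cols[3] == upos and cols[4] == xpos:
            yield sent_id, cols[0], cols[1], cols[2]


# Three token lines copied from the grep output above; the sent_id is invented.
SAMPLE = "\n".join([
    "# sent_id = example-1",
    "\t".join(["2", "interim", "interim", "ADJ", "JJ", "Degree=Pos", "4", "amod", "4:amod", "_"]),
    "\t".join(["3", "Governing", "Govern", "VERB", "NNP", "VerbForm=Ger", "4", "amod", "4:amod", "_"]),
    "\t".join(["4", "Council", "Council", "PROPN", "NNP", "Number=Sing", "5", "nsubj", "5:nsubj", "_"]),
])

MATCHES = list(find_upos_xpos(SAMPLE.splitlines()))
```

To run it over the real files, pass an open file handle (for example `open("en_ewt-ud-dev.conllu", encoding="utf-8")`) instead of `SAMPLE.splitlines()`.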

nschneid commented 12 months ago

Why do you think VERB is incorrect? (The line between participles and adjectives is very hard to draw in English.)

rhdunn commented 12 months ago

From https://en.wikipedia.org/wiki/Adjective#Order:

  1. Qualifier/purpose – final limiter, which sometimes forms part of the (compound) noun (e.g., rocking chair, hunting cabin, passenger car, book cover)

Note that "United" in noun phrases like "United States", "United Kingdom", and "United Air" is classified in this treebank as ADJ, not VERB. "Looking" is also ADJ in this treebank.

For the ... noun, the part of speech in the ... is an adjective not a verb, as it isn't defining an action ("the stone is rolling down the hill"), but is defining the name of a magazine -- the "Rolling Stone".

And if it is a particle, shouldn't it be PART+NNP, not VERB+NNP?

nschneid commented 12 months ago

And if it is a particle, shouldn't it be PART+NNP, not VERB+NNP?

Participle, not particle.

I don't think the semantics offers a good test. In UPOS (unlike PTB/XPOS) we don't distinguish names from non-names except nouns within names (PROPN) vs. other nouns (NOUN).

The best test I can think of is that "very" can modify some adjectives, but not verbs. You can say people are very united on something. I don't think you can say "the news is very breaking".

Note that "United" in noun phrases like "United States", "United Kingdom", and "United Air" is classified in this treebank as ADJ, not VERB. "Looking" is also ADJ in this treebank.

I suspect Looking/ADJ is an error.

rhdunn commented 12 months ago

I see. So per https://github.com/globalwordnet/english-wordnet/issues/953 you have (for compound nouns annotated with NNP):

  1. adjectives relating to present participle verbs (looking, rocking, breaking, etc.) are being annotated as VERB, while
  2. those relating to past participle verbs (united, haunted, etc.) or others relating to other verb/noun forms (deniable, stateless, faithful, salty, etc.) are being annotated as ADJ?

Where in this case "looking" is an error as you state.

nschneid commented 12 months ago

I don't think the type of participle determines VERB vs. ADJ in general (outside of names there are plenty of -ing ADJs) but perhaps the present participles are less likely to become adjectives than past participles.

Note there are other issues involving participles: #102, #355

arademaker commented 12 months ago

Hi @rhdunn, I needed help understanding the link to https://github.com/globalwordnet/english-wordnet/issues/953. There, we are discussing the ADJ organization in the English Wordnet, but we still eventually touch on the deeper problem of what should count as ADJ or not in this branch of the Princeton Wordnet. Indeed, we are all aware of the initial motivations, but some can be revised.

rhdunn commented 12 months ago

@arademaker The link was regarding the "relates an adjective to a verb" and "relates an adjective to a noun" parts of the different proposed relations. I was speculating on what the EWT tagging rules were based on these adjective classifications.

That is specifically regarding the "present" adjectives (based on the present participle form of a verb), where it looks like EWT classifies these as the underlying VERB (with NNP for the PTB XPOS per the noun phrase rules for compound nouns).

Likewise, for "resultant" adjectives (based on the past participle form of a verb), it looks like EWT is classifying these as ADJ (with NNP for the PTB XPOS per the noun phrase rules for compound nouns).
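
This speculation could be checked against the data rather than guessed at. The grep output at the top already includes -ed forms tagged VERB ("Applied", "Deemed", "baked"), so a tally seems more useful than a rule. Below is a rough sketch under stated assumptions: the names `ending_class` and `tally_nnp_modifiers` are hypothetical, the -ing/-ed suffix check is a crude stand-in for real participle detection (it ignores irregular participles), and the "United" row is an invented illustration of the ADJ tagging mentioned earlier, not a line copied from the treebank.

```python
from collections import Counter


def ending_class(form):
    """Crude suffix heuristic (an assumption, not a morphological analysis)."""
    lower = form.lower()
    if lower.endswith("ing"):
        return "-ing"
    if lower.endswith("ed"):
        return "-ed"
    return "other"


def tally_nnp_modifiers(lines):
    """Count (ending, UPOS) pairs for amod tokens with XPOS NNP and UPOS ADJ/VERB."""
    counts = Counter()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 10:
            continue  # comments, blank lines, malformed rows
        form, upos, xpos, deprel = cols[1], cols[3], cols[4], cols[7]
        if xpos == "NNP" and deprel == "amod" and upos in ("ADJ", "VERB"):
            counts[(ending_class(form), upos)] += 1
    return counts


def row(tok_id, form, lemma, upos, xpos, feats, head, deprel):
    """Build a 10-column CoNLL-U line; DEPS and MISC are left as '_' for brevity."""
    return "\t".join([tok_id, form, lemma, upos, xpos, feats, head, deprel, "_", "_"])


SAMPLE_ROWS = [
    # First three rows follow the grep output at the top of this issue.
    row("3", "Governing", "Govern", "VERB", "NNP", "VerbForm=Ger", "4", "amod"),
    row("8", "Applied", "Apply", "VERB", "NNP", "Tense=Past|VerbForm=Part", "9", "amod"),
    row("28", "Rolling", "Roll", "VERB", "NNP", "VerbForm=Ger", "29", "amod"),
    # Invented row illustrating the United/ADJ tagging described in this thread.
    row("1", "United", "United", "ADJ", "NNP", "Degree=Pos", "2", "amod"),
    # Head noun: skipped because it is neither amod nor ADJ/VERB.
    row("29", "Stones", "Stone", "PROPN", "NNPS", "Number=Plur", "25", "nmod"),
]

COUNTS = tally_nnp_modifiers(SAMPLE_ROWS)
```

Run over the full train/dev/test files, the resulting counts would show how often each ending is tagged VERB versus ADJ, which is the split being guessed at here.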