UniversalDependencies / UD_English-PUD

Parallel Universal Dependencies.

Numbered wars #44

Open AngledLuffa opened 9 months ago

AngledLuffa commented 9 months ago

It seems weird that we have

# sent_id = w01096074
10      World   world   PROPN   NN      Number=Sing     11      compound        11:compound     _
11      War     war     PROPN   NN      Number=Sing     8       nmod    8:nmod:of       _
12      I       I       NUM     CD      NumForm=Roman|NumType=Card      11      compound        11:compound     _

but then also have

# sent_id = w01100046
9       First   First   PROPN   NNP     Number=Sing     11      compound        11:compound     _
10      Opium   Opium   PROPN   NNP     Number=Sing     11      compound        11:compound     _
11      War     War     PROPN   NNP     Number=Sing     7       obj     7:obj   _

and

25      Second  second  ADJ     JJ      Degree=Pos|NumForm=Word|NumType=Ord     27      amod    27:amod Proper=True
26      World   world   PROPN   NN      Number=Sing     27      compound        27:compound     _
27      War     war     PROPN   NN      Number=Sing     22      nmod    22:nmod:of      SpaceAfter=No

rhdunn commented 9 months ago

Examples 2 and 3 contain ordinal number words, so these should be XPOS=CD with NumForm=Word|NumType=Ord according to the UD guidelines.

IIRC, NNP is used in the XPOS for compatibility with PTB. In this case, example 3 should match example 2. That leaves two conflicting XPOS candidates (CD or NNP).

The Cambridge Dictionary classifies the ordinals as determiners (but notes that another determiner like "the" or "a" can precede the ordinal):

  1. https://dictionary.cambridge.org/grammar/british-grammar/number

However, Wiktionary classifies them as adjectives:

  1. https://en.wiktionary.org/wiki/first#Adjective

Wikipedia doesn't mention ordinals as adjectives in the adjective order page:

  1. https://en.wikipedia.org/wiki/Adjective#Order

But Wikipedia seems to agree with the Cambridge Dictionary and not Wiktionary on that page:

"Determiners and postdeterminers—articles, numerals, and other limiters (e.g. three blind mice)—come before attributive adjectives in English."

nschneid commented 9 months ago

Ordinal numbers should be ADJ: https://universaldependencies.org/u/pos/ADJ.html
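
To see how ordinal words are currently tagged, one option is to list them with their UPOS and XPOS. This is only a minimal sketch: the word list `ORDINAL_FORMS` and the helpers `parse_token` and `ordinal_tags` are hypothetical names made up for illustration, not part of any UD tool, and the sample rows are copied from the examples in this issue.

```python
# Hypothetical, deliberately short word list; extend as needed.
ORDINAL_FORMS = {"first", "second", "third", "fourth", "fifth"}

# The ten tab-separated columns of a CoNLL-U token line.
COLUMNS = ["id", "form", "lemma", "upos", "xpos",
           "feats", "head", "deprel", "deps", "misc"]


def parse_token(line):
    """Split one CoNLL-U token line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


def ordinal_tags(lines):
    """Return (form, upos, xpos) for each token whose form is a listed ordinal."""
    found = []
    for line in lines:
        # Skip blank sentence separators and comment lines such as sent_id.
        if not line.strip() or line.startswith("#"):
            continue
        tok = parse_token(line)
        if tok["form"].lower() in ORDINAL_FORMS:
            found.append((tok["form"], tok["upos"], tok["xpos"]))
    return found


# Rows taken from the examples quoted in this issue.
SAMPLE = [
    "# sent_id = w01100046",
    "\t".join(["9", "First", "First", "PROPN", "NNP", "Number=Sing",
               "11", "compound", "11:compound", "_"]),
    "\t".join(["25", "Second", "second", "ADJ", "JJ",
               "Degree=Pos|NumForm=Word|NumType=Ord",
               "27", "amod", "27:amod", "Proper=True"]),
]

print(ordinal_tags(SAMPLE))
# [('First', 'PROPN', 'NNP'), ('Second', 'ADJ', 'JJ')]
```

Matching on the word form rather than on `NumType=Ord` is a deliberate choice here, since "First" in "First Opium War" carries no `NumType` feature and would otherwise be missed.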

AngledLuffa commented 9 months ago

So First_ADJ Opium War? NNP or JJ for the xpos?

nschneid commented 9 months ago

My hunch is NNP

AngledLuffa commented 9 months ago

World and War both NNP? It looks very weird having them be PROPN but NN
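
A rough way to find other PROPN/NN pairs like this one is to scan for PROPN tokens whose XPOS is not a proper-noun tag. The sketch below assumes PROPN should pair only with NNP or NNPS, which is an assumption of this example and not a rule stated in this thread; the function name `propn_xpos_mismatches` is hypothetical.

```python
# Assumption for this sketch: PROPN is expected to pair with NNP or NNPS.
PROPER_XPOS = {"NNP", "NNPS"}


def propn_xpos_mismatches(lines):
    """Return (id, form, xpos) for PROPN tokens whose XPOS is not NNP/NNPS."""
    mismatches = []
    for line in lines:
        # Skip blank sentence separators and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 10:
            continue  # not a well-formed CoNLL-U token line
        tok_id, form, _lemma, upos, xpos = fields[:5]
        if upos == "PROPN" and xpos not in PROPER_XPOS:
            mismatches.append((tok_id, form, xpos))
    return mismatches


# Rows taken from the first example in this issue.
SAMPLE = [
    "# sent_id = w01096074",
    "\t".join(["10", "World", "world", "PROPN", "NN", "Number=Sing",
               "11", "compound", "11:compound", "_"]),
    "\t".join(["11", "War", "war", "PROPN", "NN", "Number=Sing",
               "8", "nmod", "8:nmod:of", "_"]),
    "\t".join(["12", "I", "I", "NUM", "CD", "NumForm=Roman|NumType=Card",
               "11", "compound", "11:compound", "_"]),
]

print(propn_xpos_mismatches(SAMPLE))
# [('10', 'World', 'NN'), ('11', 'War', 'NN')]
```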

nschneid commented 9 months ago

In "World War I", definitely "World" and "War" are NNP. I would lean that way also for "First World War", and that seems to be consistent with OntoNotes.

AngledLuffa commented 9 months ago

NNP or JJ for the xpos?

My hunch is NNP

Worth pointing out that in GUM, the 2002 World Cup gets the tag CD (not NNP). However, it might be considered not actually part of the name, I suppose.

AngledLuffa commented 9 months ago

... although they later annotate

Instruments for Research into Second Languages (IRIS) Second City

with Second_NNP

AngledLuffa commented 9 months ago

How do the changes here look?

https://github.com/UniversalDependencies/UD_English-PUD/commit/1a81cda91eb8df2fcff09ef9a3553aedb43642f7

nschneid commented 9 months ago

How do the changes here look?

1a81cda

LGTM

amir-zeldes commented 9 months ago

2002 World Cup gets the tag CD (not NNP)

I think that's canon, let me know if someone wants to argue it's not?