UniversalDependencies / UD_Faroese-OFT

Measure expressions with adjectives #4

Closed: ftyers closed this issue 6 years ago

ftyers commented 6 years ago

What should we do with constructions like "Num hours/minutes/feet long"?

# text = Filmurin er gott 2 tímar langur.
# text[eng] = The film is a good two hours long.
# labels = to_check incomplete
"<Filmurin>"
        "filmur" N Msc Sg Nom Def @nsubj #1->6
"<er>"
        "vera" V Ind Prs Sg3 @cop #2->6
"<gott>"
        "góður" A Adv @advmod #3->5
"<2>"
        "2" Num Nom @nummod #4->5
"<tímar>"
        "tími" N Msc Pl Nom Indef @dep #5->
"<langur>"
        "langur" A Msc Sg Nom Indef @root #6->0
"<.>"
        "." CLB @punct #7->6

Bokmål

The measure expression is obl to "lang" ("long").

# sent_id =  013344
# text = - For de fleste barna er skolevegen tre kilometer lang.
1       -       $-      PUNCT   _       _       10      punct   _       _
2       For     for     ADP     _       _       5       case    _       _
3       de      de      DET     _       Number=Plur|PronType=Dem        5       det     _       _
4       fleste  mange   ADJ     _       Definite=Def|Degree=Sup 5       amod    _       _
5       barna   barn    NOUN    _       Definite=Def|Gender=Neut|Number=Plur    10      obl     _       _
6       er      være    AUX     _       Mood=Ind|Tense=Pres|VerbForm=Fin        10      cop     _       _
7       skolevegen      skoleveg        NOUN    _       Definite=Def|Gender=Masc|Number=Sing    10      nsubj   _       _
8       tre     tre     NUM     _       Number=Plur|NumType=Card        9       nummod  _       _
9       kilometer       kilometer       NOUN    _       Definite=Ind|Gender=Masc|Number=Plur    10      obl     _       _
10      lang    lang    ADJ     _       Definite=Ind|Degree=Pos|Number=Sing     0       root    _       SpaceAfter=No
11      .       $.      PUNCT   _       _       10      punct   _       _

Nynorsk

The measure expression is obl to "lang" ("long").

# sent_id =  006624
# text = - Det er ein hannbjørn på om lag 1,70 meter lang frå snute til haletipp, så det er ikkje noko stort dyr.
1       -       $-      PUNCT   _       _       5       punct   _       _
2       Det     det     PRON    _       Gender=Neut|Number=Sing|Person=3|PronType=Prs   5       nsubj   _       _
3       er      vere    AUX     _       Mood=Ind|Tense=Pres|VerbForm=Fin        5       cop     _       _
4       ein     ein     DET     _       Gender=Masc|Number=Sing|PronType=Art    5       det     _       _
5       hannbjørn       hannbjørn       NOUN    _       Definite=Ind|Gender=Masc|Number=Sing    0       root    _       _
6       på      på      ADP     _       _       11      case    _       _
7       om      om      ADP     _       _       8       case    _       _
8       lag     lag     NOUN    _       _       9       obl     _       _
9       1,70    1,70    NUM     _       Number=Plur|NumType=Card        10      nummod  _       _
10      meter   meter   NOUN    _       Definite=Ind|Gender=Masc|Number=Plur    11      obl     _       _
11      lang    lang    ADJ     _       Definite=Ind|Degree=Pos|Number=Sing     5       amod    _       _
12      frå     frå     ADP     _       _       13      case    _       _
13      snute   snute   NOUN    _       Definite=Ind|Gender=Masc|Number=Sing    11      obl     _       _
14      til     til     ADP     _       _       15      case    _       _
15      haletipp        haletipp        NOUN    _       Definite=Ind|Gender=Masc|Number=Sing    13      obl     _       SpaceAfter=No
16      ,       $,      PUNCT   _       _       5       punct   _       _
17      så      så      CCONJ   _       _       23      cc      _       _
18      det     det     PRON    _       Gender=Neut|Number=Sing|Person=3|PronType=Prs   23      nsubj   _       _
19      er      vere    AUX     _       Mood=Ind|Tense=Pres|VerbForm=Fin        23      cop     _       _
20      ikkje   ikkje   ADV     _       Polarity=Neg    23      advmod  _       _
21      noko    nokon   DET     _       Gender=Neut|Number=Sing|PronType=Ind    23      det     _       _
22      stort   stor    ADJ     _       Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing 23      amod    _       _
23      dyr     dyr     NOUN    _       Definite=Ind|Gender=Neut|Number=Sing    5       conj    _       SpaceAfter=No
24      .       $.      PUNCT   _       _       5       punct   _       _

Danish

The measure expression is obl to "lange" (lemma "lang").

# sent_id = train-v2-826
# text = Én gang måtte han have hjælp på den 2.000 km lange trip .
1       Én      en      DET     _       Gender=Com|Number=Sing|PronType=Ind     2       det     _       _
2       gang    gang    NOUN    _       Definite=Ind|Gender=Com|Number=Sing     5       obl     _       _
3       måtte   måtte   AUX     _       Mood=Ind|Tense=Past|VerbForm=Fin|Voice=Act      5       aux     _       _
4       han     han     PRON    _       Case=Nom|Gender=Com|Number=Sing|Person=3|PronType=Prs   5       nsubj   _       _
5       have    have    VERB    _       VerbForm=Inf|Voice=Act  0       root    _       _
6       hjælp   hjælp   NOUN    _       Definite=Ind|Gender=Com|Number=Sing     5       obj     _       _
7       på      på      ADP     _       AdpType=Prep    11      case    _       _
8       den     den     DET     _       Gender=Com|Number=Sing|PronType=Dem     11      det     _       _
9       2.000   2.000   NUM     _       NumType=Card    10      nummod  _       _
10      km      kilometer       NOUN    _       Definite=Ind|Gender=Com|Number=Plur     11      obl     _       _
11      lange   lang    ADJ     _       Definite=Def|Degree=Pos|Number=Sing     5       obl     _       _
12      trip    trip    X       _       Foreign=Yes     11      obl     _       _
13      .       .       PUNCT   _       _       5       punct   _       _

Swedish

I couldn't find any examples with adjectives, but I did find this one with an adverb; the measure expression is again obl.

# sent_id = sv-ud-train-2482
# text = Filen upphör 150 meter söder om Ringvägen.
1       Filen   fil     NOUN    NN|UTR|SIN|DEF|NOM      Case=Nom|Definite=Def|Gender=Com|Number=Sing    2       nsubj   _       _
2       upphör  upphöra VERB    VB|PRS|AKT      Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act      0       root    _       _
3       150     150     NUM     RG|NOM  Case=Nom|NumType=Card   4       nummod  _       _
4       meter   meter   NOUN    NN|UTR|PLU|IND|NOM      Case=Nom|Definite=Ind|Gender=Com|Number=Plur    5       obl     _       _
5       söder   söder   ADV     AB      _       6       advmod  _       _
6       om      om      ADP     PP      _       7       case    _       _
7       Ringvägen       Ringvägen       PROPN   PM|NOM  Case=Nom        2       obl     _       SpaceAfter=No
8       .       .       PUNCT   MAD     _       2       punct   _       _

German

German uses nmod for the measure expression (and also nmod for the numeral "neun", which looks wrong).

# sent_id = train-s13913
# text = Strandgrundeln werden sechs bis maximal neun Zentimeter lang.
1       Strandgrundeln  Strandgrundeln  NOUN    NN      Case=Nom|Number=Sing    8       nsubj   _       _
2       werden  werden  AUX     VAFIN   Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin   8       cop     _       _
3       sechs   sechs   NUM     CARD    NumType=Card    7       nummod  _       _
4       bis     bis     ADP     KON     _       6       case    _       _
5       maximal maximal ADJ     ADJD    Degree=Pos      6       advmod  _       _
6       neun    neun    NUM     CARD    NumType=Card    7       nmod    _       _
7       Zentimeter      Zentimeter      NOUN    NN      _       8       nmod    _       _
8       lang    lang    ADJ     ADJD    Degree=Pos      0       root    _       SpaceAfter=No
9       .       .       PUNCT   $.      _       8       punct   _       _

It seems that German is the odd one out here, and that we should go with obl.

jnivre commented 6 years ago

I think nummod + obl is the best option here, which also seems to be the majority decision. An argument for obl, rather than nmod, is that we would use advmod, not amod, for "very" in "very long". Hence, "3 hours" in "3 hours long" is adverbial, not adnominal, and obl should be used instead of nmod.
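For anyone who wants to look for the same pattern in another treebank, here is a minimal sketch of a check for it. It assumes tab-separated CoNLL-U input; `parse_conllu` and `flag_measure_phrases` are hypothetical helper names, not part of any UD tool, and the sample rows are taken from the German and Bokmål examples above with XPOS and features replaced by `_` for brevity.

```python
# Hypothetical checker; not part of any official UD tooling.

def parse_conllu(block):
    """Map token id -> (form, upos, head, deprel) for one CoNLL-U sentence."""
    tokens = {}
    for line in block.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Keep plain word lines only; skip multiword ranges (1-2) and empty nodes (1.1).
        if len(cols) != 10 or not cols[0].isdigit() or not cols[6].isdigit():
            continue
        tokens[int(cols[0])] = (cols[1], cols[3], int(cols[6]), cols[7])
    return tokens


def flag_measure_phrases(tokens):
    """Return (form, deprel) for counted nouns under ADJ/ADV heads that are not obl."""
    # A noun is "counted" if some token attaches to it as nummod.
    counted = {head for (_, _, head, deprel) in tokens.values()
               if deprel.split(":")[0] == "nummod"}
    flagged = []
    for tid in sorted(tokens):
        form, upos, head, deprel = tokens[tid]
        parent = tokens.get(head)  # None for the root or for a missing head
        if tid not in counted or upos != "NOUN" or parent is None:
            continue
        # Compare the universal part of the label, so obl:npmod counts as obl.
        if parent[1] in {"ADJ", "ADV"} and deprel.split(":")[0] != "obl":
            flagged.append((form, deprel))
    return flagged


def rows_to_conllu(rows):
    """Join (id, form, upos, head, deprel) rows into 10-column CoNLL-U lines."""
    return "\n".join(
        "\t".join([i, form, form, upos, "_", "_", head, rel, "_", "_"])
        for (i, form, upos, head, rel) in rows
    )


GERMAN = rows_to_conllu([
    ("1", "Strandgrundeln", "NOUN", "8", "nsubj"),
    ("2", "werden", "AUX", "8", "cop"),
    ("3", "sechs", "NUM", "7", "nummod"),
    ("4", "bis", "ADP", "6", "case"),
    ("5", "maximal", "ADJ", "6", "advmod"),
    ("6", "neun", "NUM", "7", "nmod"),
    ("7", "Zentimeter", "NOUN", "8", "nmod"),
    ("8", "lang", "ADJ", "0", "root"),
    ("9", ".", "PUNCT", "8", "punct"),
])

# Excerpt only: tokens 8-10 of the Bokmål sentence, original ids kept.
BOKMAAL = rows_to_conllu([
    ("8", "tre", "NUM", "9", "nummod"),
    ("9", "kilometer", "NOUN", "10", "obl"),
    ("10", "lang", "ADJ", "0", "root"),
])

print(flag_measure_phrases(parse_conllu(GERMAN)))   # [('Zentimeter', 'nmod')]
print(flag_measure_phrases(parse_conllu(BOKMAAL)))  # []
```

The check only looks at nouns that have a nummod dependent, so it would miss the measure phrase in a sentence where the numeral itself is mislabelled; it is meant as a rough filter, not a validator.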

dan-zeman commented 6 years ago

I agree with @jnivre.

This was not very clear from the early discussion about the v2 guidelines (at least not to me), but the borderline between obl and nmod is drawn by the parent: verbs, adjectives, and adverbs are on the obl side, and nouns are on the nmod side. (It is more complex if a noun functions as a predicate.)
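Put as a rule of thumb, the choice depends only on the part of speech of the parent. The sketch below is one way to write that down; `expected_relation` is a hypothetical name, treating PROPN and PRON like NOUN is an assumption that goes beyond what is stated here, and the predicate-noun case is deliberately left out.

```python
# Hypothetical rule of thumb; it ignores nouns that function as predicates.
OBL_PARENTS = {"VERB", "ADJ", "ADV"}
NMOD_PARENTS = {"NOUN", "PROPN", "PRON"}  # PROPN and PRON are an assumption


def expected_relation(parent_upos):
    """Return 'obl' or 'nmod' for a nominal dependent, or None if undecided."""
    if parent_upos in OBL_PARENTS:
        return "obl"
    if parent_upos in NMOD_PARENTS:
        return "nmod"
    return None


print(expected_relation("ADJ"))   # obl, as in "3 hours long"
print(expected_relation("NOUN"))  # nmod
```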

gossebouma commented 6 years ago

Dutch follows the majority of using nummod + obl ;-)

ftyers commented 6 years ago

Noted, and I've left an issue on the German treebank's issues page. Thanks, all! :)

amir-zeldes commented 6 years ago

FYI, I think in English-GUM we still have nmod:npmod, which goes back to the Stanford Dependencies npadvmod. I can change it, but I assume this won't be in the upcoming version due to the data freeze. @dan-zeman, shall I commit it to dev or wait until after the release?

dan-zeman commented 6 years ago

@amir-zeldes If you can change it now, please do so; I still have not done the final pull. And I suppose English-EWT uses obl (I did not check though), so it would be a bit more harmonization for the shared task.

amir-zeldes commented 6 years ago

OK, I can do that; I just need to validate again. It should be done by the end of the day.

amir-zeldes commented 6 years ago

OK, done in UniversalDependencies/UD_English-GUM@b03e117843999fe8a72793bcc4c79af24912c834

BTW I checked and EWT does indeed use obl:npmod, so now they're hopefully the same!