UniversalDependencies / UD_Faroese-OFT


Clefts as juxtaposition #7

Closed ftyers closed 6 years ago

ftyers commented 6 years ago
# text = Tað er í Nýggja Testamenti, vit lesa um Jesus.
# text[eng] = It is in the New Testament that we read about Jesus.
# labels = to_check cleft incomplete
        "tað" Pron Pers Sg3 Nom @nsubj #1->
        "vera" V Ind Prs Sg3 @cop #2->
        "í" Pr @case #3->
        "nýggjur" A Neu Sg Dat Def @amod #4->5
        "testamenti" N Neu Sg Dat Indef @nmod #5->
        "," CLB @punct #6->
        "vit" Pron Pers Pl1 Nom @nsubj #7->
        "lesa" V Ind Prs Pl @dep #8->
        "um" Pr @case #9->10
        "Jesus" N Prop Sem/Mal Sg Acc @nmod #10->
        "." CLB @punct #11->

There is also some relevant discussion here.


Norwegian Bokmål: the focused element is the root, with the rest treated as a relative clause.

# sent_id =  015676
# text = Det er hun som eier og driver stedet.
1       Det     det     PRON    _       Gender=Neut|Number=Sing|Person=3|PronType=Prs   3       expl    _       _
2       er      være    AUX     _       Mood=Ind|Tense=Pres|VerbForm=Fin        3       cop     _       _
3       hun     hun     PRON    _       Animacy=Hum|Case=Nom|Gender=Fem|Number=Sing|Person=3|PronType=Prs       0       root    _       _
4       som     som     PRON    _       PronType=Rel    5       nsubj   _       _
5       eier    eie     VERB    _       Mood=Ind|Tense=Pres|VerbForm=Fin        3       acl:relcl       _       _
6       og      og      CCONJ   _       _       7       cc      _       _
7       driver  drive   VERB    _       Mood=Ind|Tense=Pres|VerbForm=Fin        5       conj    _       _
8       stedet  sted    NOUN    _       Definite=Def|Gender=Neut|Number=Sing    7       obj     _       SpaceAfter=No
9       .       $.      PUNCT   _       _       3       punct   _       _


Norwegian Nynorsk: same analysis as Bokmål.

# sent_id =  000794
# text = - Det er ressursane som gjer at Danmark gjennom heile historia har hatt langt høgare folketal enn Noreg:
1       -       $-      PUNCT   _       _       4       punct   _       _
2       Det     det     PRON    _       Gender=Neut|Number=Sing|Person=3|PronType=Prs   4       expl    _       _
3       er      vere    AUX     _       Mood=Ind|Tense=Pres|VerbForm=Fin        4       cop     _       _
4       ressursane      ressurs NOUN    _       Definite=Def|Gender=Masc|Number=Plur    0       root    _       _
5       som     som     PRON    _       PronType=Rel    6       nsubj   _       _
6       gjer    gjere   VERB    _       Mood=Ind|Tense=Pres|VerbForm=Fin        4       acl:relcl       _       _
7       at      at      SCONJ   _       _       13      mark    _       _
8       Danmark Danmark PROPN   _       _       13      nsubj   _       _
9       gjennom gjennom ADP     _       _       11      case    _       _
10      heile   heil    ADJ     _       Definite=Def|Degree=Pos|Number=Sing     11      amod    _       _
11      historia        historie        NOUN    _       Definite=Def|Gender=Fem|Number=Sing     13      obl     _       _
12      har     ha      AUX     _       Mood=Ind|Tense=Pres|VerbForm=Fin        13      aux     _       _
13      hatt    ha      VERB    _       VerbForm=Part   6       ccomp   _       _
14      langt   lang    ADJ     _       Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing 15      advmod  _       _
15      høgare  høg     ADJ     _       Degree=Cmp      16      amod    _       _
16      folketal        folketal        NOUN    _       Definite=Ind|Gender=Neut|Number=Sing    13      obj     _       _
17      enn     enn     ADP     _       _       18      case    _       _
18      Noreg   Noreg   PROPN   _       _       15      obl     _       SpaceAfter=No
19      :       $:      PUNCT   _       _       4       punct   _       _


Swedish: the focused element is dislocated, with the rest as a relative clause.

# sent_id = sv-ud-train-4277
# text = Det är här som diskussionen kört fast.
1       Det     den     PRON    PN|NEU|SIN|DEF|SUB/OBJ  Definite=Def|Gender=Neut|Number=Sing|PronType=Prs       2       expl    _       _
2       är      vara    VERB    VB|PRS|AKT      Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act      0       root    _       _
3       här     här     ADV     AB      _       2       dislocated      _       _
4       som     som     ADV     HA      _       6       advmod  _       _
5       diskussionen    diskussion      NOUN    NN|UTR|SIN|DEF|NOM      Case=Nom|Definite=Def|Gender=Com|Number=Sing    6       nsubj   _       _
6       kört    köra    VERB    VB|SUP|AKT      VerbForm=Sup|Voice=Act  3       acl:relcl       _       _
7       fast    fast    ADV     PL      _       6       compound:prt    _       SpaceAfter=No
8       .       .       PUNCT   MAD     _       2       punct   _       _


Danish has "det" as nsubj rather than expl, but is otherwise like Norwegian.

# sent_id = train-v2-140
# text = - Det er min mand , der har fundet på det .
1       -       -       PUNCT   _       _       5       punct   _       _
2       Det     det     PRON    _       Gender=Neut|Number=Sing|Person=3|PronType=Prs   5       nsubj   _       _
3       er      være    AUX     _       Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act      5       cop     _       _
4       min     min     DET     _       Gender=Com|Number=Sing|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs 5       det     _       _
5       mand    mand    NOUN    _       Definite=Ind|Gender=Com|Number=Sing     0       root    _       _
6       ,       ,       PUNCT   _       _       5       punct   _       _
7       der     der     PRON    _       PartType=Inf    9       nsubj   _       _
8       har     have    AUX     _       Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act      9       aux     _       _
9       fundet  finde   VERB    _       Definite=Ind|Number=Sing|Tense=Past|VerbForm=Part       2       acl:relcl       _       _
10      på      på      ADP     _       AdpType=Prep    11      case    _       _
11      det     det     PRON    _       Gender=Neut|Number=Sing|Person=3|PronType=Prs   9       obl     _       _
12      .       .       PUNCT   _       _       5       punct   _       _
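
To compare the analyses above mechanically, something like the following could be used. It is only a rough sketch: `parse_conllu` and `describe_cleft` are hypothetical helper names, and the parser splits on whitespace because the trees are pasted here with spaces (real CoNLL-U files are tab-separated and forms may contain spaces). It takes the first `acl`-type clause in a sentence and reports how its head, i.e. the focused element, is attached.

```python
# Swedish example from this thread, pasted with single spaces between columns.
SWEDISH = """\
# text = Det är här som diskussionen kört fast.
1 Det den PRON PN|NEU|SIN|DEF|SUB/OBJ Definite=Def|Gender=Neut|Number=Sing|PronType=Prs 2 expl _ _
2 är vara VERB VB|PRS|AKT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act 0 root _ _
3 här här ADV AB _ 2 dislocated _ _
4 som som ADV HA _ 6 advmod _ _
5 diskussionen diskussion NOUN NN|UTR|SIN|DEF|NOM Case=Nom|Definite=Def|Gender=Com|Number=Sing 6 nsubj _ _
6 kört köra VERB VB|SUP|AKT VerbForm=Sup|Voice=Act 3 acl:relcl _ _
7 fast fast ADV PL _ 6 compound:prt _ SpaceAfter=No
8 . . PUNCT MAD _ 2 punct _ _
"""


def parse_conllu(block):
    """Parse one pasted sentence into a list of token dicts.

    Assumption: columns are whitespace-separated, as in the pasted examples.
    """
    tokens = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split()
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        tokens.append({
            "id": int(cols[0]),
            "form": cols[1],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    return tokens


def describe_cleft(tokens):
    """Return (focus form, focus deprel, sorted deprels of the focus' dependents).

    The "focus" is taken to be the head of the first acl-type clause; returns
    None if the sentence has no such clause.
    """
    by_id = {t["id"]: t for t in tokens}
    clause = next((t for t in tokens if t["deprel"].startswith("acl")), None)
    if clause is None or clause["head"] not in by_id:
        return None
    focus = by_id[clause["head"]]
    dependents = sorted(t["deprel"] for t in tokens if t["head"] == focus["id"])
    return focus["form"], focus["deprel"], dependents


print(describe_cleft(parse_conllu(SWEDISH)))
```

For the Swedish sentence this reports "här" attached as dislocated with only the relative clause below it, whereas the Norwegian trees would show the focused element as root with expl and cop among its dependents.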
liljao commented 6 years ago

In version 2.2 of the Norwegian and Swedish treebanks we have harmonized our analyses. The harmonized analysis has the focused element as root, with expl and cop daughters. We have also introduced a subtype of acl, acl:cleft, to mark these constructions:

# sent_id =  015676
# text = Det er hun som eier og driver stedet.
1   Det det PRON    _   Gender=Neut|Number=Sing|Person=3|PronType=Prs   3   expl    _   _
2   er  være    AUX _   Mood=Ind|Tense=Pres|VerbForm=Fin    3   cop _   _
3   hun hun PRON    _   Animacy=Hum|Case=Nom|Gender=Fem|Number=Sing|Person=3|PronType=Prs   0   root    _   _
4   som som PRON    _   PronType=Rel    5   nsubj   _   _
5   eier    eie VERB    _   Mood=Ind|Tense=Pres|VerbForm=Fin    3   acl:cleft   _   _
6   og  og  CCONJ   _   _   7   cc  _   _
7   driver  drive   VERB    _   Mood=Ind|Tense=Pres|VerbForm=Fin    5   conj    _   _
8   stedet  sted    NOUN    _   Definite=Def|Gender=Neut|Number=Sing    7   obj _   SpaceAfter=No
9   .   $.  PUNCT   _   _   3   punct   _   _
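
For anyone wanting to move existing trees to this analysis, here is a minimal sketch of the relabelling step. It assumes a simple heuristic that is not part of the treebank tooling: an acl:relcl whose head has both an expl and a cop dependent is treated as a cleft. `relabel_clefts` is a hypothetical name, and rows are whitespace-split because the tree is pasted with spaces here.

```python
# Bokmål example from this thread in its earlier analysis (acl:relcl).
BOKMAAL = """\
# text = Det er hun som eier og driver stedet.
1 Det det PRON _ Gender=Neut|Number=Sing|Person=3|PronType=Prs 3 expl _ _
2 er være AUX _ Mood=Ind|Tense=Pres|VerbForm=Fin 3 cop _ _
3 hun hun PRON _ Animacy=Hum|Case=Nom|Gender=Fem|Number=Sing|Person=3|PronType=Prs 0 root _ _
4 som som PRON _ PronType=Rel 5 nsubj _ _
5 eier eie VERB _ Mood=Ind|Tense=Pres|VerbForm=Fin 3 acl:relcl _ _
6 og og CCONJ _ _ 7 cc _ _
7 driver drive VERB _ Mood=Ind|Tense=Pres|VerbForm=Fin 5 conj _ _
8 stedet sted NOUN _ Definite=Def|Gender=Neut|Number=Sing 7 obj _ SpaceAfter=No
9 . $. PUNCT _ _ 3 punct _ _
"""

HEAD, DEPREL = 6, 7  # zero-based positions of the HEAD and DEPREL columns


def relabel_clefts(rows):
    """Return a copy of rows with acl:relcl changed to acl:cleft under cleft heads.

    Heuristic (an assumption): a head counts as a cleft head when it has both
    an expl and a cop dependent.
    """
    # Collect the set of dependency labels found under each head id.
    labels_under = {}
    for row in rows:
        labels_under.setdefault(row[HEAD], set()).add(row[DEPREL])

    out = []
    for row in rows:
        row = list(row)  # copy so the input rows are left untouched
        siblings = labels_under.get(row[HEAD], set())
        if row[DEPREL] == "acl:relcl" and {"expl", "cop"} <= siblings:
            row[DEPREL] = "acl:cleft"
        out.append(row)
    return out


rows = [
    line.split()
    for line in BOKMAAL.strip().splitlines()
    if line and not line.startswith("#")
]
for row in relabel_clefts(rows):
    print("\t".join(row))
```

The heuristic will miss trees where "det" is nsubj rather than expl (as in the Danish example), and it could over-apply to ordinary relative clauses under a copular predicate with an expletive, so the output would need a manual check.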
ftyers commented 6 years ago

@liljao that's great, thanks. I'll use that analysis and file an issue with the Danish treebank.
