clarinsi / jos2ud


Add SSJ-UD-2.5+ annotations to the official ssj500k #13

Closed kajad closed 3 years ago

kajad commented 3 years ago

ssj500k v2.2 (http://hdl.handle.net/11356/1210) currently includes SSJ-UD data v2.4. As notified in an email on 19/11/2019, the SSJ-UD annotations have been changed in release 2.5 (Nov 2019), so the ssj500k should be updated accordingly. @TomazErjavec, can we decide on the optimal timeline for this, so it does not get forgotten?

(Ofc, https://github.com/UniversalDependencies/UD_Slovenian-SSJ/issues/1 needs to be resolved beforehand.)

TomazErjavec commented 3 years ago

Can we first resolve what actually needs to be done for (I assume) ssj500k 2.3?

I ran a diff between the last commit on UD_Slovenian-SSJ and the output of running the scripts here. There are some differences, but not many. Below is the diff between sl_ssj-ud-dev.conllu here and the one on UD_Slovenian-SSJ:

1178c1178
< 39    ,       ,       PUNCT   Z       _       35      punct   _       Dep=0|Rel=Root
---
> 39    ,       ,       PUNCT   Z       _       42      punct   _       Dep=0|Rel=Root
1783c1783
< 21    nadzoru nadzor  NOUN    Ncmsl   Case=Loc|Gender=Masc|Number=Sing        18      conj    _       Dep=18|Rel=Coord
---
> 21    nadzoru nadzor  NOUN    Ncmsl   Case=Loc|Gender=Masc|Number=Sing        18      nmod    _       Dep=18|Rel=Coord
1785c1785
< 23    ,       ,       PUNCT   Z       _       2       punct   _       Dep=0|Rel=Root
---
> 23    ,       ,       PUNCT   Z       _       18      punct   _       Dep=0|Rel=Root
4367c4367
< 29    .       .       PUNCT   Z       _       16      punct   _       SpaceAfter=No|Dep=0|Rel=Root
---
> 29    .       .       PUNCT   Z       _       16      punct   _       Dep=0|Rel=Root
5019c5019
< 7     ,       ,       PUNCT   Z       _       30      punct   _       Dep=0|Rel=Root
---
> 7     ,       ,       PUNCT   Z       _       15      punct   _       Dep=0|Rel=Root
5946c5946
< 26    še      še      PART    Q       _       11      discourse       _       SpaceAfter=No|Dep=0|Rel=Root
---
> 26    še      še      PART    Q       _       28      advmod  _       SpaceAfter=No|Dep=0|Rel=Root
6212c6212
< 7     goloto  golota  NOUN    Ncfsa   Case=Acc|Gender=Fem|Number=Sing 6       obj     _       SpaceAfter=No|Dep=6|Rel=Obj
---
> 7     goloto  golota  NOUN    Ncfsa   Case=Acc|Gender=Fem|Number=Sing 6       obj     _       Dep=6|Rel=Obj
6625c6625
< 21    celo    celo    PART    Q       _       28      advmod  _       Dep=0|Rel=Root
---
> 21    celo    celo    PART    Q       _       27      advmod  _       Dep=0|Rel=Root
7427c7427
< 27    primerni        primeren        ADJ     Agpmpn  Case=Nom|Degree=Pos|Gender=Masc|Number=Plur     19      conj    _       Dep=26|Rel=Atr
---
> 27    primerni        primeren        ADJ     Agpmpn  Case=Nom|Degree=Pos|Gender=Masc|Number=Plur     18      conj    _       Dep=26|Rel=Atr
8065c8065
< 15    .       .       PUNCT   Z       _       4       punct   _       SpaceAfter=No|Dep=0|Rel=Root
---
> 15    .       .       PUNCT   Z       _       4       punct   _       Dep=0|Rel=Root
10239c10239
< 9     ,       ,       PUNCT   Z       _       6       punct   _       Dep=0|Rel=Root
---
> 9     ,       ,       PUNCT   Z       _       15      punct   _       Dep=0|Rel=Root
11024c11024
< 10    ,       ,       PUNCT   Z       _       41      punct   _       Dep=0|Rel=Root
---
> 10    ,       ,       PUNCT   Z       _       17      punct   _       Dep=0|Rel=Root
11537c11537
< 25    ,       ,       PUNCT   Z       _       22      punct   _       Dep=0|Rel=Root
---
> 25    ,       ,       PUNCT   Z       _       27      punct   _       Dep=0|Rel=Root
12366c12366
< 19    razrašča        razraščati      VERB    Vmpr3s  Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin        7       parataxis       _       SpaceAfter=No|Dep=0|Rel=Root
---
> 19    razrašča        razraščati      VERB    Vmpr3s  Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin        21      advcl   _       SpaceAfter=No|Dep=0|Rel=Root
12804c12804
< 21    reče    reči    VERB    Vmer3s  Aspect=Perf|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin       4       parataxis       _       SpaceAfter=No|Dep=0|Rel=Root
---
> 21    reče    reči    VERB    Vmer3s  Aspect=Perf|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin       14      advcl   _       SpaceAfter=No|Dep=0|Rel=Root
12991c12991
< 3     !       !       PUNCT   Z       _       6       punct   _       SpaceAfter=No|Dep=0|Rel=Root
---
> 3     !       !       PUNCT   Z       _       2       punct   _       SpaceAfter=No|Dep=0|Rel=Root
12999c12999
< 11    Meta    Meta    PROPN   Npfsn   Case=Nom|Gender=Fem|Number=Sing 6       parataxis       _       Dep=10|Rel=Atr
---
> 11    Meta    Meta    PROPN   Npfsn   Case=Nom|Gender=Fem|Number=Sing 6       advcl   _       Dep=10|Rel=Atr
13006c13006
< 18    opotekla        opoteči VERB    Vmep-sf Aspect=Perf|Gender=Fem|Number=Sing|VerbForm=Part        11      conj    _       Dep=0|Rel=Root
---
> 18    opotekla        opoteči VERB    Vmep-sf Aspect=Perf|Gender=Fem|Number=Sing|VerbForm=Part        6       conj    _       Dep=0|Rel=Root
14730c14730
< 14    !       !       PUNCT   Z       _       20      punct   _       Dep=0|Rel=Root
---
> 14    !       !       PUNCT   Z       _       10      punct   _       Dep=0|Rel=Root
14902c14902
< 8     -       -       PUNCT   Z       _       17      punct   _       Dep=0|Rel=Root
---
> 8     -       -       PUNCT   Z       _       10      punct   _       Dep=0|Rel=Root
14906c14906
< 12    športno športno ADV     Rgp     Degree=Pos      10      advmod  _       SpaceAfter=No|Dep=10|Rel=AdvM
---
> 12    športno športno ADV     Rgp     Degree=Pos      10      conj    _       SpaceAfter=No|Dep=10|Rel=AdvM
15248c15248
< 18    ,       ,       PUNCT   Z       _       13      punct   _       Dep=0|Rel=Root
---
> 18    ,       ,       PUNCT   Z       _       23      punct   _       Dep=0|Rel=Root
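
To see at a glance which CoNLL-U columns the two versions disagree on, something like the following might help. It is only a rough sketch, not part of the jos2ud scripts: the function names are made up for illustration, and it assumes both files have identical tokenization so that token rows can be compared pairwise (the diff above contains only in-place changes, which suggests this holds for the dev file).

```python
from collections import Counter

# Standard CoNLL-U column names, in file order.
COLUMNS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
           "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def token_rows(lines):
    """Yield the tab-separated fields of each token line.

    Comment lines (starting with '#') and blank sentence separators are skipped.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line and not line.startswith("#"):
            yield line.split("\t")


def changed_columns(old_lines, new_lines):
    """Count, per column, how many aligned token rows differ.

    Hypothetical helper: assumes both inputs contain the same tokens
    in the same order, so rows are simply paired up with zip().
    """
    counts = Counter()
    for old, new in zip(token_rows(old_lines), token_rows(new_lines)):
        for name, a, b in zip(COLUMNS, old, new):
            if a != b:
                counts[name] += 1
    return counts


# The token at diff line 1783 above, written with real tabs.
old = ["21\tnadzoru\tnadzor\tNOUN\tNcmsl\tCase=Loc|Gender=Masc|Number=Sing"
       "\t18\tconj\t_\tDep=18|Rel=Coord\n"]
new = ["21\tnadzoru\tnadzor\tNOUN\tNcmsl\tCase=Loc|Gender=Masc|Number=Sing"
       "\t18\tnmod\t_\tDep=18|Rel=Coord\n"]

print(dict(changed_columns(old, new)))  # {'DEPREL': 1}
```

On the real files one would pass two open file handles instead of the lists. For the diff above, the counts should fall only on HEAD, DEPREL and MISC (the SpaceAfter=No cases).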

kajad commented 3 years ago

We agreed today that the official SSJ-UD release (2.8) will be mapped to the new ssj500k version (2.3) in the next few months by @TomazErjavec.

TomazErjavec commented 3 years ago

OK, this is mostly done, except for the errors in SRL and possibly MWE (for that, cf. Redmine). @simonkrek, can you pls. start the next version of ssj500k (so, http://hdl.handle.net/11356/1210) and share the token with me?

simonkrek commented 3 years ago

Token shared.

TomazErjavec commented 3 years ago

Token received, tnx. Given that the UD annotations have been added to the corpus (even though it has not been published yet) I think we can close this issue.