Add SSJ-UD-2.5+ annotations to the official ssj500k #13

Closed

kajad commented 3 years ago

ssj500k v2.2 currently includes SSJ-UD data v2.4. As notified in an email on 19/11/2019, the SSJ-UD annotations were changed in release 2.5 (Nov 2019), so ssj500k should be updated accordingly. @TomazErjavec, can we decide on the optimal timeline for this, so it does not get forgotten?

(Ofc, needs to be resolved beforehand.)

TomazErjavec commented 3 years ago

Can we first resolve what actually needs to be done for (I assume) ssj500k 2.3?

I did a diff between the last commit on UD_Slovenian-SSJ and the output of running the scripts here, and there are some differences, but not many. Below is the diff between sl_ssj-ud-dev.conllu here and the one on UD_Slovenian-SSJ:

< 39    ,       ,       PUNCT   Z       _       35      punct   _       Dep=0|Rel=Root
> 39    ,       ,       PUNCT   Z       _       42      punct   _       Dep=0|Rel=Root
< 21    nadzoru nadzor  NOUN    Ncmsl   Case=Loc|Gender=Masc|Number=Sing        18      conj    _       Dep=18\
> 21    nadzoru nadzor  NOUN    Ncmsl   Case=Loc|Gender=Masc|Number=Sing        18      nmod    _       Dep=18\
< 23    ,       ,       PUNCT   Z       _       2       punct   _       Dep=0|Rel=Root
> 23    ,       ,       PUNCT   Z       _       18      punct   _       Dep=0|Rel=Root
< 29    .       .       PUNCT   Z       _       16      punct   _       SpaceAfter=No|Dep=0|Rel=Root
> 29    .       .       PUNCT   Z       _       16      punct   _       Dep=0|Rel=Root
< 7     ,       ,       PUNCT   Z       _       30      punct   _       Dep=0|Rel=Root
> 7     ,       ,       PUNCT   Z       _       15      punct   _       Dep=0|Rel=Root
< 26    še      še      PART    Q       _       11      discourse       _       SpaceAfter=No|Dep=0|Rel=Root
> 26    še      še      PART    Q       _       28      advmod  _       SpaceAfter=No|Dep=0|Rel=Root
< 7     goloto  golota  NOUN    Ncfsa   Case=Acc|Gender=Fem|Number=Sing 6       obj     _       SpaceAfter=No|\
> 7     goloto  golota  NOUN    Ncfsa   Case=Acc|Gender=Fem|Number=Sing 6       obj     _       Dep=6|Rel=Obj
< 21    celo    celo    PART    Q       _       28      advmod  _       Dep=0|Rel=Root
> 21    celo    celo    PART    Q       _       27      advmod  _       Dep=0|Rel=Root
< 27    primerni        primeren        ADJ     Agpmpn  Case=Nom|Degree=Pos|Gender=Masc|Number=Plur     19      conj    _       Dep=26|Rel=Atr
> 27    primerni        primeren        ADJ     Agpmpn  Case=Nom|Degree=Pos|Gender=Masc|Number=Plur     18      conj    _       Dep=26|Rel=Atr
< 15    .       .       PUNCT   Z       _       4       punct   _       SpaceAfter=No|Dep=0|Rel=Root
> 15    .       .       PUNCT   Z       _       4       punct   _       Dep=0|Rel=Root
< 9     ,       ,       PUNCT   Z       _       6       punct   _       Dep=0|Rel=Root
> 9     ,       ,       PUNCT   Z       _       15      punct   _       Dep=0|Rel=Root
< 10    ,       ,       PUNCT   Z       _       41      punct   _       Dep=0|Rel=Root
> 10    ,       ,       PUNCT   Z       _       17      punct   _       Dep=0|Rel=Root
< 25    ,       ,       PUNCT   Z       _       22      punct   _       Dep=0|Rel=Root
> 25    ,       ,       PUNCT   Z       _       27      punct   _       Dep=0|Rel=Root
< 19    razrašča        razraščati      VERB    Vmpr3s  Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin        7       parataxis       _       SpaceAfter=No|Dep=0|Rel=Root
> 19    razrašča        razraščati      VERB    Vmpr3s  Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin        21      advcl   _       SpaceAfter=No|Dep=0|Rel=Root
< 21    reče    reči    VERB    Vmer3s  Aspect=Perf|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin       4       parataxis       _       SpaceAfter=No|Dep=0|Rel=Root
> 21    reče    reči    VERB    Vmer3s  Aspect=Perf|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin       14      advcl   _       SpaceAfter=No|Dep=0|Rel=Root
< 3     !       !       PUNCT   Z       _       6       punct   _       SpaceAfter=No|Dep=0|Rel=Root
> 3     !       !       PUNCT   Z       _       2       punct   _       SpaceAfter=No|Dep=0|Rel=Root
< 11    Meta    Meta    PROPN   Npfsn   Case=Nom|Gender=Fem|Number=Sing 6       parataxis       _       Dep=10\
> 11    Meta    Meta    PROPN   Npfsn   Case=Nom|Gender=Fem|Number=Sing 6       advcl   _       Dep=10|Rel=Atr
< 18    opotekla        opoteči VERB    Vmep-sf Aspect=Perf|Gender=Fem|Number=Sing|VerbForm=Part        11      conj    _       Dep=0|Rel=Root
> 18    opotekla        opoteči VERB    Vmep-sf Aspect=Perf|Gender=Fem|Number=Sing|VerbForm=Part        6       conj    _       Dep=0|Rel=Root
< 14    !       !       PUNCT   Z       _       20      punct   _       Dep=0|Rel=Root
> 14    !       !       PUNCT   Z       _       10      punct   _       Dep=0|Rel=Root
< 8     -       -       PUNCT   Z       _       17      punct   _       Dep=0|Rel=Root
> 8     -       -       PUNCT   Z       _       10      punct   _       Dep=0|Rel=Root
< 12    športno športno ADV     Rgp     Degree=Pos      10      advmod  _       SpaceAfter=No|Dep=10|Rel=AdvM
> 12    športno športno ADV     Rgp     Degree=Pos      10      conj    _       SpaceAfter=No|Dep=10|Rel=AdvM
< 18    ,       ,       PUNCT   Z       _       13      punct   _       Dep=0|Rel=Root
> 18    ,       ,       PUNCT   Z       _       23      punct   _       Dep=0|Rel=Root
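
To see at a glance which columns change in pairs like the ones above, a small helper can compare two token lines field by field. This is only a minimal sketch, not part of the scripts in this repository: the function name `changed_columns` is hypothetical, and it assumes each token line has exactly ten whitespace-separated fields (true for the lines shown here; real CoNLL-U files are tab-separated and may contain spaces inside a field).

```python
# CoNLL-U column names, in file order.
CONLLU_COLUMNS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                  "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def changed_columns(old_line, new_line):
    """Return {column: (old, new)} for the columns where two token lines differ.

    Assumption: fields are separated by whitespace and contain none themselves.
    """
    old_fields = old_line.split()
    new_fields = new_line.split()
    if len(old_fields) != 10 or len(new_fields) != 10:
        raise ValueError("expected 10 whitespace-separated CoNLL-U fields")
    return {
        name: (old, new)
        for name, old, new in zip(CONLLU_COLUMNS, old_fields, new_fields)
        if old != new
    }


# Two pairs taken from the diff above, without the leading "<" / ">".
head_only = changed_columns(
    "39 , , PUNCT Z _ 35 punct _ Dep=0|Rel=Root",
    "39 , , PUNCT Z _ 42 punct _ Dep=0|Rel=Root",
)
head_and_deprel = changed_columns(
    "26 še še PART Q _ 11 discourse _ SpaceAfter=No|Dep=0|Rel=Root",
    "26 še še PART Q _ 28 advmod _ SpaceAfter=No|Dep=0|Rel=Root",
)
print(head_only)        # {'HEAD': ('35', '42')}
print(head_and_deprel)  # {'HEAD': ('11', '28'), 'DEPREL': ('discourse', 'advmod')}
```

Applied to every pair, this would separate pure punctuation re-attachments (HEAD only) from relation changes (DEPREL) and MISC-only differences such as a missing SpaceAfter=No.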

kajad commented 3 years ago

We agreed today that the official SSJ-UD release (2.8) will be mapped to the new ssj500k version (2.3) in the next few months by @TomazErjavec.

TomazErjavec commented 3 years ago

OK, this is mostly done, except for the errors in SRL and possibly MWE (for that, cf. Redmine). @simonkrek, can you pls. start the next version of ssj500k and share the token with me?

simonkrek commented 3 years ago

Token shared.

TomazErjavec commented 3 years ago

Token received, tnx. Given that the UD annotations have been added to the corpus (even though it has not been published yet), I think we can close this issue.