clarinsi / jos2ud


UDv2.4 in relation to UDv2.3 #8

Closed: kajad closed this issue 5 years ago

kajad commented 5 years ago

I have now updated the convert_dependencies.py script in 1c3b305, so as to conform to the new morphological input. The resulting treebank (UDv2.4) is identical to UDv2.3, except for:

I have also re-introduced the encoding=utf8 declarations for writing and reading, as their absence prevented testing in the Windows command line. The script also works on Linux (tantra). If this is still problematic on @TomazErjavec's side, we need to find a universally acceptable solution.
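For illustration, a minimal sketch of what explicit UTF-8 declarations for reading and writing look like in Python. The helper name `copy_conllu` is hypothetical and is not taken from convert_dependencies.py. When `encoding` is omitted, Python's `open()` uses the locale's preferred encoding, which on Windows is typically a legacy code page (e.g. cp1252) rather than UTF-8. That would explain why characters such as č, š, ž cause trouble in the Windows command line while the same script runs fine on Linux.

```python
def copy_conllu(src_path, dst_path):
    """Hypothetical helper: copy a CoNLL-U file line by line.

    Both files are opened with an explicit UTF-8 encoding, so the result
    does not depend on the platform's default (locale) encoding.
    """
    count = 0
    with open(src_path, encoding="utf8") as src, \
            open(dst_path, "w", encoding="utf8") as dst:
        for line in src:
            dst.write(line)
            count += 1
    return count  # number of lines copied
```

The same `encoding="utf8"` argument applies to every `open()` call in the script, whether it reads the morphological input or writes the resulting treebank.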

TomazErjavec commented 5 years ago

> some sentence-final tokens now lose the SpaceAfter=No info in MISC, e.g. ssj2.2.11 (@TomazErjavec, is this expected?)

Yes, this is the change introduced in 65d7d34. (Arguably) no sentence should end with SpaceAfter=No.
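A minimal sketch of the rule described here (no sentence ends with SpaceAfter=No), written for illustration only; it is not the code from 65d7d34. It assumes a sentence is given as a list of CoNLL-U lines without trailing newlines, with the MISC value in the tenth tab-separated column. Following the UD convention, SpaceAfter for a multiword token is stored on its range line (e.g. `2-3`), so the sketch edits that line when it covers the final word, and it skips empty nodes (IDs such as `5.1`). The function name `strip_final_space_after` is hypothetical.

```python
def strip_final_space_after(sentence_lines):
    """Remove SpaceAfter=No from the MISC column of a sentence's final token.

    `sentence_lines` holds one CoNLL-U sentence: comment lines start with
    '#', token lines have 10 tab-separated columns, no trailing newlines.
    """
    rows = [line.split("\t") for line in sentence_lines]
    # Token lines only: skip comments and empty nodes (decimal IDs).
    token_idx = [i for i, r in enumerate(rows)
                 if not r[0].startswith("#") and "." not in r[0]]
    word_ids = [int(rows[i][0]) for i in token_idx if "-" not in rows[i][0]]
    if not word_ids:
        return list(sentence_lines)
    last_word = max(word_ids)

    # Prefer a multiword-token range line that ends at the final word.
    target = None
    for i in token_idx:
        tid = rows[i][0]
        if "-" in tid and int(tid.split("-")[1]) == last_word:
            target = i
    if target is None:
        target = next(i for i in token_idx if rows[i][0] == str(last_word))

    # Drop SpaceAfter=No, keep any other MISC attributes, use "_" if empty.
    misc = [m for m in rows[target][9].split("|")
            if m not in ("_", "SpaceAfter=No")]
    rows[target][9] = "|".join(misc) or "_"
    return ["\t".join(r) for r in rows]
```

Other MISC attributes on the final token are kept, and tokens earlier in the sentence are left untouched.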

> If this is still problematic on @TomazErjavec's side,

Not problematic; both scripts work as expected.