clarinsi / jos2ud


Decompose convert_dependencies.py #1

Closed TomazErjavec closed 5 years ago

TomazErjavec commented 5 years ago

Currently convert_dependencies.py does three things:

  1. fixes UD morphological category or features
  2. determines the UD dependency and label
  3. splits output into train/dev/test

I suggest it would be better (more modular, easier to maintain and to repurpose) if the script were split into three scripts that are then run as a pipeline. In particular:

  1. this should move to jos2ud.pl (if it is still necessary with the new schema and rules)
  2. this is the core script
  3. the splitting, which is necessary only for the official UD data, would be a separate script
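
To make the pipeline idea concrete, here is a rough sketch of how the first two steps could be chained as independent stages. This is not the actual jos2ud code: the function names, the token dictionary keys, and the one-entry label table are all made up for illustration.

```python
from typing import Callable, Iterable, Iterator, List

Token = dict                      # hypothetical token representation
Sentence = List[Token]
Stage = Callable[[Iterable[Sentence]], Iterator[Sentence]]


def fix_features(sentences: Iterable[Sentence]) -> Iterator[Sentence]:
    """Step 1 (hypothetical): replace a missing feature value with '_'."""
    for sent in sentences:
        yield [{**tok, "feats": tok.get("feats") or "_"} for tok in sent]


def convert_dependencies(sentences: Iterable[Sentence]) -> Iterator[Sentence]:
    """Step 2 (hypothetical): map source labels to UD labels via a toy table."""
    toy_map = {"subj": "nsubj"}   # illustrative only, not the real rules
    for sent in sentences:
        yield [{**tok, "deprel": toy_map.get(tok["deprel"], "dep")} for tok in sent]


def run_pipeline(sentences: Iterable[Sentence], stages: List[Stage]) -> List[Sentence]:
    """Feed the output of each stage into the next one."""
    for stage in stages:
        sentences = stage(sentences)
    return list(sentences)
```

Because each stage only consumes and produces sentences, step 1 can be dropped or moved elsewhere, and a splitting step can be appended only when official UD data is being built, without touching the core conversion.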
TomazErjavec commented 5 years ago

This is now implemented: the splits are performed by the ud-data-split.py script, which seems to work fine. Cf. f8d0ab5
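
For readers unfamiliar with the splitting step, the following is a minimal sketch of a train/dev/test split over CoNLL-U text. It is not the code of ud-data-split.py; the function name `split_conllu` and the default 80/10/10 proportions are assumptions chosen for illustration. It relies only on the fact that CoNLL-U sentences are separated by blank lines.

```python
from typing import Dict, List


def split_conllu(text: str, dev_frac: float = 0.1,
                 test_frac: float = 0.1) -> Dict[str, List[str]]:
    """Split CoNLL-U text into train/dev/test lists of sentence blocks.

    The fractions are assumed defaults, and the split is sequential
    (no shuffling), so the original sentence order is preserved.
    """
    # Each sentence block is separated from the next by a blank line.
    blocks = [b.strip("\n") for b in text.split("\n\n") if b.strip()]
    n = len(blocks)
    n_dev = int(n * dev_frac)
    n_test = int(n * test_frac)
    n_train = n - n_dev - n_test
    return {
        "train": blocks[:n_train],
        "dev": blocks[n_train:n_train + n_dev],
        "test": blocks[n_train + n_dev:],
    }
```

A real split may need to respect document or paragraph boundaries rather than cutting at arbitrary sentences; this sketch ignores that.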