Currently convert_dependencies.py does three things:
1. fixes the UD morphological category or features
2. determines the UD dependency and label
3. splits the output into train/dev/test
I suggest it would be better (more modular, easier to maintain and to repurpose) if the script were split into three separate scripts that are then run as a pipeline. In particular:
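1. The morphology fixes should move to jos2ud.pl (if they are still necessary with the new schema and rules).
2. Determining the UD dependencies and labels is the core script.
3. Splitting into train/dev/test becomes a separate step, necessary only for official UD data.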
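To make the proposal concrete, here is a minimal sketch of how the three steps could be wired together once they are separate. All names in it (fix_morphology, assign_dependencies, split_dataset, run_pipeline) are hypothetical, the two conversion stages are placeholders that return their input unchanged, and the 8/1/1 split rule is only an assumption for illustration, not the rule the current script uses.

```python
from typing import Callable, Dict, List

Token = Dict[str, str]      # hypothetical token representation
Sentence = List[Token]
Stage = Callable[[List[Sentence]], List[Sentence]]


def fix_morphology(sentences: List[Sentence]) -> List[Sentence]:
    """Stage 1 placeholder (proposed to live in jos2ud.pl); returns a copy."""
    return [list(s) for s in sentences]


def assign_dependencies(sentences: List[Sentence]) -> List[Sentence]:
    """Stage 2 placeholder for the core dependency conversion; returns a copy."""
    return [list(s) for s in sentences]


def split_dataset(sentences: List[Sentence]) -> Dict[str, List[Sentence]]:
    """Stage 3: deterministic split. The 8/1/1 rule is an assumption."""
    parts: Dict[str, List[Sentence]] = {"train": [], "dev": [], "test": []}
    for i, sentence in enumerate(sentences):
        bucket = i % 10
        if bucket == 8:
            parts["dev"].append(sentence)
        elif bucket == 9:
            parts["test"].append(sentence)
        else:
            parts["train"].append(sentence)
    return parts


def run_pipeline(sentences: List[Sentence], stages: List[Stage]) -> List[Sentence]:
    """Apply each stage in order, feeding the output of one into the next."""
    for stage in stages:
        sentences = stage(sentences)
    return sentences


if __name__ == "__main__":
    data = [[{"form": str(i)}] for i in range(20)]
    converted = run_pipeline(data, [fix_morphology, assign_dependencies])
    # Splitting is optional and only needed for official UD data.
    parts = split_dataset(converted)
    print({name: len(items) for name, items in parts.items()})
```

With this layout, a non-official data set would simply skip the split_dataset call, and the morphology stage could be dropped from the stage list if the new schema makes it unnecessary.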