Summary
As part of our gold corpus annotation pipeline, we need a converter from the Brat standoff annotation format to the CDLI-CoNLL format.
Other links or relevant information
The converter should first fetch the CDLI-CoNLL data from the database for the text being converted and reuse the ID, FORM, SEGM and XPOSTAG columns. (Since the database is not yet set up with the CDLI-CoNLL field, this will have to be done from a file for now.)
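The file-based fallback could be sketched as follows. This is only an illustration and makes several assumptions: the CDLI-CoNLL file is tab-separated, lines starting with `#` are comments or headers, and the first four columns are ID, FORM, SEGM and XPOSTAG in that order. The function name `read_base_columns` is hypothetical.

```python
# Assumed column order for the first four fields of a CDLI-CoNLL line.
BASE_COLUMNS = ("ID", "FORM", "SEGM", "XPOSTAG")


def read_base_columns(lines):
    """Return one dict per token line, keeping only the four reused columns.

    `lines` is any iterable of strings, for example an open file handle.
    """
    rows = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Skip blank lines and comment/header lines (assumed to start with '#').
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < len(BASE_COLUMNS):
            raise ValueError(
                f"expected at least {len(BASE_COLUMNS)} columns: {line!r}"
            )
        # zip() stops after the four base columns, so any existing
        # HEAD/DEPREL/MISC values on the line are ignored.
        rows.append(dict(zip(BASE_COLUMNS, fields)))
    return rows
```

Once the database field exists, only the source of `lines` would need to change; the rest of the converter could keep working on the same list of row dicts.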
It should then convert the syntax and semantic annotations from the Brat standoff format to the CoNLL-U format for the columns HEAD, DEPREL (not using DEPS) and MISC.
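A minimal sketch of filling HEAD and DEPREL is given below. It assumes the Brat relations have already been resolved to token positions, that the first argument of each relation is the head and the second the dependent, and that a token with no incoming relation is the root (HEAD `0`, DEPREL `root`, as in CoNLL-U). None of these conventions are confirmed for our data, MISC is left as `_`, and the name `attach_dependencies` is hypothetical.

```python
def attach_dependencies(tokens, relations):
    """Add HEAD, DEPREL and MISC to copies of the token rows.

    tokens:    list of dicts that each have at least an "ID" key.
    relations: list of (label, head_pos, dep_pos) tuples, where the two
               positions are 0-based indexes into `tokens`.
               ASSUMPTION: head first, dependent second.
    """
    # ASSUMPTION: tokens without an incoming relation are treated as root.
    rows = [dict(tok, HEAD="0", DEPREL="root", MISC="_") for tok in tokens]
    attached = set()
    for label, head_pos, dep_pos in relations:
        if dep_pos in attached:
            # HEAD holds a single value, so a second head cannot be stored
            # (DEPS, which could hold it, is not used in CDLI-CoNLL).
            raise ValueError(f"token {tokens[dep_pos]['ID']} has two heads")
        attached.add(dep_pos)
        rows[dep_pos]["HEAD"] = tokens[head_pos]["ID"]
        rows[dep_pos]["DEPREL"] = label
    return rows


def to_line(row):
    """Serialise one row as a tab-separated line in the column order used here."""
    columns = ("ID", "FORM", "SEGM", "XPOSTAG", "HEAD", "DEPREL", "MISC")
    return "\t".join(row.get(col, "_") for col in columns)
```

Raising an error on a second head is a deliberate choice in this sketch: silently keeping one of the two relations would hide annotation mistakes in the gold corpus.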
Semantics will be added in additional custom columns after these. Those columns still have to be defined as part of #30, so for now that issue partly blocks this task.
For information about the CoNLL-U syntax field format (which will be exactly the same in CDLI-CoNLL, except for DEPS, which we will not be using), see http://universaldependencies.org/format.html under "Syntactic Annotation".
For information about the Brat format see http://brat.nlplab.org/standoff.html
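To illustrate the parts of the standoff format that matter here, the following sketch reads text-bound annotations (`T` lines) and binary relations (`R` lines) from the contents of an `.ann` file and ignores every other annotation type. The function name `parse_ann` and the shape of the returned values are assumptions made for this example, not part of Brat or of conllu.py.

```python
def parse_ann(text):
    """Parse the contents of a Brat .ann file.

    Returns (entities, relations):
      entities:  {"T1": (type, start, end, surface), ...}
      relations: [(type, arg1_id, arg2_id), ...] in file order
    """
    entities = {}
    relations = []
    for line in text.splitlines():
        if not line.strip():
            continue
        ann_id, _, rest = line.partition("\t")
        if ann_id.startswith("T"):
            # Text-bound annotation: "T1<TAB>Type start end<TAB>surface text".
            header, _, surface = rest.partition("\t")
            parts = header.split(" ")
            # Discontinuous spans look like "Type 0 5;16 23"; this sketch
            # keeps only the first start offset and the last end offset.
            entities[ann_id] = (parts[0], int(parts[1]), int(parts[-1]), surface)
        elif ann_id.startswith("R"):
            # Binary relation: "R1<TAB>Type Arg1:T3 Arg2:T4".
            rel_type, arg1, arg2 = rest.split()[:3]
            relations.append(
                (rel_type, arg1.split(":", 1)[1], arg2.split(":", 1)[1])
            )
        # Events, attributes, notes, etc. are skipped in this sketch.
    return entities, relations
```

Sorting the entities by start offset would give their token order, which is one possible way to align `T` identifiers with the rows of the CDLI-CoNLL file; whether that alignment holds for our annotation files would need to be checked.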
We have a working converter from CoNLL-U to Brat, which can also reverse the process, although we haven't tested that direction yet. Its code could perhaps be reused. See here: https://github.com/cdli-gh/conllu.py
For examples of CDLI-CoNLL files as they will appear in the database, see the MTAAC drive: MTAAC > Annotations > Annotation Test (morph) > Fully Annotated Files
Roadmap Data
Start Date: 2017-11-28
Expected Date: 2018-01-03
Label: wp
Progress (0-1): 0.1
See Gantt: http://cdli-dev.org/gantt/mtaac_work/