comphist / cora

A web-based, token-level annotation tool for non-standard language data
http://www.linguistics.rub.de/comphist/resources/cora/
MIT License

Document CorA XML format #14

Closed: mbollmann closed this issue 7 years ago

mbollmann commented 9 years ago

Originally reported by: Marcel Bollmann (Bitbucket: mbollmann, GitHub: mbollmann)



mbollmann commented 7 years ago

Original comment by Marcel Bollmann (Bitbucket: mbollmann, GitHub: mbollmann):


We now have:

This can certainly still be improved, but should be enough to close this issue for now.

mbollmann commented 7 years ago

Original comment by Marcel Bollmann (Bitbucket: mbollmann, GitHub: mbollmann):


Add script for XML validation (re #11)

mbollmann commented 8 years ago

Original comment by Marcel Bollmann (Bitbucket: mbollmann, GitHub: mbollmann):


Schematron file to check various content properties (ID reference validity etc.) is still in the works.

mbollmann commented 8 years ago

Original comment by Marcel Bollmann (Bitbucket: mbollmann, GitHub: mbollmann):


Replace outdated DTD by RelaxNG schema (re issue #11)

mbollmann commented 8 years ago

Original comment by Marcel Bollmann (Bitbucket: mbollmann, GitHub: mbollmann):


Aiko Freyth is currently working on a RELAX NG schema.

mbollmann commented 8 years ago

Original comment by Marcel Bollmann (Bitbucket: mbollmann, GitHub: mbollmann):


Ways to validate RELAX NG + Schematron:

There doesn't seem to be a simple stand-alone CLI tool for this, though one looks trivial to write with lxml.
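For illustration, here is a minimal sketch of what such an lxml-based validator might look like. It is not the script that was later added to the repository; the function names `validate` and `main` and the file name `validate.py` are made up for this example. It assumes lxml is installed, and that the Schematron rules are ISO Schematron, which is what `lxml.isoschematron` handles.

```python
import sys
from lxml import etree, isoschematron


def validate(doc, rng, sch=None):
    """Validate an XML document against RELAX NG and, optionally, Schematron.

    Each argument is a path or file-like object. Returns a list of error
    messages; an empty list means the document passed every check.
    """
    tree = etree.parse(doc)
    errors = []

    # Structural check: element and attribute grammar.
    relaxng = etree.RelaxNG(etree.parse(rng))
    if not relaxng.validate(tree):
        errors.extend(str(entry) for entry in relaxng.error_log)

    # Content check: rules a grammar cannot express (hypothetical rule file).
    if sch is not None:
        schematron = isoschematron.Schematron(etree.parse(sch))
        if not schematron.validate(tree):
            errors.extend(str(entry) for entry in schematron.error_log)

    return errors


def main(argv):
    """Command-line wrapper; returns a process exit code."""
    if len(argv) not in (2, 3):
        print("usage: validate.py DOC.xml SCHEMA.rng [RULES.sch]", file=sys.stderr)
        return 2
    problems = validate(*argv)
    for message in problems:
        print(message, file=sys.stderr)
    return 1 if problems else 0
```

To use it from a shell, one could end the file with `sys.exit(main(sys.argv[1:]))`. The exit code is then 0 for a valid document, 1 for validation errors, and 2 for bad arguments.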

mbollmann commented 8 years ago

Original comment by Marcel Bollmann (Bitbucket: mbollmann, GitHub: mbollmann):


After some research, I would probably go with RELAX NG as the main schema language.

I'm not sure whether it is capable of expressing our ID reference ranges ("t3_d1..t7_d2"), but it could probably be combined with Schematron for this purpose. At least Wikipedia states that combining the two is a typical way to get the strengths of both.
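To make the kind of constraint in question concrete, here is a small Python sketch of a range check. It is only an illustration, not CorA code and not Schematron: the helper names are hypothetical, and it assumes that a range is two IDs joined by "..", that both IDs must exist in the document, and that the first must not come after the second in document order.

```python
def split_id_range(value):
    """Split 't3_d1..t7_d2' into ('t3_d1', 't7_d2'); a single ID yields (id, id)."""
    start, sep, end = value.partition("..")
    return (start, end) if sep else (start, start)


def check_id_range(value, ids_in_order):
    """Return a list of problems found in one range reference.

    ids_in_order lists every ID defined in the document, in document order.
    """
    position = {id_: index for index, id_ in enumerate(ids_in_order)}
    start, end = split_id_range(value)

    # Both endpoints must refer to IDs that actually exist.
    problems = [
        f"unknown ID: {id_}"
        for id_ in dict.fromkeys((start, end))  # de-duplicate, keep order
        if id_ not in position
    ]

    # Assumed rule: a range must not run backwards.
    if not problems and position[start] > position[end]:
        problems.append(f"range runs backwards: {value}")
    return problems
```

A grammar-based schema can constrain the shape of the attribute value, but a cross-reference check like this needs either a rule language such as Schematron or a script along these lines.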

mbollmann commented 8 years ago

Original comment by Marcel Bollmann (Bitbucket: mbollmann, GitHub: mbollmann):


Commit 7917f2c adds an informal description of the format to the user documentation.