Implement low-level command-line tools for working with W3C Web Annotations

proycon commented 2 years ago

This is still a bit tentative, but as we move towards web annotations, I'd like to see a toolset of fairly simple but high-performance command-line tools (and an underlying programming library) to work with web annotations and perform certain common operations. I don't know if there's already some ongoing work on this, and we certainly needn't reinvent the wheel if there is, but the focus here would be on simple standalone tools with a command-line interface that don't require further infrastructure. Such tools could serve as a foundation for further microservices and encapsulate the main algorithmic components.

The tools I'm thinking of relate to:

validation tools (e.g. basic things like validating TextPosition offsets)
conversions between certain annotation paradigms/vocabularies we use within CLARIAH
conversions with other representations (like the untangle line, e.g. TEI XML, FoLiA)
possibly simple query tools (without needing a whole Elucidate server)
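As an illustration of the first item, a TextPosition check can be very small. The Rust sketch below assumes the TextPositionSelector semantics of the W3C Web Annotation Data Model (offsets counted in Unicode code points, `start` inclusive, `end` exclusive). The names `validate_text_position` and `OffsetError` are hypothetical and not part of any existing tool.

```rust
// Sketch of a TextPositionSelector offset check.
// Assumption: offsets are Unicode code points (Rust `char`s),
// start is inclusive and end is exclusive.

#[derive(Debug, PartialEq)]
enum OffsetError {
    /// start lies after end
    Inverted { start: usize, end: usize },
    /// end points beyond the last character of the text
    OutOfBounds { end: usize, len: usize },
}

/// Validate a (start, end) pair against `text` and return the selected slice.
fn validate_text_position(text: &str, start: usize, end: usize) -> Result<&str, OffsetError> {
    if start > end {
        return Err(OffsetError::Inverted { start, end });
    }
    let len = text.chars().count();
    if end > len {
        return Err(OffsetError::OutOfBounds { end, len });
    }
    // Map code-point offsets to byte offsets so the &str can be sliced safely.
    let byte_at = |cp: usize| {
        text.char_indices()
            .nth(cp)
            .map(|(byte, _)| byte)
            .unwrap_or(text.len()) // cp == len maps to the end of the string
    };
    Ok(&text[byte_at(start)..byte_at(end)])
}

fn main() {
    let text = "Café au lait";
    println!("{:?}", validate_text_position(text, 0, 4)); // Ok("Café")
    println!("{:?}", validate_text_position(text, 5, 3)); // Err(Inverted { start: 5, end: 3 })
    println!("{:?}", validate_text_position(text, 10, 20)); // Err(OutOfBounds { end: 20, len: 12 })
}
```

Counting in code points rather than bytes matters as soon as the text contains non-ASCII characters: "é" is one character but two bytes in UTF-8.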

I envision such tools being written in a compiled language (C, C++, Go or Rust) and optimised for speed and low memory consumption. The tools may also focus on certain subsets of W3C Web Annotations rather than attempt to encompass everything.
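To make the idea of supporting only a subset concrete, here is a hypothetical Rust sketch that models one narrow shape of annotation (a single `TextualBody` plus a target selected with a `TextPositionSelector`) and serialises it as Web Annotation JSON-LD. The struct `TextAnnotation` and its fields are assumptions for illustration. JSON is written by hand only to keep the example dependency-free; a real tool would more likely use a crate such as `serde_json`.

```rust
/// Hypothetical restricted subset of a W3C Web Annotation: one TextualBody
/// and one target text selected with a TextPositionSelector.
struct TextAnnotation {
    id: String,
    source: String, // IRI of the annotated plain text
    start: usize,   // code-point offset, inclusive
    end: usize,     // code-point offset, exclusive
    body_value: String,
}

/// Encode a string as a JSON string literal (surrounding quotes included).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl TextAnnotation {
    /// Serialise to compact JSON-LD using the Web Annotation context.
    fn to_json(&self) -> String {
        format!(
            concat!(
                "{{\"@context\":\"http://www.w3.org/ns/anno.jsonld\",",
                "\"id\":{},\"type\":\"Annotation\",",
                "\"body\":{{\"type\":\"TextualBody\",\"value\":{}}},",
                "\"target\":{{\"source\":{},\"selector\":",
                "{{\"type\":\"TextPositionSelector\",\"start\":{},\"end\":{}}}}}}}"
            ),
            json_escape(&self.id),
            json_escape(&self.body_value),
            json_escape(&self.source),
            self.start,
            self.end
        )
    }
}

fn main() {
    let ann = TextAnnotation {
        id: "urn:example:anno1".to_string(),
        source: "https://example.org/text.txt".to_string(),
        start: 0,
        end: 4,
        body_value: "a \"quoted\" comment".to_string(),
    };
    println!("{}", ann.to_json());
}
```

Restricting the data model this way keeps the in-memory representation flat and cheap, at the cost of rejecting (or passing through untouched) annotations that fall outside the chosen subset.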

proycon commented 2 years ago

One of the tools I'd envision in such a toolset is an evaluation tool that takes one set of web annotations as the system output and another as the reference gold standard, and computes metrics (precision, recall, class confusion matrix). Such a tool can also be used to compute inter-annotator agreement. (poking @hayco as he might be interested in this)
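A minimal Rust sketch of the core computation, assuming each annotation has already been reduced to a hypothetical `Span` of character offsets plus a class label on the same text, and that a system span only counts as correct when both offsets and label match a reference span exactly (partial-overlap scoring is left out). Each input is also assumed to contain no duplicate spans and at most one label per offset pair.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

/// Hypothetical reduced form of an annotation: character offsets plus a class label.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Span {
    start: usize,
    end: usize,
    label: String,
}

#[derive(Debug)]
struct Scores {
    precision: f64,
    recall: f64,
    /// (reference label, system label) -> count, for spans with identical offsets
    confusion: BTreeMap<(String, String), usize>,
}

/// Exact-match evaluation of `system` spans against `reference` spans.
fn evaluate(system: &[Span], reference: &[Span]) -> Scores {
    let gold: HashSet<&Span> = reference.iter().collect();
    // A true positive matches a reference span on offsets AND label.
    let true_pos = system.iter().filter(|s| gold.contains(s)).count();
    // Guard against division by zero when a set is empty.
    let ratio = |n: usize, d: usize| if d == 0 { 0.0 } else { n as f64 / d as f64 };

    // Index reference labels by offsets to build the class confusion matrix.
    let gold_by_offsets: HashMap<(usize, usize), &str> = reference
        .iter()
        .map(|r| ((r.start, r.end), r.label.as_str()))
        .collect();
    let mut confusion = BTreeMap::new();
    for s in system {
        if let Some(gold_label) = gold_by_offsets.get(&(s.start, s.end)) {
            *confusion
                .entry((gold_label.to_string(), s.label.clone()))
                .or_insert(0) += 1;
        }
    }
    Scores {
        precision: ratio(true_pos, system.len()),
        recall: ratio(true_pos, reference.len()),
        confusion,
    }
}

fn main() {
    let span = |start: usize, end: usize, label: &str| Span { start, end, label: label.to_string() };
    let reference = vec![span(0, 4, "PER"), span(10, 15, "LOC"), span(20, 25, "ORG")];
    let system = vec![span(0, 4, "PER"), span(10, 15, "ORG"), span(30, 33, "LOC")];
    let scores = evaluate(&system, &reference);
    // prints "precision=0.333 recall=0.333"
    println!("precision={:.3} recall={:.3}", scores.precision, scores.recall);
    // prints "gold=LOC system=ORG count=1" then "gold=PER system=PER count=1"
    for ((gold, sys), n) in &scores.confusion {
        println!("gold={gold} system={sys} count={n}");
    }
}
```

With these inputs only the `PER` span is fully correct, so precision and recall are both 1/3, and the confusion matrix records the `LOC` span that the system labelled `ORG`. Under exact matching, swapping the two inputs simply swaps precision and recall, so their harmonic mean (F1) is symmetric, which is why it is sometimes used as a pairwise inter-annotator agreement measure.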

@brambg This is something we may need for the analiticcl evaluation pipeline in the golden agents project anyway; if we approach it as generically as possible, we may kill two birds with one stone.

proycon commented 1 year ago

This will probably be handled via the STAM tooling now.

CLARIAH / clariah-plus

Implement low-level command-line tools for working with W3C Web Annotations #81