CLARIAH / clariah-plus

This is the project planning repository for the CLARIAH-PLUS project. It groups all technical documents and discussions pertaining to CLARIAH-PLUS in a central place and should facilitate findability, transparency and project planning, for the project as a whole.

Implement low-level command-line tools for working with W3C Web Annotations #81

Open proycon opened 2 years ago

proycon commented 2 years ago

This is still a bit tentative, but as we move towards web annotations, I'd like to see a toolset of fairly simple but high-performance command-line tools (and an underlying programming library) for working with web annotations and performing certain common operations. I don't know whether there is already ongoing work on this, and we certainly needn't reinvent the wheel if there is, but the focus here would be on simple standalone tools with a command-line interface that require no further infrastructure. Such tools could serve as a foundation for further microservices and encapsulate the main algorithmic components.

The tools I'm thinking of relate to:

I envision such tools being written in a compiled language (C, C++, Go or Rust) and optimised for speed and low memory consumption. The tools may also focus on certain subsets of W3C Web Annotations rather than attempt to encompass everything.

proycon commented 2 years ago

One of the tools I'd envision in such a toolset is an evaluation tool that takes one set of web annotations as system output and another as the reference gold standard, and computes metrics (precision, recall, class confusion matrix). Such a tool could also be used to compute inter-annotator agreement. (poking @hayco, as he might be interested in this)
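To make the idea more concrete, here is a rough sketch of the core computation. It is written in Python purely for brevity; the actual tool would be in a compiled language as described above. Everything here is an assumption for illustration: the function names `load_labels` and `evaluate` are hypothetical, the input is assumed to be one annotation per line (JSON Lines), each annotation is assumed to target a text span via a `TextPositionSelector`, and the class label is assumed to sit in the `value` of a single textual body. Only exact span matches are considered.

```python
import json
from collections import Counter


def load_labels(jsonl_text):
    """Map each annotated span (source, start, end) to its class label.

    Assumption: one annotation per line, a single target with a
    TextPositionSelector, and a single body whose `value` is the label.
    """
    labels = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        anno = json.loads(line)
        target = anno["target"]
        selector = target["selector"]
        key = (target["source"], selector["start"], selector["end"])
        labels[key] = anno["body"]["value"]
    return labels


def evaluate(system, reference):
    """Return (precision, recall, confusion) for exact span matches.

    A system annotation counts as correct only if the reference has the
    same span with the same label.
    """
    true_positives = sum(
        1 for key, label in system.items() if reference.get(key) == label
    )
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    # The confusion matrix only covers spans present in both sets,
    # keyed as (reference label, system label).
    shared = system.keys() & reference.keys()
    confusion = Counter((reference[key], system[key]) for key in shared)
    return precision, recall, confusion
```

Called as `evaluate(load_labels(system_text), load_labels(reference_text))`, this yields the two scores plus a counter of label pairs. Passing the output of two annotators instead of system and reference would give a crude basis for agreement figures, though a proper agreement measure would need more than this.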

@brambg This is something we may need for the analiticcl evaluation pipeline in the Golden Agents project anyway; if we approach it as generically as possible, we may kill two birds with one stone.

proycon commented 1 year ago

This will probably be handled via the STAM tooling now.