CLARIAH / clariah-plus

This is the project planning repository for the CLARIAH-PLUS project. It groups all technical documents and discussions pertaining to CLARIAH-PLUS in a central place and should facilitate findability, transparency and project planning, for the project as a whole.

[IG Annotation] Defining the scope of the Annotation group #20

Closed: proycon closed this 2 years ago

proycon commented 4 years ago

@marijnkoolen already listed the following in the initial README document:

  • support of the creation and use of annotations on any media type
  • support of manual annotation processes
  • support of manual correction of automatic annotation ...

Annotation aspects that are outside the scope of this Interest Group (because they are covered by other IGs):

  • automatic annotation processes
  • crowdsourcing

I have the impression this is a solid start we're all in agreement with?

@roelandordelman formulated it nicely in a mail prior to the creation of this group (translated from Dutch):

My two cents on this is that I would strongly prefer to treat the automatic annotation/enrichment processes (unambiguous CLARIAH-wide definitions would also be a good thing, by the way) as a separate, out-of-scope category, and to focus explicitly on 'manual annotation', that is, annotation from the moment there is a 'human in the loop'. That would also let us take in the grey area of manual correction of automatic annotations, or things like supervision in automatic processes. So the step from an automatically generated transcript to a generic annotation format in a human-readable environment would, as far as I'm concerned, also belong to annotation. So would interfacing with automatic annotation processes. But not the internals of something like automatic speech recognition or sentiment analysis. Crowdsourcing, which is yet another discipline altogether, I would also initially place out of scope and regard as a black box that can in turn supply material to the annotation process.

I'd say those automatic annotation processes should indeed be kept out of our scope, and are in the scope of the Text Interest Group and the AudioVisual Interest Group.

Do you all agree that it is within our scope to provide an overview of, and eventually recommendations for, annotation models/paradigms/formats? That was one of the first things I was thinking about. We could start by establishing an inventory of the annotation models/paradigms/formats (for text, audio, video, or whatever) that are in use in CLARIAH, their users, and an overview of what interoperability tools are already available (e.g. converters). These would then also be out of scope for the Text and AV IGs, so we have clear boundaries.

There's also the planned interest group on Linked Open Data, which may overlap with this group. After all, annotations may take the form of linked open data (like Web Annotation) or may be expressed in more intrinsic formats such as FoLiA, TEI, ELAN, etc. Do you already have ideas on how these two groups should relate?

marijnkoolen commented 4 years ago

I think that the scope of this group could include such overviews and recommendations, but a similar issue applies as with the annotation processes. I would limit it to those models/paradigms/formats that are relevant to manual annotation processes. Perhaps the best way to clarify the scope is to start listing and discussing them.

Within CLARIAH WP2 and WP5 the focus has been on Web Annotation (WA) and in WP5 specifically also on ELAN. There is also a group of tool builders (the VAINT initiative) working on an annotation exchange format based on WA (an application profile) for importing and exporting video annotations. We also have been working on a WA+RDFa annotation solution for Digital Scholarly Editions and digital historical source editions.
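
As a rough illustration of what a WA-based video annotation could look like, here is a minimal sketch in Python that builds a generic W3C Web Annotation targeting a time range of a video via a media fragment selector. This is not the VAINT application profile, whose details are not given here; the function name `make_video_annotation`, the `example.org` URLs and the annotation text are all hypothetical.

```python
import json

# Standard JSON-LD context of the W3C Web Annotation Data Model.
WA_CONTEXT = "http://www.w3.org/ns/anno.jsonld"
# Specification that the fragment value below conforms to (W3C Media Fragments).
MEDIA_FRAGS = "http://www.w3.org/TR/media-frags/"


def make_video_annotation(anno_id, video_url, start, end, text):
    """Build a Web Annotation (as a dict) on a time range of a video.

    Hypothetical helper for illustration only; start and end are seconds.
    """
    return {
        "@context": WA_CONTEXT,
        "id": anno_id,
        "type": "Annotation",
        "motivation": "commenting",
        "body": {
            "type": "TextualBody",
            "value": text,
            "format": "text/plain",
        },
        "target": {
            "source": video_url,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": MEDIA_FRAGS,
                # Temporal media fragment: from `start` to `end` seconds.
                "value": f"t={start},{end}",
            },
        },
    }


# Hypothetical identifiers; any resolvable IRIs would do.
anno = make_video_annotation(
    "https://example.org/anno/1",
    "https://example.org/video/1.mp4",
    30,
    60,
    "Interview starts here",
)
print(json.dumps(anno, indent=2))
```

An exchange profile would presumably constrain which body types, selectors and motivations are allowed on top of a structure like this.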

That makes it all the more important to also discuss the relation between this Annotation IG and the one for LOD. Given our focus on WA, I'd include it in our scope at least as a format for exchange, but I'd like to hear other people's suggestions on how to draw a line between the two groups.

marijnkoolen commented 4 years ago

I've added a new doc for this inventory for now. At some point we need to discuss how to organize documents in this repo, but while we're still discussing the scope I want to keep things simple. Please add or suggest something better.

proycon commented 4 years ago

> I would limit it to those models/paradigms/formats that are relevant to manual annotation processes. Perhaps the best way to clarify the scope is to start listing and discussing them.

Agreed, though I think most models/paradigms/formats are not specific to either manual or automatic annotation.

> I've added a new doc for this inventory for now. At some point we need to discuss how to organize documents in this repo, but while we're still discussing the scope I want to keep things simple. Please add or suggest something better.

Thanks, I have made a start expanding the inventory (#6). We have a section on text annotation converters, which I filled. In addition, I added an extra section "text convertors", meaning conversion between text formats (think of something like Word to Markdown), which I think is out of scope for us as it does not really concern annotation. What do you think? The new section merely exists to clarify the scope.

> That makes it all the more important to also discuss the relation between this Annotation IG and the one for LOD. Given our focus on WA, I'd include it in our scope at least as a format for exchange, but I'd like to hear other people's suggestions on how to draw a line between the two groups.

I don't have a clear view on this yet. I guess we also need to wait until that group starts.

Another question regarding scope arose when I was working on the inventory. Do we consider corpus search systems with an explicit focus on searching/indexing annotations as part of our scope? I'd say so. In the text domain, this would then include tools like BlackLab/Corpus-frontend, MTAS (Nederlab), PaQu, GrETEL, ANNIS.

marijnkoolen commented 4 years ago

Thanks for the extensive list! I've added a few more tools and also some protocols that are used (WA, IIIF) or may be useful in future (DTS) within CLARIAH.

> In addition, I added an extra section "text convertors", meaning conversion between text formats (think of something like Word to Markdown), which I think is out of scope for us as it does not really concern annotation. What do you think? The new section merely exists to clarify the scope.

I think this is useful, as it's related and will be brought up by others if we don't explicitly mention it as related but out of scope.

>> That makes it all the more important to also discuss the relation between this Annotation IG and the one for LOD. Given our focus on WA, I'd include it in our scope at least as a format for exchange, but I'd like to hear other people's suggestions on how to draw a line between the two groups.

> I don't have a clear view on this yet. I guess we also need to wait until that group starts.

Yes, they've at least created a repo now (IG-LOD). I mentioned the possible overlap to @rlzijdeman. At some point we should discuss this in both groups.

> Another question regarding scope arose when I was working on the inventory. Do we consider corpus search systems with an explicit focus on searching/indexing annotations as part of our scope? I'd say so. In the text domain, this would then include tools like BlackLab/Corpus-frontend, MTAS (Nederlab), PaQu, GrETEL, ANNIS.

Yes, querying annotations is inside the scope, as it touches on user needs around the use of annotations. Again, I'd like to keep the focus on querying manual annotations, although here it's probably difficult to draw a meaningful boundary between manual and automatic.

proycon commented 2 years ago

Closing this; this IG never took off.