TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

Encoding of Standoff annotations #374

Closed TEITechnicalCouncil closed 4 years ago

TEITechnicalCouncil commented 12 years ago

Annotating documents with standoff annotations is a very useful and flexible methodology. Nevertheless, TEI does not have any specific elements for encoding this information. In most cases, the standoff annotations are stored as external TEI files linked to the text being annotated. However, this way of storing them is very rigid and presents numerous problems, for example when indexing or searching a corpus of documents using the information in the annotations. In these cases, it would be very useful to have the standoff annotations INSIDE the TEI documents being annotated (!!!).

Therefore, it is suggested to define a new set of TEI elements specifically dedicated to the encoding of standoff annotations.

The idea would be to store the standoff annotations between the <teiHeader> and the <text>, following the same philosophy as used for the <facsimile> and for <sourceDoc> (in some way these two elements could also be considered as a "type" of annotation).

For the standoff annotation, the structure could be:

<TEI>
  <teiHeader> ... </teiHeader>
  <standoff> [information of the annotations] </standoff>
  <text> ... </text>
</TEI>

This structure would provide the extra advantage of allowing information to be annotated at different TEI levels in a natural manner. So for more complicated TEI documents with several hierarchical levels, the standoff annotations could be encoded as follows:

<teiCorpus>
  <teiHeader> ... </teiHeader>
  <TEI>
    <teiHeader> ... </teiHeader>
    <standoff> ... </standoff>
    <text> ... </text>
  </TEI>
  <TEI>
    <teiHeader> ... </teiHeader>
    <standoff> ... </standoff>
    <text> ... </text>
  </TEI>
</teiCorpus>

This structure would also make it possible to annotate not only the text of the document, but also the metadata at the different hierarchical levels of the TEI document.
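To make the per-document layout concrete, here is a minimal Python sketch that walks a corpus shaped like the one above and counts the annotations attached to each <TEI> child. It only illustrates the proposal: the lowercase <standoff> and <annotation> names come from this ticket, the sample corpus and the function name `annotations_per_document` are made up for the example, and namespaces are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical corpus following the structure proposed in this ticket
# (no namespace, lowercase <standoff>); not valid against any TEI schema.
CORPUS = """<teiCorpus>
  <teiHeader/>
  <TEI>
    <teiHeader/>
    <standoff><annotation type="a"/></standoff>
    <text/>
  </TEI>
  <TEI>
    <teiHeader/>
    <standoff><annotation type="b"/><annotation type="c"/></standoff>
    <text/>
  </TEI>
</teiCorpus>"""


def annotations_per_document(corpus_source):
    """Return (document index, annotation count) for each <TEI> child."""
    root = ET.fromstring(corpus_source)
    counts = []
    for index, tei in enumerate(root.findall("TEI")):
        block = tei.find("standoff")  # the per-document standoff block, if any
        count = len(block.findall("annotation")) if block is not None else 0
        counts.append((index, count))
    return counts


if __name__ == "__main__":
    print(annotations_per_document(CORPUS))
```

Because each <standoff> sits next to the <text> it describes, an indexer can process one <TEI> element at a time without opening any external file.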

The specific encoding of the annotations inside <standoff> could be as follows:

<standoff>
  <annotation type="..." subtype="...">
    <author>...</author>
    <date>...</date>
    <ptr>...</ptr>
    [other data needed]
  </annotation>
</standoff>
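As a rough illustration of how such in-document annotations could be used for searching, the following Python sketch resolves each annotation's pointer to the text it targets. Several things here are assumptions and not part of the proposal: that <ptr> carries a `target="#id"` attribute matching an `xml:id` inside <text>, the sample values, and the helper name `resolve_annotations`. Namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical document: element names follow this ticket's proposal,
# the target/xml:id linking convention is an assumption for the example.
SAMPLE = """<TEI>
  <teiHeader/>
  <standoff>
    <annotation type="entity" subtype="person">
      <author>JP</author>
      <date>2012-09-01</date>
      <ptr target="#w1"/>
    </annotation>
  </standoff>
  <text><p><w xml:id="w1">Ada</w> <w xml:id="w2">wrote</w></p></text>
</TEI>"""


def resolve_annotations(tei_source):
    """Return (type, subtype, target, targeted text) for every pointer."""
    root = ET.fromstring(tei_source)
    # Index every element that carries an xml:id so pointers can be looked up.
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    resolved = []
    for annotation in root.findall("./standoff/annotation"):
        for ptr in annotation.findall("ptr"):
            target = ptr.get("target", "")
            key = target[1:] if target.startswith("#") else target
            element = by_id.get(key)
            text = "".join(element.itertext()) if element is not None else None
            resolved.append(
                (annotation.get("type"), annotation.get("subtype"), target, text)
            )
    return resolved


if __name__ == "__main__":
    for row in resolve_annotations(SAMPLE):
        print(row)
```

A pointer whose target cannot be found yields `None` as its text, so dangling references stay visible instead of being silently dropped.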

As a last remark, it is also suggested that the TEI element <figure> be allowed inside <annotation>, in order to facilitate the annotation not only of textual information but also of images and formulas.

Conclusion: the proposed structure for the encoding of standoff annotations in TEI provides the following advantages:

- allows standoff annotations to be encoded in TEI in a natural manner, which is not the case at the moment

Remark: this idea has already been suggested by Piotr Bański in his article "Why TEI stand-off annotation doesn't quite work and why you might want to use it nevertheless", in http://www.balisage.net/Proceedings/vol5/html/Banski01/BalisageVol5-Banski01.html

Original comment by: sf_user_posejavier

laurentromary commented 8 years ago

Indeed: annotationBlock must be the elementary unit of representation in standOff annotations.

ebeshero commented 6 years ago

F2F (Victoria, 2017) Council agrees that @peterstadler and @laurentromary should go ahead with working on this.

ebeshero commented 6 years ago

F2F (Victoria 2017): we need to move forward with implementing LinkDataBlock.

peterstadler commented 6 years ago

@laurentromary and I just had a conf call discussing this issue and the further roadmap. He made a strong point about not merging the standoff proposal with a 'linkDataBlock' proposal, because the first is about annotating some text (thus pointing into the text) whereas the second is about adding editorial content to some text (content that is pointed to from the text). There has been some confusion about these distinctions (mine included), so I hope Laurent will elaborate on this!

Next steps:

  1. Laurent will prepare a short paper based on his talk during the TEI conference 2016 in Vienna (see https://hal.inria.fr/hal-01374102)
  2. I will try to arrange for a dedicated Council conf call with Laurent in January 2018.

peterstadler commented 6 years ago

Just stumbled across https://github.com/one-step-beyond/tei-standoff (but didn't take a closer look). Anyone familiar with this?

lb42 commented 6 years ago

Apparently a Turska-Spadini production. Don't you TEI councillors talk to each other any more?

laurentromary commented 6 years ago

Maybe we should start discussing the creation of elements. We have 2 to create: (name agreed upon by Council at Ann Arbor) and and one to adapt . The Council has had an implementation proposal since we compiled the scenario document on the Ann Arbour branch.

laurentromary commented 6 years ago

I am also in Tokyo from Tuesday to Friday of the TEI conference. If a group from the council is ready to make a move towards implementation, we could simply have an operational session there.

tuurma commented 6 years ago

@peterstadler an attempt I was considering with Elena Spadini back in the day, virtually dead now, though some of that thinking was incorporated in an approach we used later for earlyPrint (namely using existing TEI elements as the body of the annotation)

tuurma commented 6 years ago

@laurentromary would be great to have a session in Tokyo! So far we have agreed to get our act together in a small group to get back to Council

peterstadler commented 4 years ago

A wrapper <standOff> element has been created in 565fc72caec15d3569e496b5e71fb05ca772b158 and will be available in the upcoming release 3.7.0.

There is probably more to be fleshed out concerning the content of <standOff>, but this warrants dedicated tickets, so closing this one here.
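For anyone checking existing files against the new wrapper, here is a minimal Python sketch that counts <standOff> children of the root <TEI> element. It assumes the standard TEI namespace `http://www.tei-c.org/ns/1.0`; the sample documents and the function name `count_standoff` are invented for the example, and it says nothing about what <standOff> may contain. Note that XML names are case-sensitive, so the lowercase <standoff> spelling from the original proposal is not matched.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical minimal documents; the content of <standOff> is left open.
WITH_WRAPPER = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader/>
  <standOff><!-- content not specified here --></standOff>
  <text/>
</TEI>"""

LOWERCASE_SPELLING = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader/>
  <standoff/>
  <text/>
</TEI>"""


def count_standoff(tei_source):
    """Count namespaced <standOff> elements directly under the root."""
    root = ET.fromstring(tei_source)
    return len(root.findall(f"{{{TEI_NS}}}standOff"))


if __name__ == "__main__":
    print(count_standoff(WITH_WRAPPER))
    print(count_standoff(LOWERCASE_SPELLING))
```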