TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

standOff should be allowed to contain xenoData #2436

Closed jamescummings closed 4 months ago

jamescummings commented 1 year ago

I think that <xenoData> should be allowed to appear inside <standOff>. My reasoning for this is two-fold:

  1. <standOff> is "a container element for linked data, contextual information, and stand-off annotations" and many LOD projects are using <xenoData> for some of their processing workflows and don't create proper TEI structures for this. However, they sometimes mix and match and have some annotation in <xenoData> and some in <standOff>. So it would be useful for processing to be able to extract just the <standOff> containing both of those in a single container element. This is an appeal to convenience.

  2. Some of those using <xenoData> are storing non-TEI standoff data, which is pointing locally into the document, e.g. with the Web Annotation Data Model. While in an ideal world they would store this using <annotation> inside <standOff>, it is unlikely they'll change their processing. So in storing this information, it would be good if it could appear in <standOff> but it will remain <xenoData>, as that is where information like this belongs in the TEI abstract model. This is an appeal to semantics.
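The convenience argument in point 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it presumes the content model proposed in this issue (<xenoData> as a direct child of <standOff>), and the sample document, the `summarise_standoff` helper, and the simplified JSON payload are all hypothetical rather than taken from any real project.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; the "tei" prefix is just a local convenience.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample: assumes the content model proposed in this issue,
# i.e. <xenoData> allowed directly inside <standOff>.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc>
    <titleStmt><title>Sample</title></titleStmt>
    <publicationStmt><p>Unpublished sketch</p></publicationStmt>
    <sourceDesc><p>Born digital</p></sourceDesc>
  </fileDesc></teiHeader>
  <standOff>
    <listAnnotation>
      <annotation xml:id="ann1" target="#p1"><note>A TEI-native annotation</note></annotation>
    </listAnnotation>
    <xenoData>{"type": "Annotation", "target": "#p1"}</xenoData>
  </standOff>
  <text><body><p xml:id="p1">Some text.</p></body></text>
</TEI>"""


def summarise_standoff(tei_xml):
    """Collect TEI annotations and foreign payloads from the first <standOff>."""
    root = ET.fromstring(tei_xml)
    standoff = root.find("tei:standOff", NS)
    if standoff is None:
        return {"annotation_targets": [], "xeno_payloads": []}
    # TEI-native annotations, wherever they sit inside <standOff>.
    targets = [a.get("target") for a in standoff.iterfind(".//tei:annotation", NS)]
    # Foreign-format payloads stored as direct children of <standOff>.
    payloads = [(x.text or "").strip() for x in standoff.iterfind("tei:xenoData", NS)]
    return {"annotation_targets": targets, "xeno_payloads": payloads}


summary = summarise_standoff(SAMPLE)
```

With both kinds of data under one parent, a processor needs only a single lookup of <standOff> instead of separate passes over the header and the stand-off section.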

Proposal:

bansp commented 1 year ago

This means opening the TEI to lots of potential weirdness, but, as many may recall, I'm not in favour of attempts to prevent human silliness by preemptively restricting the code base.

It feels more important that, among the potential silliness, some genuinely useful cases may happen that will result in widening the TEI's coverage. Those who complain about too much baroque are going to become less convincing, all of a sudden, when the baroque is there basically for the sake of the TEI header, while standOff stores whatever the given project needs.

jamescummings commented 10 months ago

I'm not as worried as Piotr over any potential weirdness here. All it means is that projects that use TEI, but really store their annotations in a different format, will actually use the TEI rather than abandoning it completely for other formats. Yes, I'd rather that they use <annotation> properly, but there are many projects which aren't going to do so. This is real standOff type of data, but just because of processing workflows projects won't always put it in the correct TEI workflow. It seems a simple change to enable greater usage of the TEI without really causing problems for people who fully use the TEI. I don't really see many potential negative side-effects.

bansp commented 10 months ago

Oh but I did try to stress that I'm not worried. :-) Essentially, we fully converge on the potential usefulness of this.

lb42 commented 10 months ago

Well, it rather depends what you mean by "use the TEI", doesn't it?

jamescummings commented 10 months ago

@lb42 -- if they are 'using the TEI' for their document encoding, their marking of named entities, etc. but the linking of those to LOD entities is stored in RDF rather than using then I'd still say they are 'using the TEI'... just not using it for one small aspect. ;-)

In general I think this change is unproblematic, will encourage those using the TEI in this manner to feel part of the TEI community, and just makes sense. It has no real side-effects for existing TEI users who aren't interested in doing this.

lb42 commented 10 months ago

Lots of ifs in there @jamescummings! My concern is the risk to the TEI brand when it is used in this superficial way. For example, when I did my search for uses of xenoData in the wild, I was quite depressed to find projects systematically using it for data which really belonged in a TEI header, alongside a vacuous TEI header. If that isn't breaking the conceptual model, what is?

jamescummings commented 10 months ago

@lb42 Yes, but I don't think that is the use-case here. The use-case is either people having xenoData that is standOff in nature and wanting to put it in the 'right place' inside the standOff element, or people who are using standOff properly but simultaneously store a copy of it serialised into a format they keep in xenoData (e.g. an RDF JSON serialisation of properly done standOff Web Annotation Data Model annotations in TEI, where they have the JSON version for ease of display/rendering rather than having to generate it on the fly).
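The second scenario, a JSON copy kept next to properly encoded TEI annotations, could be checked for consistency roughly as follows. This is a hedged sketch, not any project's actual workflow: it assumes the proposed content model (<xenoData> inside <standOff>), assumes the JSON is stored as plain text in <xenoData>, and uses a heavily simplified stand-in for Web Annotation JSON. The `compare_targets` helper and the sample data are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample: the JSON copy has drifted from the TEI annotations
# (it lacks #p2 and carries a stale #p3).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <standOff>
    <listAnnotation>
      <annotation xml:id="ann1" target="#p1"><note>First</note></annotation>
      <annotation xml:id="ann2" target="#p2"><note>Second</note></annotation>
    </listAnnotation>
    <xenoData>[
      {"type": "Annotation", "target": "#p1"},
      {"type": "Annotation", "target": "#p3"}
    ]</xenoData>
  </standOff>
  <text><body><p xml:id="p1">One.</p><p xml:id="p2">Two.</p></body></text>
</TEI>"""


def compare_targets(tei_xml):
    """Report targets present in only one of the two serialisations."""
    root = ET.fromstring(tei_xml)
    standoff = root.find("tei:standOff", NS)
    if standoff is None:
        return {"only_in_tei": [], "only_in_json": []}
    tei_targets = {a.get("target") for a in standoff.iterfind(".//tei:annotation", NS)}
    json_targets = set()
    for xeno in standoff.iterfind("tei:xenoData", NS):
        # Assumption: each <xenoData> holds a JSON array of annotation objects.
        for item in json.loads(xeno.text or "[]"):
            json_targets.add(item["target"])
    return {
        "only_in_tei": sorted(tei_targets - json_targets),
        "only_in_json": sorted(json_targets - tei_targets),
    }


report = compare_targets(SAMPLE)
```

Keeping both serialisations inside the same <standOff> makes this kind of drift check a local operation on one element.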

jamescummings commented 6 months ago

Hi @sabineseifert Just checking if this issue has come up for council discussion? It seems fairly straightforward to me and will encourage people who do have standOff-like xenoData to put it in a more appropriate place.

sabineseifert commented 5 months ago

Not yet but I will try and put it further up the list for next meeting!

sydb commented 5 months ago

  1. I think @lb42’s concerns are valid: people could abuse it.
  2. I think the use case is valid: some folks are going to use other data formats with their TEI, and I would just as soon they embed those formats in their TEI.
  3. I think (2) outweighs (1), and we should allow <xenoData> inside <standOff>. I am not sure whether there should be any health warnings about its use.

jamescummings commented 5 months ago

I agree @sydb that there will be people who abuse it in some way that I've not considered... But that is true of almost every change we make. I certainly know of a project which would benefit from this by being able to embed their standoff xenoData in the 'right' place inside standOff. If that encourages other projects, then great. If some abuse it somehow, at least they are embedding their data in their TEI file rather than storing it separately, which I think is a win.

sabineseifert commented 4 months ago

Full Council discussion on April 13 at VF2F: