I agree that the image ROI selections should be preserved as standoff annotations, but when we do that we should also preserve the context of the annotation, whatever that happens to be, and, as we've discussed before, we should always have the option to serialize (and export) the annotations according to the OA data model, in JSON-LD (with RDF/XML as an option as well). I think it's fine to put the annotations directly into a Fuseki triple store as we do now (although it may be interesting to think about preserving them separately too; I'm not sure), but we should be using a standard ontology when doing so, and capturing all of the information necessary to export each one as a well-formed OA annotation. OA is a core requirement - it's important for interoperability, and I think it's also a really helpful guide when modeling what needs to be preserved.
Embedding the URNs in the TEI was a convenience for the editor and a way to postpone assigning the annotation's target until the text itself is transcribed, since, in at least one form of the annotation, we want to use a CTS URN of the text as the target, and we can't calculate that while we are transcribing from scratch. My intention has been that, upon finalization of a publication, the embedded CITE URN references to the image ROIs would be extracted from the TEI, the CTS URNs of the word(s) calculated, and new standoff annotations created, in which the target of the annotation is the CTS URN of the word(s) associated with the CITE URN of the image ROI.
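A minimal sketch of what that finalization step might produce, using the Web Annotation (OA) JSON-LD shape with the CTS URN as target and the image ROI as body. The URNs, the annotation id, and the helper name here are all made-up placeholders for illustration, not values from our actual data:

```python
import json

def make_text_annotation(anno_id, cts_urn, roi_urn):
    """Build a standoff annotation linking transcribed word(s) (CTS URN,
    the target) to the image ROI they were read from (CITE URN, the body),
    in the OA / Web Annotation JSON-LD shape."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": "linking",
        "target": cts_urn,   # the word(s) in the finalized text
        "body": roi_urn,     # the image region the word(s) came from
    }

# Example with placeholder URNs
anno = make_text_annotation(
    "urn:example:anno1",
    "urn:cts:greekLit:tlg0012.tlg001.ex:1.1@word1",
    "urn:cite:example:img.1@0.1,0.2,0.05,0.03",
)
print(json.dumps(anno, indent=2))
```

Serializing this as JSON-LD keeps the Fuseki-loadable RDF and the exportable OA document in sync, since the `@context` fixes the mapping to the OA ontology.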
I think we can also store separate annotations at transcription time that are essentially the reverse: the target of the annotation is the image ROI, and the body is the text that was transcribed. If we do so, though, we should also include additional context about the text that was being annotated, by whom, and so on. I think modeling both of these types of annotations in OA is essential to figuring out what we should preserve here.
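The reverse, transcription-time annotation could look like the sketch below, with the extra context carried as OA provenance fields (`creator`, `created`). Again, the URNs, the annotation id, and the creator URI are illustrative placeholders:

```python
import datetime
import json

def make_transcription_annotation(anno_id, roi_urn, text, creator_uri):
    """Build a transcription-time annotation: the image ROI is the target,
    the transcribed text is the body, and provenance (who, when) is
    recorded on the annotation itself."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": "describing",
        "target": roi_urn,                  # the image region
        "body": {
            "type": "TextualBody",
            "value": text,                  # the transcribed text
            "format": "text/plain",
        },
        "creator": creator_uri,             # who transcribed it
        "created": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

# Example with placeholder values
anno = make_transcription_annotation(
    "urn:example:anno2",
    "urn:cite:example:img.1@0.1,0.2,0.05,0.03",
    "word1",
    "http://example.org/editors/editor1",
)
print(json.dumps(anno, indent=2))
```

Because the target here is the ROI rather than a CTS URN, this annotation can be created immediately during transcription, before the finalized text exists.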
From Bridget email...