TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

Encoding RDF relationships in TEI (TEI+RDFa and alternatives) #1860

Open chiarcos opened 5 years ago

chiarcos commented 5 years ago

The motivation is to achieve a representation of RDF relations in the TEI that is unambiguous in vocabulary and semantics. Note that this does not pertain to cases where native TEI vocabulary elements could be interpreted as triples, but to cases that are not covered by TEI semantics, e.g., links between a passage in an edition and a terminology repository or a CTS URN. A similar restriction can be found in the definition of <link>.

At the moment, there are at least three different possibilities to express RDF triples inline in TEI: <relation> (#311), <fs>, and <link>.

Each of these is problematic because they conflate pre-RDF and RDF semantics and because they are analogy-driven ("tag abuse") rather than explicitly defined. The currently preferred solution, <relation>, is restricted to named entities, so example 4 in the Guidelines actually breaks the TEI schema (see my comment on #311).
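To make the "analogy-driven" point concrete: reading <relation> as an RDF triple only works under a project-specific convention. The Python sketch below assumes one such convention, which the Guidelines do not define: @active gives the subject(s), @ref (else @name) gives the predicate, @passive gives the object(s), and @mutual is ignored. Another project could reasonably map the same markup differently, which is exactly the ambiguity at issue. The function name and example data are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

def relation_to_triples(root):
    """Read tei:relation elements as (subject, predicate, object) tuples.

    The mapping is an assumed, project-specific convention, not TEI semantics:
    @active -> subject(s), @ref (else @name) -> predicate, @passive -> object(s).
    Relations expressed with @mutual are ignored in this sketch.
    """
    triples = []
    for rel in root.iterfind(".//tei:relation", NS):
        predicate = rel.get("ref") or rel.get("name")
        # @active and @passive may each hold several space-separated pointers.
        for subj in rel.get("active", "").split():
            for obj in rel.get("passive", "").split():
                triples.append((subj, predicate, obj))
    return triples

# A bare <listRelation> fragment, not a complete TEI document.
doc = """<listRelation xmlns="http://www.tei-c.org/ns/1.0">
  <relation name="quotes" active="#p1" passive="#src3 #src4"/>
</listRelation>"""
print(relation_to_triples(ET.fromstring(doc)))
```

Nothing in the markup itself says that "quotes" is meant as an RDF property, or which vocabulary it belongs to; that knowledge lives only in the conversion script.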

Several alternatives are possible (see the email thread at http://tei-l.970651.n3.nabble.com/Best-practice-for-W3C-Web-Annotations-generated-based-on-TEI-names-and-dates-module-tags-td4031445.html). One of them, RDFa, is particularly appealing because it is an established W3C standard that comes with off-the-shelf tooling (e.g., https://www.w3.org/2012/pyRdfa/ and http://www.sparql.org/sparql.html, which can run directly against TEI documents or against derived XML formats that maintain [rather than generate] RDFa information).

In the past, RDFa has been ruled out, partly out of fear that it would evolve and that this would have a negative impact on the TEI (http://tei-l.970651.n3.nabble.com/TEI-and-RDFa-was-Re-SAWS-and-LOD-was-Re-Cross-references-among-segs-in-TEI-td4025195.html). Since its W3C standardization (2015, https://www.w3.org/TR/rdfa-core/), this risk no longer exists.

In 2018, two successful applications of TEI+RDFa in two independent projects were reported (http://lrec-conf.org/workshops/lrec2018/W23/pdf/10_W23.pdf, http://e-spacio.uned.es/fez/eserv/bibliuned:363-Pruiz3/Ruiz_Fabo_Pablo_DISCO.pdf), which motivates project-independent specifications, ideally as part of the TEI. I suggest following the modeling of https://github.com/postdataproject/disco/#rdfa-attributes.
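To illustrate why RDFa attributes on TEI elements are attractive: the triples can be read straight off the markup. The Python sketch below handles only a tiny subset of RDFa Core 1.1 (@about, @property, @resource, @content) with a hard-coded prefix map. It is not a conformant RDFa processor (real documents should go through a tool such as pyRdfa), it does not claim to reproduce the DISCO modeling, and the example IRIs and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix map; a real RDFa processor reads @prefix/@vocab and default prefixes.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

def expand(curie):
    """Expand a CURIE such as 'skos:related'; anything else is returned unchanged."""
    prefix, sep, local = curie.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return curie

def extract_triples(elem, subject, triples):
    """Recursively collect (s, p, o) tuples from a small RDFa subset.

    Handled: @about (sets the subject for this element and its descendants),
    @property with @resource (IRI object) or with @content/text (literal object).
    Not handled: @typeof, @rel/@rev, @href/@src, CURIEs in @resource,
    chaining, datatypes, language tags.
    """
    subject = elem.get("about", subject)
    props = elem.get("property")
    if props:
        obj = elem.get("resource")
        if obj is None:
            obj = elem.get("content", "".join(elem.itertext()).strip())
        for prop in props.split():
            triples.append((subject, expand(prop), obj))
    for child in elem:
        extract_triples(child, subject, triples)
    return triples

doc = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <p about="http://example.org/edition#p1">
    <seg property="skos:related" resource="http://example.org/lexicon/entry42">word</seg>
    <seg property="rdfs:label">A label</seg>
  </p>
</body></text></TEI>"""

for triple in extract_triples(ET.fromstring(doc), "http://example.org/edition", []):
    print(triple)
```

The point of the sketch is that no project-specific mapping is needed: the subject, predicate and object are all stated in the TEI source itself.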

Note1: This is a follow-up to #311, but a different approach.

Note2: One possible alternative is to redefine <link>, <relation> or (not and) <relation> to provide unambiguous RDF semantics and to couple this with GRDDL/XSLT scripts to generate RDFa attributes (cf. http://www.ancientwisdoms.ac.uk/media/ontology/tei_to_rdf.xsl).

Note3: A third possibility is to sandbox RDFa attributes by restricting them to <ab> and <seg> (i.e., the same contexts as for <relation> in the SAWS proposal: http://www.ancientwisdoms.ac.uk/media/documents/Markup_Guidelines_for_Gnomologia.html#TEI.relation)
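The sandboxing rule in Note3 is easy to state mechanically. In an actual customization the ODD/schema would enforce it; the Python sketch below only shows the rule as a post-hoc check. It assumes the RDFa attributes are used unprefixed (in no namespace) on TEI elements; the function name and example data are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Attribute names defined by RDFa Core 1.1.
RDFA_ATTRS = {"about", "content", "datatype", "href", "inlist", "prefix",
              "property", "rel", "resource", "rev", "src", "typeof", "vocab"}

# Assumed sandbox, following Note3: only <seg> and <ab> may carry RDFa attributes.
ALLOWED = {TEI + "seg", TEI + "ab"}

def sandbox_violations(root):
    """List (element name, RDFa attributes) for every element outside the sandbox."""
    violations = []
    for elem in root.iter():
        used = sorted(RDFA_ATTRS.intersection(elem.attrib))
        if used and elem.tag not in ALLOWED:
            violations.append((elem.tag.replace(TEI, ""), used))
    return violations

doc = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <p property="rdfs:label">not allowed here</p>
  <ab><seg about="#a" property="skos:related" resource="#b">allowed</seg></ab>
</body>"""
print(sandbox_violations(ET.fromstring(doc)))
```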

lb42 commented 5 years ago

Just for completeness, I ask again: what about <graph>? (especially, since I understand graph-theoretic ontologies are replacing RDF in some ecosystems)

chiarcos commented 5 years ago

RDF and graphs are closely related, indeed. On a theoretical level, RDF formalizes labelled directed multi-graphs. A technical difference is that RDF is based on URIs and W3C standards, whereas graph databases usually are not. But <graph> in TEI is not meant to provide graphs as a data structure, only visualizations of such data structures. At least, that is what the examples under https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-graph.html and https://www.tei-c.org/release/doc/tei-p5-doc/en/html/GD.html suggest. It's more like GraphViz/Dot than like RDF, and of course both could be used to draw RDF graphs as illustrations.

lb42 commented 5 years ago

Why do you think tei:graph is not intended to provide a way of encoding a graph data structure? The beginning of chapter 19 would seem to indicate that it is: "Among the types of qualitative relations often represented by graphs are organizational hierarchies, flow charts, genealogies, semantic networks, transition networks, grammatical relations, tournament schedules, seating plans, and directions to people's houses. In developing recommendations for the encoding of graphs of various types, we have relied on their formal mathematical definitions and on the most common conventions for representing them visually. However, it must be emphasized that these recommendations do not provide for the full range of possible graphical representations, and deal only partially with questions of design, layout, and placement."

PietroLiuzzo commented 5 years ago

In Beta Masaheft we also transform the TEI into RDF triples of different flavours, as SAWS does. However, I now think that these semantic mappings could perhaps be defined in a project ODD rather than in the transformation, with something like models and @behaviour, and that an XSLT (or any other script) transforming TEI to RDF should be able to rely on that information in the ODD. Because in most cases people will make different decisions about which classes and properties to use in their RDF, even where their TEI is the same, it would be even nicer if the TEI modules already contained suggested associations for each element, customisable in the same ways as all other parts of the TEI: including them or not, adding or changing them. I could then define the precise semantics in my ODD, opt for <seg>, <relation>, <link>, etc., and have this clearly defined in my custom ODD and in relation to the standard set in the original modules.

chiarcos commented 5 years ago

Regarding <graph>: my reading of "it must be emphasized that these recommendations do not provide for the full range of possible graphical representations, and deal only partially with questions of design, layout, and placement" is indeed that <graph> deals with graphical representations of graphs, with the "partially" clause referring to the fact that the rendering itself is beyond the TEI (as it is beyond the dot language). We should probably elicit feedback on actual uses of <graph>, but it should definitely not be used for both purposes, because of their different functions: A conceptual graph is normally not to be rendered whereas graph visualizations have to.

chiarcos commented 4 years ago

We should probably elicit feedback on actual uses of <graph>, but it should definitely not be used for both purposes, because of their different functions: A conceptual graph is normally not to be rendered whereas graph visualizations have to.

Public responses are at http://tei-l.970651.n3.nabble.com/Current-and-historical-uses-of-lt-graph-gt-td4031618.html. Neither there nor in the private responses has any actual, current use of <graph> been confirmed, only its historical use for drawing network graphs and its potential use for representing graph data structures. If the TEI were indeed to endorse the use of <graph> as a data structure (rather than a graphical representation), I would strongly suggest rephrasing its definition accordingly and providing alternative vocabulary for the representation use (e.g., by recommending/enabling the embedding of SVG [or GraphML], in the spirit of the suggestion in https://wiki.tei-c.org/index.php/TEI_to_SVG#Using_SVG_with_TEI).

For pragmatic reasons, I would prefer an RDFa-compliant solution (possibly sandboxed by restricting it to container elements such as <seg> and <ab>) because it comes with off-the-shelf tooling, whereas anything based on <graph> would have to be rebuilt by every data provider individually (and, as a new XML-based solution, it would be highly unlikely to find any support outside the DH community). More important than this (personal) preference, however, is having clear instructions for expressing RDF triples (or at least RDF properties and objects) in TEI, and having them in the Guidelines; in this respect, I'd be happy with any clear guidance.

martindholmes commented 4 years ago

@chiarcos For a very straightforward solution, have you considered just putting RDFa inside a <xenoData> element and pointing to/from the TEI? That would leave your RDFa clean, straightforward and easily processable, while tightly linking it to the TEI content.

chiarcos commented 4 years ago

On .09.2019, 10:55, Martin Holmes (notifications@github.com) wrote:

@chiarcos For a very straightforward solution, have you considered just putting RDFa inside a <xenoData> element and pointing to/from the TEI? That would leave your RDFa clean, straightforward and easily processable, while tightly linking it to the TEI content.

Yes, but <xenoData> is a header element that can be used for RDF metadata (this is the first example in the Guidelines), and I see no easy way to use it for annotating content elements with RDF links.

Problems:

martindholmes commented 4 years ago

@chiarcos Thanks for the clarification.

chiarcos commented 4 years ago

As an afterthought: where it is not possible or necessary to provide RDF statements in inline XML, the standard solution (i.e., the only solution that is both TEI-compliant and W3C- [or otherwise] standardized) would be to use standoff annotation with Web Annotation (JSON-LD) over a TEI/XML document. This works nicely as long as the underlying TEI/XML no longer changes (so that URIs, XPaths, or offsets, whichever selector the Web Annotation uses, still point to the right element), but it is not feasible for content that is still under production.
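For readers less familiar with Web Annotation, the standoff alternative looks roughly like this: a JSON-LD annotation whose target is the TEI document plus a selector. The Python sketch builds a minimal annotation with an XPathSelector, following the W3C Web Annotation Data Model; the IRIs and the helper name are hypothetical, and how the XPath treats the TEI namespace is left open here.

```python
import json

def make_annotation(anno_id, tei_url, xpath, body_iri):
    """Minimal W3C Web Annotation: link one TEI element (via XPath) to an RDF resource."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "body": body_iri,
        "target": {
            "source": tei_url,
            "selector": {"type": "XPathSelector", "value": xpath},
        },
    }

anno = make_annotation(
    "http://example.org/anno/1",           # hypothetical annotation IRI
    "http://example.org/edition.xml",      # hypothetical TEI document
    "/TEI/text/body/p[2]",                 # namespace handling left to the consumer
    "http://example.org/lexicon/entry42",  # hypothetical lexicon entry
)
print(json.dumps(anno, indent=2))
```

The link lives entirely outside the TEI file, which is what makes it standards-based but also dependent on the selector staying valid.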

Permitting RDFa in TEI is actually conceptually compatible with the recommendation to use Web Annotation for standoff annotation, as an RDFa serialization of Web Annotation has been developed, too: https://www.w3.org/community/openannotation/wiki/RDFa and https://www.w3.org/TR/annotation-html/#annotations-embedded-as-rdfa

peterstadler commented 4 years ago

We discussed that issue briefly during our virtual f2f this weekend. If I understand correctly, the current issue is about expressing "RDF triples inline in TEI" where the straightforward solution would be to add RDFa attributes to (nearly?) all TEI elements. While this might not be a proper solution to be incorporated into the TEI standard, would it still be helpful to have that as an example customization at https://tei-c.org/guidelines/customization/ (in analogy to TEI + SVG or TEI + Math)?

chiarcos commented 4 years ago

If I understand correctly, the current issue is about expressing "RDF triples inline in TEI" where the straightforward solution would be to add RDFa attributes to (nearly?) all TEI elements.

Let's call that the maximum solution, and it is clearly not the best way for incorporation into the TEI standard.*

would it still be helpful to have that as an example customization at https://tei-c.org/guidelines/customization/ (in analogy to TEI + SVG or TEI + Math)?

Very much so, if
- this is presented as a TEI-endorsed approach (i.e., under "Customizations provided by the TEI Consortium"),
- candidate elements for a native TEI encoding of RDF triples (all discussed in this thread) are complemented with a link to the TEI+RDFa customization in the Guidelines (something like "Note that this element should not be used for the encoding of RDF graphs in inline TEI; instead, see the ..."), and
- the examples for using for encoding RDF triples are deprecated in the Guidelines (and replaced by [or at least complemented with] a reference to the TEI+RDFa customization).

I think these conditions are necessary to give TEI users clear guidance and to guarantee interoperability among different projects and between the TEI and LOD communities. As long as TEI users see their graphs as independent of RDF, they remain free to model them however they like, but if an RDF interpretation is intended, it should be marked as such.

I would be happy to contribute to the development of such a customization and its documentation.

A disadvantage of the customization approach is that customizations seem to be monolithic. As I am less into TEI than into LOD: is it possible to combine different customizations with each other? In the TEI-Drama customization, RDFa would be useful for entity linking; in TEI-Corpus, it could complement standoff markup and feature structures; in the TEI-MS customization, it would be useful for intertextual relations; and in other existing customizations, it would be useful for object metadata. For lexical resources, a novel Dict+RDFa customization that combines TEI Dict with OntoLex could be useful. In the end, we might end up with a very large number of customizations: basically every customization both with and without RDF(a).

Thanks a lot, Christian

martinascholger commented 4 years ago

@peterstadler and I discussed the issue in a meeting on July 1. Based on the discussion, Peter started a first draft of an example customization.

chiarcos commented 4 years ago

On .07.2020, 21:06, Martina Scholger (notifications@github.com) wrote:

@peterstadler and I discussed the issue in a meeting on July 1. Based on the discussion, Peter started with a first draft for an example customization.

Great news! Let me know how to help.

peterstadler commented 4 years ago

Just for the record: The current draft of the customisation ODD is added in the branch issue-1860 at https://github.com/TEIC/TEI/commit/151136c0a4f557c3136ad232a7c5c4ef37bb772d. It simply adds all RDFa attributes to a new class att.global.analytic.rdfa and hooks this class into att.global.analytic.

RobertoRDT commented 3 years ago

Dear all, are there any new developments on this? Has anyone tested the new customisation? Would you say that the RDFa attributes are a good solution? I would like to do some experimental work with ontologies and RDF-like triples, and I hope that the "clear guidance" mentioned by Christian arrives at some point.

Thank you for your work,

R

chiarcos commented 3 years ago

AFAIK, the status so far is that there were two concrete applications of TEI+RDFa that motivated the customization. The data are at https://github.com/pruizf/disco (including TEI+RDFa raw data) and http://www.deaf-page.de/guichaulmTel/edition.html (HTML in which the RDFa from the TEI+RDFa source is preserved; the RDF can be read off with https://www.w3.org/2012/pyRdfa/extract?uri=http%3A%2F%2Fwww.deaf-page.de%2FguichaulmTel%2Fedition.html, and that link can be used to explore the graph, e.g., via the FROM keyword of the web service at http://www.sparql.org/sparql.html). Links to descriptions can be found in this thread. However, both precede the customization.

Following our 2018 experiments, I applied for a 3-year project on 16th c. Lithuanian postils in which the customization is foreseen to be used on a wide scale for linking between edition and dictionaries, as well as for intertextual links between the Old Lithuanian texts and their German or biblical sources. This was approved in Dec 2020, but due to administrative delays at my university, it has not started yet. Otherwise, this would have been the demonstrator you're asking for. Anyway, even though delayed, it will follow the agenda we laid out for it, including a broad-scale application (and validation) of the TEI+RDFa customization.

I know that the colleagues at the Heidelberg Academy of Sciences were very much interested in continuing the work on http://www.deaf-page.de/guichaulmTel/edition.html for other Romance data, but I don't think that a concrete follow-up project has yet manifested itself. You might want to reach out to Sabine Tittel (contact details in the TEI+RDFa paper) for confirmation.

I'm not exactly unbiased, but I think it is fair to say that TEI+RDFa will work as a representation formalism (to the same extent as most "TEI-native" alternatives, except that the latter have ambiguous semantics). How it performs in established TEI workflows (rather than in newly created ones) will depend on the specifics of the project. And if you plan either to embed RDF(a) into your generated markup or to extract it from your TEI+RDFa source data, the technology to do so is at hand; so for a new project where text edition and RDF annotation evolve simultaneously, it would be my first choice for this very reason.

If, in your scenario, the text edition is completed before RDF annotation begins, Web Annotation+TEI (as used in Recogito) would be more established, but it is standoff and therefore both somewhat brittle (in terms of data consistency) and technically challenging (you need to set up and synchronize an XML and a JSON-LD workflow, unless you're happy with what Recogito can already do).
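To make the brittleness concrete: the Python sketch stores an expected quote next to a path (in the spirit of pairing an XPathSelector with a TextQuoteSelector) and checks whether the path still selects that text after the TEI has been edited. The path uses ElementTree's limited XPath subset, evaluated relative to the root <TEI> element; this is a simplification for illustration, not how a Web Annotation client resolves selectors, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

def selected_text(tei_xml, path):
    """Resolve an ElementTree path (relative to the root <TEI>) and return its text, or None."""
    elem = ET.fromstring(tei_xml).find(path, NS)
    return None if elem is None else "".join(elem.itertext())

def is_stale(tei_xml, path, exact):
    """True if the path no longer selects the text quoted when the annotation was made."""
    return selected_text(tei_xml, path) != exact

path = "tei:text/tei:body/tei:p[2]"
v1 = ('<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
      '<p>first</p><p>second</p></body></text></TEI>')
# The same edition after an editor inserts a paragraph at the top.
v2 = ('<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
      '<p>new</p><p>first</p><p>second</p></body></text></TEI>')

print(is_stale(v1, path, "second"))  # False: p[2] still holds "second"
print(is_stale(v2, path, "second"))  # True: p[2] now holds "first"
```

A single inserted paragraph silently re-targets every positional selector after it, which is why inline RDFa is easier to keep consistent while the edition is still changing.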

If you're looking for a place to discuss this, please feel free to reach out to https://www.w3.org/community/ld4lt/, where we are in the process of harmonizing linguistic annotations in an RDF-compliant way (https://github.com/ld4lt/linguistic-annotation). The focus of that group is not TEI but (annotation with) RDF; still, TEI (TEI with JSON-LD markup or inline RDF annotations) is a key aspect of the discussion. Apart from adhering to independently established standards for RDF data with independent tooling available (which does rule out the earlier TEI practices), no clear recommendation has come out of that yet, because there are multiple candidate vocabularies that need to be harmonized (TEI among them), and harmonization is a long-term effort. I expect that the result will be an extension of Web Annotation that supports inline RDF annotations in accordance with https://www.w3.org/TR/annotation-html/ [this is not a standard, just a working note], i.e., as a special case of TEI+RDFa, but this is only an educated guess.

JanelleJenstad commented 1 year ago

Revisited at Guelph 2023 F2F. Peter Stadler has rotated off Council. @HelenaSabel (whose work is mentioned in this ticket) will review the draft ODD and get things moving again.