ingoboerner opened 3 months ago
TEI stylesheet for merging TEI ODD specification with source to make a new source document. https://tei-c.org/release/doc/tei-xsl/odds/odd2odd0.html#odd2odd.xsl
This little guide is intended to explain the mechanism of ODD chaining. An ODD file specifies a particular view of the TEI, by selecting particular elements, attributes, etc. from the whole of the TEI. But you can also refine such a specification further, making your ODD derive from another one. In principle you can chain together ODDs in this way as much as you like. You can use this feature in several different ways:
• you can add additional restrictions to an existing ODD, for example by changing the value list of an attribute
• you can further reduce the subset of elements provided by an existing ODD
• you can add new elements or modules to an existing ODD
One @source with the value ‘mySuperODD.subset.xml’ will go looking for declarations in a file of that name in the current source tree. And one with the value ‘http://example.com/superODDs/anotherSubset.xml’ will go looking for it at the URL indicated.
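A minimal sketch of what such a chained ODD might look like. The ident myChainedODD and the module selection are made up for illustration; the file name is the one from the example above. As far as I understand the chaining guide, the file named in @source should be a compiled ODD, so treat that as an assumption to verify.

```xml
<!-- Hypothetical derived ODD: ident and module selection are illustrative only -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            ident="myChainedODD" start="TEI"
            source="mySuperODD.subset.xml">
  <!-- each moduleRef is resolved against the ODD named in @source,
       so only what that ODD already provides can be selected here -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
</schemaSpec>
```

With a URL instead of a file name in @source, the declarations would be fetched from that location.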
How about using the TEI Drama ODD provided by the TEI Consortium (also available with TEI Roma) as the source for the DraCor ODD? We would have to add some elements like particDesc, standOff and listEvent, which seem to be omitted there, and adjust some content models. But then we would perhaps already have a reasonable starting point.
So we would need to use https://tei-c.org/release/xml/tei/custom/odd/tei_drama.odd in the @source of <schemaSpec> and hope for the best? The old <schemaSpec> already included a good subset of elements, I think. Will test it with the drama ODD though.
The legacy ODD included 82 elements; if I included all modules that were in the legacy ODD, we would end up with 315 elements.
The TEI Drama ODD includes the following modules:
<schemaSpec ident="tei_drama" start="TEI teiCorpus">
<moduleRef key="header"/>
<moduleRef key="core"/>
<moduleRef key="tei"/>
<moduleRef key="textstructure"/>
<moduleRef key="linking"/>
<moduleRef key="drama"/>
<!-- ... -->
The schema contains 226 elements.
We could use Roma to start from the TEI Drama ODD, add the missing elements there and then use the resulting ODD for further refinement to our purposes.
I already copied it together in my local draft of the ODD. It seems to work without @source, but with the modules of this Drama ODD explicitly re-used:
<div xml:id="div_schema">
<head>Schema</head>
<schemaSpec ident="dracor-api" docLang="en" prefix="tei_" xml:lang="en" start="TEI">
<!-- modules included in the tei_drama ODD:
header, core, tei, textstructure, linking, drama
-->
<moduleRef key="header"/>
<moduleRef key="core"/>
<moduleRef key="tei"/>
<moduleRef key="textstructure" except="div1 div2 div3 div4 div5 div6 div7"/>
<moduleRef key="linking"/>
<moduleRef key="drama"/>
<!-- The dracor-legacy ODD also included additional elements from the following modules: -->
<moduleRef key="namesdates"
include="event forename genName listEvent listPerson listRelation nameLink person personGrp persName relation surname"/>
<moduleRef key="corpus" include="particDesc"/>
<moduleRef key="figures" include="figure"/>
<!-- ... -->
</schemaSpec>
This results in 233 elements. Maybe we can later go through the element list and kick some of them out again. The next step would be to look into the requirements of the API, e.g. the specific encoding of the digital and original sources in the <bibl> elements in <sourceDesc>. I would do that with Schematron, e.g.:
<!-- sourceDesc -->
<elementSpec ident="sourceDesc" module="header" mode="change">
<constraintSpec ident="digital_source_in_sourceDesc" scheme="schematron"
mode="add">
<desc>Checks if a digital source is present in the
<gi>sourceDesc</gi></desc>
<constraint>
<sch:rule context="tei:sourceDesc">
<sch:assert test="tei:bibl[@type eq 'digitalSource']">Digital
source is missing </sch:assert>
</sch:rule>
</constraint>
</constraintSpec>
<constraintSpec ident="original_source_in_sourceDesc" scheme="schematron"
mode="add">
<desc>Checks if an original source for a digital source is
available</desc>
<constraint>
<sch:rule
context="tei:sourceDesc/tei:bibl[@type eq 'digitalSource']">
<sch:assert test="tei:bibl[@type eq 'originalSource']">Original
Source for digital source is missing </sch:assert>
</sch:rule>
</constraint>
</constraintSpec>
</elementSpec>
OK, I would propose the following:
<div xml:id="play_id">
<head>Play ID</head>
<p>Feature <idno type="feature-no">P2</idno> <idno type="feature-id">play_id</idno>: <name>DraCor ID</name> of the play, e.g. <val>ger000171</val>.</p>
<p>In the TEI source file the <name>DraCor ID</name> is contained in the attribute <att>xml:id</att> on the root element <gi>TEI</gi>.</p>
<p>The identifier SHOULD match the Regular Expression <val>^[a-z]+[0-9]{6}$</val>.</p>
</div>
<constraintSpec ident="valid_dracor_ids_on_root_tei_element"
scheme="schematron" mode="add" corresp="#play_id">
<desc>DraCor identifiers should consist of lower case letters followed by a six-digit number. The value is returned as feature
<ref target="#play_id">play_id</ref> in the API response object.</desc>
<constraint>
<sch:rule context="tei:TEI" role="warning">
<sch:assert test="matches(./@xml:id,'^[a-z]+[0-9]{6}$')"> For
DraCor IDs we recommend the pattern ^[a-z]+[0-9]{6}$
</sch:assert>
</sch:rule>
</constraint>
</constraintSpec>
The result in the rendered HTML ODD:
The Schematron rule links to the feature (ref / @corresp):
The generated RelaxNG contains the Schematron rules and can be used in Oxygen to validate a file. In the example it now produces a warning:
There is an additional option to check whether a TEI file supports certain API features.
In <schemaSpec> we can include <constraintSpec> elements with Schematron rules that explicitly report (not assert) if a certain condition in the encoding is met.
An example: if the file contains <title type="main">Whatever Main Title</title>, the API will be able to return the title info in the response objects. We can now include a Schematron rule (constraintSpec) that checks for exactly that and reports that the feature is supported.
If it is not supported, I provide a "Warning" which might help encoders to add the elements that are needed for the feature to be supported.
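The report/warning pair described above could be sketched roughly like this. This is not the rule from the actual ODD: the ident feature_title_main, the message texts, the role values and the choice to hang the rule on titleStmt are assumptions made for illustration.

```xml
<!-- Hypothetical sketch: ident, messages and roles are illustrative only -->
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:sch="http://purl.oclc.org/dsdl/schematron"
             ident="titleStmt" module="header" mode="change">
  <constraintSpec ident="feature_title_main" scheme="schematron" mode="add">
    <desc>Reports whether the main title feature is supported by the encoding.</desc>
    <constraint>
      <sch:rule context="tei:titleStmt">
        <!-- report fires when the test is TRUE: the feature is supported -->
        <sch:report test="tei:title[@type eq 'main']" role="information">
          Feature supported: main title
        </sch:report>
        <!-- assert fires when the test is FALSE: hint for the encoder -->
        <sch:assert test="tei:title[@type eq 'main']" role="warning">
          Main title feature not supported: add a title with type="main"
        </sch:assert>
      </sch:rule>
    </constraint>
  </constraintSpec>
</elementSpec>
```

Both checks use the same test; only the polarity differs, so exactly one of the two messages is emitted per titleStmt.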
As @cmil, @lehkost and I discussed briefly at the CCLS conference, the current ODD file feels cluttered and is quite hard to maintain. We may want to rework and modularize it so that we can adapt it for certain corpora more easily:
1) Maybe use "tei_all" as the base in the first place (minimal requirement: validate against TEI all); include everything that is there, i.e. except="" on moduleRef; maybe only from the relevant modules, cf. the current version: https://github.com/dracor-org/dracor-schema/blob/00fb7ea86c11f47a0b871bc8fff9c30f891008fa/dracor.odd#L527-L550
This file might do nothing else.
2) We then need to restrict the usage of some elements, or change the content model or the values of some attributes that are relevant to the API. These changes will affect elements/attributes that we need to restrict because the API to some degree expects to find certain things (@xml:id on the root <TEI>) and is confused by unexpected things, e.g. multiple <text> elements as in some swedracor files.
3) Take (2) and add examples (<exemplum>) if we have them for certain elements, e.g. currently https://github.com/dracor-org/dracor-schema/blob/00fb7ea86c11f47a0b871bc8fff9c30f891008fa/dracor.odd#L771-L869 In the "examples odd" file we would do (maybe rework @source and include something which is based on a defined prefix in prefixDecl (or however the element is called)):
But the question remains: how do we put that together in the end?
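For the restrictions in (2), a rough sketch of how the "multiple <text> elements" case might be caught with the same Schematron approach used earlier in this thread. The ident single_text_element and the message are hypothetical, and whether this should be a warning or an error is left open here.

```xml
<!-- Hypothetical sketch: ident and message are illustrative only -->
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:sch="http://purl.oclc.org/dsdl/schematron"
             ident="TEI" module="textstructure" mode="change">
  <constraintSpec ident="single_text_element" scheme="schematron" mode="add">
    <desc>Checks that the root <gi>TEI</gi> contains exactly one <gi>text</gi>.</desc>
    <constraint>
      <sch:rule context="tei:TEI" role="warning">
        <!-- fires when there is no text child or more than one -->
        <sch:assert test="count(tei:text) eq 1">
          Expected exactly one text element as a child of TEI
        </sch:assert>
      </sch:rule>
    </constraint>
  </constraintSpec>
</elementSpec>
```

A file like this would only change existing element declarations, which keeps it separate from the base ODD in (1) and the examples in (3).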