TEIC / Stylesheets

TEI XSL Stylesheets

odd2lite.xsl output invalid #328

Open joeytakeda opened 6 years ago

joeytakeda commented 6 years ago

I'm playing around with odd2lite.xsl and noticed that the output was invalid TEI, nesting <ab>s within <ab>:

<cell rend="wovenodd-col2">
  <ab rend="parent">
    <ab rend="specChildren">
      <ab rend="specChild">
        <seg rend="specChildModule">header: </seg>
        <seg rend="specChildElements">
          <ref target="#TEI.titleStmt" rend="link_odd_elementSpec">titleStmt</ref>
        </seg>
      </ab>
    </ab>
  </ab>
</cell>
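
A quick way to see how widespread the nesting is would be to count the <ab> elements that sit inside another <ab>. This is only a sketch: it assumes the Lite output is in the TEI namespace (http://www.tei-c.org/ns/1.0), and `count_nested_ab` is a made-up helper name, not part of the Stylesheets.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # assumption: output uses the TEI namespace
AB = f"{{{TEI_NS}}}ab"


def count_nested_ab(root):
    """Count <ab> elements that have at least one <ab> ancestor."""
    nested = 0

    def walk(elem, inside_ab):
        nonlocal nested
        is_ab = elem.tag == AB
        if is_ab and inside_ab:
            nested += 1
        for child in elem:
            walk(child, inside_ab or is_ab)

    walk(root, False)
    return nested


# Trimmed version of the fragment above, with the namespace added by hand.
sample = f"""<cell xmlns="{TEI_NS}" rend="wovenodd-col2">
  <ab rend="parent">
    <ab rend="specChildren">
      <ab rend="specChild"><seg rend="specChildModule">header: </seg></ab>
    </ab>
  </ab>
</cell>"""

print(count_nested_ab(ET.fromstring(sample)))
```

For a whole file, `ET.parse("tei_bare.xml").getroot()` could be passed in instead of the inline sample.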

I'm ODD chaining and using tei_bare.odd as my example:

saxon -s:https://raw.githubusercontent.com/TEIC/TEI/master/P5/Exemplars/tei_bare.odd -o:tei_bare_compiled.odd -xsl:https://raw.githubusercontent.com/TEIC/Stylesheets/master/odds/odd2odd.xsl;
saxon -s:tei_bare_compiled.odd -o:tei_bare.xml -xsl:https://raw.githubusercontent.com/TEIC/Stylesheets/master/odds/odd2lite.xsl

Edit: And I'm using saxon9he.

joeytakeda commented 6 years ago

I'm not sure if the odd2lite transformation is meant to create valid TEI or valid TEI Lite, but it isn't valid against either. A quick summary:

tei_all.rng (https://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng)

157 errors / 4 distinct:

    Abstract model violation: ab may not occur inside paragraphs or other ab elements.
    element "ab" not allowed here; expected the element end-tag, text or element "abbr", "add", "addName", "addSpan", "address", "affiliation", "alt", "altGrp", "am", "anchor", "app", "att", "bibl", "biblFull", "biblStruct", "binaryObject", "bloc", "c", "caesura", "camera", "caption", "castList", "catchwords", "cb", "certainty", "choice", "cit", "cl", "classSpec", "climate", "code", "constraintSpec", "corr", "country", "damage", "damageSpan", "dataSpec", "date", "del", "delSpan", "depth", "desc", "dim", "dimensions", "distinct", "district", "eg", "elementSpec", "email", "emph", "ex", "expan", "fLib", "figure", "floatingText", "foreign", "forename", "formula", "fs", "fvLib", "fw", "g", "gap", "gb", "genName", "geo", "geogFeat", "geogName", "gi", "gloss", "graphic", "handShift", "height", "heraldry", "hi", "ident", "idno", "incident", "index", "interp", "interpGrp", "join", "joinGrp", "kinesic", "l", "label", "lang", "lb", "lg", "link", "linkGrp", "list", "listApp", "listBibl", "listEvent", "listNym", "listOrg", "listPerson", "listPlace", "listRef", "listRelation", "listTranspose", "listWit", "location", "locus", "locusGrp", "m", "macroSpec", "material", "measure", "measureGrp", "media", "mentioned", "metamark", "milestone", "mod", "moduleSpec", "move", "msDesc", "name", "nameLink", "notatedMusic", "note", "ns:egXML", "num", "oRef", "objectType", "offset", "orgName", "orig", "origDate", "origPlace", "outputRendition", "pRef", "pause", "pb", "pc", "persName", "phr", "placeName", "population", "precision", "ptr", "q", "quote", "redo", "ref", "reg", "region", "respons", "restore", "retrace", "rhyme", "roleName", "rs", "s", "said", "secFol", "secl", "seg", "settlement", "shift", "sic", "signatures", "soCalled", "sound", "space", "span", "spanGrp", "specDesc", "specGrp", "specGrpRef", "specList", "stage", "stamp", "state", "subst", "substJoin", "supplied", "surname", "surplus", "table", "tag", "tech", "term", "terrain", "time", "timeline", "title", "trait", "unclear", 
"undo", "unit", "val", "view", "vocal", "w", "watermark", "width", "witDetail" or "writing" (with xmlns:ns="http://www.tei-c.org/ns/Examples")
    element "attRef" not allowed here; expected element "addSpan", "alt", "altGrp", "anchor", "app", "binaryObject", "cb", "certainty", "damageSpan", "delSpan", "fLib", "figure", "formula", "fs", "fvLib", "fw", "gap", "gb", "graphic", "head", "incident", "index", "interp", "interpGrp", "join", "joinGrp", "kinesic", "lb", "link", "linkGrp", "listTranspose", "media", "metamark", "milestone", "notatedMusic", "note", "pause", "pb", "precision", "respons", "row", "shift", "space", "span", "spanGrp", "substJoin", "timeline", "vocal", "witDetail" or "writing"
    element "index" not allowed yet; expected the element end-tag or element "term"
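
To get from a long validator report to a short list of distinct problems like the one above, the messages can be grouped by their leading clause. The sketch below assumes a report in the `file:line:column: error: message` shape that Jing prints; the sample lines, line numbers, and the `summarize_errors` name are invented for illustration, and the messages are truncated.

```python
import re
from collections import Counter

# Assumed report shape: "<file>:<line>:<column>: error: <message>"
ERROR_LINE = re.compile(r"^.*?:\d+:\d+: error: (?P<message>.*)$")


def summarize_errors(report_lines):
    """Group error lines by the part of the message before '; expected ...'."""
    counts = Counter()
    for line in report_lines:
        match = ERROR_LINE.match(line.strip())
        if match:
            # Drop the long list of allowed elements so that the same
            # problem in different contexts lands in one group.
            head = match.group("message").split(";", 1)[0]
            counts[head] += 1
    return counts


# Invented sample lines; real reports are much longer.
report = [
    'tei_bare.xml:120:38: error: element "ab" not allowed here; expected the element end-tag, text or element "abbr"',
    'tei_bare.xml:188:38: error: element "ab" not allowed here; expected the element end-tag, text or element "abbr"',
    'tei_bare.xml:240:21: error: element "attRef" not allowed here; expected element "addSpan"',
    'tei_bare.xml:310:17: error: element "index" not allowed yet; expected the element end-tag or element "term"',
]

for message, count in summarize_errors(report).most_common():
    print(count, message)
```

Grouping on the leading clause is coarser than counting distinct full messages, so the totals will not necessarily match a validator's own "distinct" count.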

The main issues are:

  <cell rend="wovenodd-col2"><seg xml:lang="en">Attributes </seg><ref
                    target="#TEI.att.global">att.global</ref> (<hi rend="attribute">@xml:id</hi>,
                    <hi rend="attribute">@n</hi>, <hi rend="attribute"
                  >@xml:lang</hi>)<c xml:space="preserve"> </c> (<ref
                    target="#TEI.att.global.rendition">att.global.rendition</ref> (<hi
                    rend="attribute">@rendition</hi>)) <ref target="#TEI.att.sortable"
                    >att.sortable</ref> (<hi rend="attribute"
                    >@sortKey</hi>)<c xml:space="preserve"> </c><ref target="#TEI.att.typed"
                    >att.typed</ref> (<seg rend="unusedattribute">type</seg>, @subtype) <table
                    rend="attList">
                    <attRef xmlns:teix="http://www.tei-c.org/ns/Examples" rend="none"
                      class="att.global"/>
                    <attRef xmlns:teix="http://www.tei-c.org/ns/Examples" rend="none"
                      class="att.sortable"/>
                    <attRef xmlns:teix="http://www.tei-c.org/ns/Examples" class="att.typed"
                      name="subtype"/>

<div type="refdoc" xml:id="TEI.att.canonical">
            <head>att.canonical</head>
            <table rend="wovenodd">
              <row>
                <cell cols="2" rend="wovenodd-col2"><hi rend="label">att.canonical</hi><index
                    indexName="ODDS"><term>att.canonical (attribute class)</term><index
                      indexName="ODDS"><term sortKey="key">@key</term></index><index
                      indexName="ODDS"><term sortKey="ref">@ref</term></index></index> <seg
                    xml:lang="en">provides attributes which can be used to associate a
                    representation such as a name or title with canonical information about the
                    object being named or referenced.</seg> [<ref
                    xmlns:teix="http://www.tei-c.org/ns/Examples"
                    target="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ND.html#NDATTSnr"
                    >13.1.1. Linking Names and Their Referents</ref>]</cell>
              </row>

tei_lite.rng (http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_lite.rng)

553 errors / 6 distinct.

attribute "xml:base" not allowed here; expected attribute "ana", "cert", "corresp", "facs", "n", "next", "prev", "rend", "resp", "source", "subtype", "type", "xml:id", "xml:lang" or "xml:space"
element "ab" not allowed anywhere; expected the element end-tag, text or element "abbr", "add", "address", "anchor", "att", "bibl", "choice", "cit", "code", "corr", "date", "del", "desc", "eg", "emph", "expan", "figure", "foreign", "formula", "gap", "gi", "gloss", "graphic", "hi", "ident", "idno", "index", "interp", "interpGrp", "l", "label", "lb", "lg", "list", "listBibl", "mentioned", "milestone", "name", "note", "num", "orig", "p", "pb", "pc", "ptr", "q", "ref", "reg", "rs", "s", "seg", "sic", "soCalled", "sp", "stage", "table", "term", "time", "title", "unclear", "val" or "w"
element "attRef" not allowed anywhere; expected element "anchor", "figure", "formula", "gap", "graphic", "head", "index", "interp", "interpGrp", "lb", "milestone", "note", "pb" or "row"
element "c" not allowed anywhere; expected text or element "abbr", "add", "address", "anchor", "att", "bibl", "choice", "cit", "code", "corr", "date", "del", "desc", "eg", "emph", "expan", "figure", "foreign", "formula", "gap", "gi", "gloss", "graphic", "hi", "ident", "idno", "index", "interp", "interpGrp", "l", "label", "lb", "lg", "list", "listBibl", "mentioned", "milestone", "name", "note", "num", "orig", "pb", "pc", "ptr", "q", "ref", "reg", "rs", "s", "seg", "sic", "soCalled", "stage", "table", "term", "time", "title", "unclear", "val" or "w"
element "c" not allowed anywhere; expected the element end-tag, text or element "abbr", "add", "address", "anchor", "att", "bibl", "choice", "cit", "code", "corr", "date", "del", "desc", "eg", "emph", "expan", "figure", "foreign", "formula", "gap", "gi", "gloss", "graphic", "hi", "ident", "idno", "index", "interp", "interpGrp", "l", "label", "lb", "lg", "list", "listBibl", "mentioned", "milestone", "name", "note", "num", "orig", "p", "pb", "pc", "ptr", "q", "ref", "reg", "rs", "s", "seg", "sic", "soCalled", "sp", "stage", "table", "term", "time", "title", "unclear", "val" or "w"
element "c" not allowed anywhere; expected the element end-tag, text or element "abbr", "add", "address", "anchor", "att", "bibl", "choice", "cit", "code", "corr", "date", "del", "desc", "eg", "emph", "expan", "figure", "foreign", "formula", "gap", "gi", "gloss", "graphic", "hi", "ident", "idno", "index", "interp", "interpGrp", "l", "label", "lb", "lg", "list", "listBibl", "mentioned", "milestone", "name", "note", "num", "orig", "pb", "pc", "ptr", "q", "ref", "reg", "rs", "s", "seg", "sic", "soCalled", "stage", "table", "term", "time", "title", "unclear", "val" or "w"
element "egXML" not allowed anywhere; expected the element end-tag, text or element "ns:abbr", "ns:add", "ns:address", "ns:anchor", "ns:att", "ns:bibl", "ns:choice", "ns:cit", "ns:code", "ns:corr", "ns:date", "ns:del", "ns:desc", "ns:eg", "ns:emph", "ns:expan", "ns:figure", "ns:foreign", "ns:formula", "ns:gap", "ns:gi", "ns:gloss", "ns:graphic", "ns:hi", "ns:ident", "ns:idno", "ns:index", "ns:interp", "ns:interpGrp", "ns:l", "ns:label", "ns:lb", "ns:lg", "ns:list", "ns:listBibl", "ns:mentioned", "ns:milestone", "ns:name", "ns:note", "ns:num", "ns:orig", "ns:p", "ns:pb", "ns:pc", "ns:ptr", "ns:q", "ns:ref", "ns:reg", "ns:rs", "ns:s", "ns:seg", "ns:sic", "ns:soCalled", "ns:sp", "ns:stage", "ns:table", "ns:term", "ns:time", "ns:title", "ns:unclear", "ns:val" or "ns:w" (with xmlns:ns="http://www.tei-c.org/ns/1.0")
element "egXML" not allowed anywhere; expected the element end-tag, text or element "tei:abbr", "tei:add", "tei:address", "tei:anchor", "tei:att", "tei:bibl", "tei:choice", "tei:cit", "tei:code", "tei:corr", "tei:date", "tei:del", "tei:desc", "tei:eg", "tei:emph", "tei:expan", "tei:figure", "tei:foreign", "tei:formula", "tei:gap", "tei:gi", "tei:gloss", "tei:graphic", "tei:hi", "tei:ident", "tei:idno", "tei:index", "tei:interp", "tei:interpGrp", "tei:l", "tei:label", "tei:lb", "tei:lg", "tei:list", "tei:listBibl", "tei:mentioned", "tei:milestone", "tei:name", "tei:note", "tei:num", "tei:orig", "tei:p", "tei:pb", "tei:pc", "tei:ptr", "tei:q", "tei:ref", "tei:reg", "tei:rs", "tei:s", "tei:seg", "tei:sic", "tei:soCalled", "tei:sp", "tei:stage", "tei:table", "tei:term", "tei:time", "tei:title", "tei:unclear", "tei:val" or "tei:w"
element "index" not allowed yet; expected the element end-tag or element "term"
element "quote" not allowed anywhere; expected element "anchor", "bibl", "cit", "eg", "figure", "gap", "index", "interp", "interpGrp", "lb", "listBibl", "milestone", "note", "pb", "ptr", "q" or "ref"
element "quote" not allowed anywhere; expected the element end-tag, text or element "abbr", "add", "address", "anchor", "att", "bibl", "choice", "cit", "code", "corr", "date", "del", "desc", "eg", "emph", "expan", "figure", "foreign", "formula", "gap", "gi", "gloss", "graphic", "hi", "ident", "idno", "index", "interp", "interpGrp", "l", "label", "lb", "lg", "list", "listBibl", "mentioned", "milestone", "name", "note", "num", "orig", "p", "pb", "pc", "ptr", "q", "ref", "reg", "rs", "s", "seg", "sic", "soCalled", "sp", "stage", "table", "term", "time", "title", "unclear", "val" or "w"

martindholmes commented 6 years ago

I think this has come up before, and Council intends to do something about it, but it's low priority because odd2lite is almost always just a waypoint on the trip to something else (RNG, XHTML, PDF).

joeytakeda commented 6 years ago

That makes sense. I wasn't able to find anything about it in any of the repositories, so it may simply be undocumented. But, imho, it is strange that the TEI is creating (X)HTML/PDFs from invalid TEI as well as offering invalid TEI via OxGarage.

Plus, I think the odd2lite stylesheet is really helpful since the Lite XML is a much easier form of the document to process into project specific documentation.

lb42 commented 6 years ago

odd2lite is really specific to the production of the Guidelines, and I am not sure it serves as a good model for project-specific documentation. You're better off transforming your ODD to HTML in oXygen and tweaking the associated CSS, methinks.

Also, I find this particular bug MUCH less annoying/significant than the fact that the odd2odd stylesheet also generates invalid TEI! See #319: that one is more significant because compiled ODDs are likely to be used for many purposes, not just generation of documentation, and so really should be proper TEI.

joeytakeda commented 6 years ago

I’m post-processing this anyway into TEI that conforms to the project’s schema (and that uses project prefixes etc.), so it’s not that I’m struggling to work with the Lite output. Rather, I find it misleading that the Lite output is offered as a standalone format but isn’t actually meant to be used as one.

Is there a place—the Wiki or somewhere else—where the purpose/function of the Lite stylesheet can be explained?

lb42 commented 6 years ago

Why not start from the compiled odd in that case?

joeytakeda commented 6 years ago

This was mostly an experiment to see how easy it would be to create a version of the Guidelines that conformed to the look/feel of a project. I wanted the Guidelines to become a document like the rest of my TEI documents, but I didn't want to have to write the more complicated transformation from the compiled ODD to something like the Guidelines. Plus, I pass all of my HTML through a validator, and the generated HTML is, as Martin notes in #314, invalid, so I wanted to see if there was a way to skip cleaning up the HTML and instead clean up the source.

That said, the main problem here is that this stylesheet is misleading. I agree that there are bigger fish to fry, so if this is low priority for now and won't be fixed for a while, then there should be something about the function/status of that transformation in the file itself. I would be happy if: 1) a comment was added to the top of odd2lite.xsl explaining that the stylesheet currently produces invalid TEI Lite, and 2) the OxGarage output option "ODD Document as TEI Lite" had a disclaimer, or the text was changed in some way to signal that it isn't actually TEI Lite.

martindholmes commented 6 years ago

@joeytakeda There's a page documenting ODD processing here:

https://wiki.tei-c.org/index.php/Mapping_ODD_processing

martindholmes commented 5 years ago

New branch martindholmes-issue-328 for @joeytakeda and me to work on this.

martindholmes commented 5 years ago

The first two errors to tackle are those where the supposedly Lite file is not even valid against tei_all. Those errors are:

  1. Nested <ab> elements.
  2. <attRef> elements, which apparently aren't even needed any more; they end up in the documentation for elements.

First thing to do: confirm that the <attRef> elements are actually not needed, by processing the bare "lite" file into XHTML with them and without them. If there's no difference, we can just make sure they're dropped.
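
For the "without them" half of that comparison, one option is to strip the <attRef> elements from a copy of the Lite file before running it through the XHTML transformation and diffing the two results. A minimal sketch, assuming the TEI namespace on the Lite output; `drop_attrefs` is a hypothetical helper, not something in the Stylesheets.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # assumption: output uses the TEI namespace
ATTREF = f"{{{TEI_NS}}}attRef"


def drop_attrefs(root):
    """Remove every <attRef> in place, keeping any text that followed it."""
    removed = 0
    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            if child.tag == ATTREF:
                # ElementTree stores trailing text on the removed element,
                # so hand it to the previous sibling (or to the parent).
                tail = child.tail or ""
                if previous is None:
                    parent.text = (parent.text or "") + tail
                else:
                    previous.tail = (previous.tail or "") + tail
                parent.remove(child)
                removed += 1
            else:
                previous = child
    return removed


# Small stand-in for the attList table shown earlier in the thread.
sample = f"""<table xmlns="{TEI_NS}" rend="attList">
  <attRef rend="none" class="att.global"/>
  <attRef class="att.typed" name="subtype"/>
  <row><cell>subtype</cell></row>
</table>"""

root = ET.fromstring(sample)
print(drop_attrefs(root), "attRef elements removed")
```

If the XHTML generated from the stripped copy matches the XHTML from the original, that would support dropping the elements in odd2lite itself.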

After that, the nested <ab> problem is more complicated. <ab> is not even allowed in Lite. Should it be? If so, we still have the nesting problem.

martindholmes commented 5 years ago

This is the tei_bare.odd run through teitoodd then teitolite --odd.

tei_bare_odd_to_odd_to_lite.xml.zip

martindholmes commented 5 years ago

@joeytakeda Have a look at this ticket, which I hope will make one category of our issues moot: https://github.com/TEIC/TEI/issues/1856.

martindholmes commented 5 years ago

Based on the Council F2F in Graz, I believe it's unlikely that nested <ab> elements will ever be valid, so this strategy in odd2lite needs to be replaced by something better. The simplest option might actually be another processing pass at the end that revises these structures specifically.
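
As an illustration of what such a final pass might do, the sketch below rewrites every <ab> that has an <ab> ancestor as a <seg>, keeping its attributes and content. That particular rewrite is an assumption made here for illustration, not a strategy agreed in this ticket, and its output would still need validating; `flatten_nested_ab` is a made-up name.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # assumption: output uses the TEI namespace
AB = f"{{{TEI_NS}}}ab"
SEG = f"{{{TEI_NS}}}seg"


def flatten_nested_ab(elem, inside_ab=False):
    """Rename each <ab> that has an <ab> ancestor to <seg>; return the count."""
    changed = 0
    is_ab = elem.tag == AB
    if is_ab and inside_ab:
        elem.tag = SEG  # attributes such as rend are left untouched
        changed += 1
    for child in elem:
        # Descendants of a renamed element are still inside the outer <ab>.
        changed += flatten_nested_ab(child, inside_ab or is_ab)
    return changed


sample = f"""<cell xmlns="{TEI_NS}" rend="wovenodd-col2">
  <ab rend="parent">
    <ab rend="specChildren">
      <ab rend="specChild"><seg rend="specChildModule">header: </seg></ab>
    </ab>
  </ab>
</cell>"""

root = ET.fromstring(sample)
print(flatten_nested_ab(root), "nested ab elements renamed")
```

Keeping the `rend` values means stylesheets further down the pipeline that key on `specChildren` or `specChild` would need only their element-name tests adjusted.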

joeytakeda commented 4 years ago

odd2lite.xsl also seems to unwrap child <desc> elements in <graphic>, keeping the text but dropping the tags. This:

<p>Here is the TEI badge: <graphic url="http://www.tei-c.org/wp-content/uploads/2016/11/I-use-TEI.png"><desc>The TEI Badge</desc></graphic></p>

becomes:

        <p>Here is the TEI badge: <graphic
                    url="http://www.tei-c.org/wp-content/uploads/2016/11/I-use-TEI.png">The TEI
                    Badge</graphic></p>
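
One way to find every place where this has happened is to look for <graphic> elements that contain text directly but have no <desc> child. A rough sketch, again assuming the TEI namespace; `graphics_with_bare_text` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # assumption: output uses the TEI namespace
GRAPHIC = f"{{{TEI_NS}}}graphic"
DESC = f"{{{TEI_NS}}}desc"


def graphics_with_bare_text(root):
    """Return the url of each <graphic> that has direct text but no <desc>."""
    suspects = []
    for graphic in root.iter(GRAPHIC):
        has_text = bool((graphic.text or "").strip())
        if has_text and graphic.find(DESC) is None:
            suspects.append(graphic.get("url"))
    return suspects


url = "http://www.tei-c.org/wp-content/uploads/2016/11/I-use-TEI.png"
before = f'<p xmlns="{TEI_NS}">Here is the TEI badge: <graphic url="{url}"><desc>The TEI Badge</desc></graphic></p>'
after = f'<p xmlns="{TEI_NS}">Here is the TEI badge: <graphic url="{url}">The TEI Badge</graphic></p>'

print(graphics_with_bare_text(ET.fromstring(before)))
print(graphics_with_bare_text(ET.fromstring(after)))
```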

martindholmes commented 2 years ago

@joeytakeda We will be getting nested abs in the next TEI release, so we could revive this approach. We're a long way behind, though, so perhaps we should delete our temporary branch and start again?

martindholmes commented 12 months ago

@joeytakeda We now have nested <ab> elements; does this solve the problem?