TEIC / Stylesheets

TEI XSL Stylesheets

More than one altIdent in an elementSpec causes crash in teitorelaxng #243

Open lb42 opened 7 years ago

lb42 commented 7 years ago

Consider an elementSpec like the following:

<elementSpec ident="book">
<altIdent xml:lang="fr">livre</altIdent>
<altIdent xml:lang="de">buch</altIdent>
<!-- ... -->
</elementSpec>

Within a given schemaSpec, one can presumably select the required language for the element identifier by using @targetLang. However, the stylesheet crashes before getting that far:

/usr/share/xml/tei/stylesheet/common/teianttasks.xml:349: Fatal error during transformation using /usr/share/xml/tei/stylesheet/profiles/default/relaxng/to.xsl: A sequence of more than one item is not allowed as the first argument of normalize-space() ("livre", "buch") ; SystemID: file:/usr/share/xml/tei/stylesheet/common/functions.xsl; Line#: 1377; Column#: 60
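The message points at the cause: in XSLT 2.0, normalize-space() accepts at most one item, and an elementSpec with two altIdent children supplies two. One way to avoid the error is to reduce the altIdents to a single item before normalizing. The following is only a sketch under stated assumptions: the function name local:pickName, its namespace, and the "requested language, else first altIdent, else @ident" policy are all hypothetical and are not taken from the TEI Stylesheets.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns:local="urn:example:local">

  <!-- Hypothetical helper; not part of the TEI Stylesheets. -->
  <xsl:function name="local:pickName" as="xs:string">
    <xsl:param name="spec" as="element()"/>
    <xsl:param name="lang" as="xs:string"/>
    <!-- Prefer an altIdent in the requested language, else the first altIdent. -->
    <xsl:variable name="alt"
        select="($spec/tei:altIdent[lang($lang)], $spec/tei:altIdent)[1]"/>
    <!-- normalize-space() now receives exactly one item (or falls back to @ident). -->
    <xsl:sequence
        select="normalize-space(if ($alt) then $alt else $spec/@ident)"/>
  </xsl:function>

</xsl:stylesheet>
```

This only stops the crash; which name (or names) ought to be emitted is a separate design question.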

lb42 commented 7 years ago

If, however, the intent is that only one altIdent should be permitted, the content model should reflect this, or there should be a Schematron constraint to that effect.
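A Schematron constraint of that kind might look roughly like the sketch below. It is written as a standalone schema so that the tei prefix is bound explicitly; in an ODD the rule would presumably be wrapped in a constraintSpec instead. The message wording is illustrative only.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <sch:pattern>
    <sch:rule context="tei:elementSpec">
      <!-- Fires when an elementSpec carries more than one altIdent. -->
      <sch:report test="count(tei:altIdent) &gt; 1">
        Only one altIdent is permitted in an elementSpec.
      </sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```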

jamescummings commented 7 years ago

Confirmed that this happens and that the problem is indeed the normalize-space() in tei:createNameSpec function at https://github.com/TEIC/Stylesheets/blob/dev/common/functions.xsl#L1373

I've always assumed that multiple altIdents should result in multiple elements being created in the RNG, identical except that they have different names. The language question is an interesting one: when you generate your schema specifying French, does it produce only 'livre' as an element, or 'buch' as well?

Currently I believe that the stylesheets treat altIdent as the new canonical name for an element and I think this is sort of wrong. What would be nice to be able to do is say that here are some alternative identifiers for this gi, but that it persists unless deleted, and there can be multiple ones. Options as I see it are:

a) Decide that only one altIdent is allowed, and change the content model of altIdent or implement a schematron rule. (In Lou's example, he'd only be able to rename book to 'livre' and would only get that.)

b) Decide that more than one altIdent is allowed, that these replace the original ident and are only utilized when a particular language is used for the schema. (And make xml:lang required on altIdent.) (In Lou's example that would be fine, and processing would determine if 'livre' or 'buch' was the element created.)

c) Decide that more than one altIdent is allowed, that multiple schema element definitions are created, one for each altIdent regardless of language. (In Lou's example, 'livre' and 'buch' are both available and identical other than their name.)

d) Everything with 'c' but that the original ident persists as well. (In Lou's example 'book', 'livre', and 'buch' are all available and identical other than their names.)
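To make the difference between 'c' and 'd' concrete, here is a hand-written RELAX NG sketch of what option 'd' could amount to for Lou's example. It is not output from the Stylesheets: the pattern names (book, book.content), the placeholder content, and the use of the TEI namespace for all three names are assumptions made for illustration.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://www.tei-c.org/ns/1.0">
  <start>
    <ref name="book"/>
  </start>
  <!-- One declaration, three permitted names (option d). -->
  <define name="book">
    <element>
      <choice>
        <name>book</name>
        <name>livre</name>
        <name>buch</name>
      </choice>
      <ref name="book.content"/>
    </element>
  </define>
  <!-- Placeholder content model for the sketch. -->
  <define name="book.content">
    <text/>
  </define>
</grammar>
```

Dropping the first name element from the choice would give option 'c'; keeping only the name that matches the schema's target language would give option 'b'.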

I believe the processing currently assumes 'a' but the content model of altIdent doesn't. I think that having 'c' or 'd' would certainly enrich the power of ODD. I think tying use of multiple altIdents to language is a bad idea (since there might be so many other reasons to have altIdents).

lb42 commented 7 years ago

Thanks James. None of these options will make your resulting document valid against TEI All, of course, so this feature also raises a conformance problem; I suppose if you adopt option c or d you could also insist that a new TEI All should be generated.

jamescummings commented 7 years ago

Isn't that true of any use of altIdent? If I rename the text element to 'book' then it won't validate against tei_all either. The only way to make use of altIdent conformant is to somehow put the new element name in a new namespace? Or are things conformant where we have used equiv?

martindholmes commented 7 years ago

I don't see how any document which does not validate against the generic tei_all can be called conformant. altIdent and equiv are documentation of a possible path to conformance, aren't they? Otherwise you could construct a completely mad schema where random TEI elements are replaced by elements with names that are completely at odds with the original intent (forgive the pun) and claim it to be conformant TEI.

lb42 commented 7 years ago

Yes, I agree. On conformance, see further thoughts at https://foxglove.hypotheses.org/522 Comments welcome there, or here.

jamescummings commented 7 years ago

If we agree that any use of altIdent leads to non-conformance, that is fine. But in terms of what ODD should allow or not, do you agree it might be valuable to have multiple altIdent elements allowed? And if so what does that mean?

I can imagine a use-case where I want a number of syntactic-sugar elements for an element. Let's say 'term'. Some project wants specialised elements for different types of term and instead of typing in <term type="foo" subtype="blort"> they really want a <foo type="blort">. They also want a <bar type="wibble">. But I can tell these are all semantically really term elements. Silly, but fair enough, they are insistent and paying for it. Instead of defining new elements, if I could just say that term had some alternative identifiers of 'foo' and 'bar' then I'd be done. (Aside from maybe deleting subtype to make my equiv transformation easier.) Clearly, their use is non-conformant (but easily transformed into conformant documents).
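In ODD terms, that use case might be written as in the sketch below, taking 'foo' and 'bar' (the two element names in the example) as the alternative identifiers. This assumes multiple altIdents were permitted and processed as in options 'c' or 'd'; nothing here reflects what the current stylesheets do.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             ident="term" module="core" mode="change">
  <!-- Hypothetical: two alternative names for the same element. -->
  <altIdent>foo</altIdent>
  <altIdent>bar</altIdent>
</elementSpec>
```

Under that reading, a document's <foo type="blort"> would map back to <term type="foo" subtype="blort">, and <bar type="wibble"> to <term type="bar" subtype="wibble">.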

Should that be enabled by allowing multiple altIdents? Or should it be forbidden?

lb42 commented 7 years ago

I am comfortable with the idea that any use of altIdent leads to non-conformance, for most definitions of non-conformance that we might come up with. And your use case is entirely plausible: I can cite an incunable cataloguing project at the BNF which agreed to use the TEI msDesc only on condition that all elements named msX were renamed bookX. The problem with all this is twofold: (a) you have to ensure that the new names are unique, and probably that they don't duplicate any old name; (b) the old names are retained as the names of the RELAX NG patterns defining the elements, but lost in DTD and XSD outputs.
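Point (a) could be checked, at least in part, at the ODD level. The Schematron sketch below is an assumption-laden illustration: it only sees elementSpecs written inside the same schemaSpec, so it cannot detect a clash with an element pulled in by moduleRef, and the message texts are invented.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <sch:pattern>
    <sch:rule context="tei:schemaSpec//tei:elementSpec/tei:altIdent">
      <sch:let name="name" value="normalize-space(.)"/>
      <sch:let name="specs"
               value="ancestor::tei:schemaSpec//tei:elementSpec"/>
      <!-- The new name collides with an existing ident. -->
      <sch:report test="$specs[@ident = $name]">
        altIdent "<sch:value-of select="$name"/>" duplicates an existing element identifier.
      </sch:report>
      <!-- The same new name is declared more than once. -->
      <sch:report test="count($specs/tei:altIdent[normalize-space(.) = $name]) &gt; 1">
        altIdent "<sch:value-of select="$name"/>" is declared more than once.
      </sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```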

Thinking about "easily transformed" sounds like we are teetering on the brink of reinventing the notion of "TEI conformable" :-(

jamescummings commented 7 years ago

I certainly do not want to re-invent 'conformable' as a concept. Everything is 'conformable'.

a) If they are not unique, that should be an error... though perhaps in the target schema language and just a warning in the TEI ODD?

b) I don't care about DTDs. Yes, it would be nice if they were retained in XSD as a method to signal a way back, but instead people should use schemaRef to point to their TEI ODD which will make the source of this weird bookX element clear. ;-)

That sounds like a vote in favour of 'c' or 'd' in my list above. i.e. that multiple altIdents should be allowed...

lb42 commented 7 years ago

According to 23.1.2, the renamed elements should be in a different namespace. It also says the TEI "provides a systematic set of renamings....[which] all use a language-specific namespace". So this seems like another bug.

jamescummings commented 7 years ago

That bug is reported as https://github.com/TEIC/TEI/issues/1613 and I think that sentence should just be removed.

lb42 commented 7 years ago

Yes, as noted on issue https://github.com/TEIC/TEI/issues/1613, in an ideal world, multiple altIdents would be cool. But be careful: the current stylesheets need a lot of work to handle that.

lb42 commented 7 years ago

We seem to be agreeing that multiple altIdents should be allowed; I am less convinced that they should result in multiple declarations in the generated schema, i.e. that my example above should result in a schema in which all of book, buch, and livre are available. Can I then delete the ones I don't want from my ODD explicitly?