Closed lb42 closed 8 years ago
Looking at our current practice, there seem to be two variants on the way macro.anyXML is used. In one, we permit any XML element from any XML namespace except those specified; in the other, we permit only elements from a specified namespace or namespaces. This suggests that we could implement it in pure ODD as an empty element <anyXML>, with two attributes @except and @include. If those attributes are not used, the implication would be to permit any element from any namespace, which opens up the DTD duplicate IDs problem again, so I am not sure how that would sensibly be implemented. Maybe we should require the presence of one or other of the two attributes. It would be easy enough to add a new element to the Pure ODD branch along those lines, though implementing the necessary transformation in XSLT might be a tad trickier.
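To make the two variants concrete, the proposed element might be used roughly as follows. This is only a sketch of the idea: <anyXML>, @except and @include are the names floated in this comment, while the two macroSpec identifiers and the choice of the XHTML namespace are hypothetical and appear nowhere in the TEI source.

```xml
<specGrp xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Variant 1: any element from any namespace, except the listed one(s).
       The ident is invented for this illustration. -->
  <macroSpec ident="macro.anyExceptTEI">
    <content>
      <anyXML except="http://www.tei-c.org/ns/1.0"/>
    </content>
  </macroSpec>

  <!-- Variant 2: only elements from the listed namespace(s).
       The ident and the XHTML namespace are illustrative assumptions. -->
  <macroSpec ident="macro.onlyXHTML">
    <content>
      <anyXML include="http://www.w3.org/1999/xhtml"/>
    </content>
  </macroSpec>
</specGrp>
```

With neither attribute present, the element would match anything at all, which is the case the comment suggests ruling out.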
Lou: not sure I get what you mean here. We refer to macro.anyXML in 5 places (not counting itself):
- constraint
- content
- egXML
- macro.schemaPattern
- xenoData
At least in the 'dev' branch, it looks like child elements can be from any namespace in all cases.
While it seems to me having @include and @except could be useful features for ODD writers, I don't see where we would use them ourselves. So my instinct is they should not hold up publication of PureODD.
You may want to refer back to Hugh's mail of 27/10/15 at 22:01 to the council list
Sorry, I still don't get it. We always use macro.anyXML without further constraint. (I.e., in none of the 5 cases listed previously do we say “and it should all be in X namespace” or “and it should not be in Y namespace”.) All 5 are treated the same: any content from any namespace is allowed, except <teix:egXML> elements and any elements from the TEI namespace.
And while I still think @include and @except for namespaces of the new <anyXML> is intriguing and probably useful, it is not at all a good solution to the “conflicting ID-types for attribute "id"” problem. It’s just not OK to say “you can define an element to have any content easily, but then you can’t use it to have content from your own namespace”.
(BTW, I don't understand why <teix:egXML> is not allowed in macro.anyXML. Not even sure we need to prevent <teix:egXML> from ever being inside <teix:egXML>, but if we do we should do that somewhere other than the macro for any XML. E.g., a <constraintSpec> in the definition of <teix:egXML>.)
But macro.anyXML comes with built-in constraints, namely that the TEI and TEI Example namespaces are excluded. If you want to include any other namespaces in your TEI that use xml:id, then you must also exclude them from the content of macro.anyXML. It’s really macro.anyOtherXML :-).
We talked about this extensively at the F2F and the consensus was that excluding namespaces in this way was the least bad solution. Given that, and the fact that you’ll need to be able to customize on the fly if your ODD has extra namespaces that use xml:id, I think Lou’s suggestion has merit.
For our new members, to whom I expect this sounds like utter gibberish: it’s a kind of obscure bug that bit us recently. I attempted to explain it in the email Lou referred to: http://lists.tei-c.org/pipermail/tei-council/2015/022174.html
With all due respect, Mr. Chairman, that turns out not to be the case (on almost all counts).
the TEI and TEI Example namespaces are excluded.
No, only the TEI namespace is excluded. Not only is the TEI Example namespace permitted, it is what macro.anyXML was originally made for!
If you want to include any other namespaces in your TEI that use xml:id, then you must also exclude them from the content of macro.anyXML.
No, there are other solutions, most notably turning off RELAX NG DTD compatibility mode, as it is not intended to be used for this anyway. (It is intended for when all features of DTDs are being emulated, not just the ones we like :-)
F2F … consensus was that excluding namespaces in this way was the least bad solution
It may have been the majority opinion, but it is hard to call that a consensus when I was squealing as loudly as I could that excluding namespaces is a terrible solution. (And there is a good one: turn off DTD compatibility mode and use Schematron to check for ID uniqueness.)
I think Lou’s suggestion has merit.
I do too, but not for this.
There is another explanation, complete with a solution that does not tell users they can't use elements from TEI or their own namespace in <anyXML>.
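For readers who have not seen the alternative being argued for here, a Schematron check for ID uniqueness could look roughly like the following. This is a minimal sketch and not a rule taken from the TEI Guidelines or Stylesheets; it assumes an XSLT 2 query binding and only looks at xml:id within a single document.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:pattern>
    <!-- Fires once for every element that carries an xml:id -->
    <sch:rule context="*[@xml:id]">
      <sch:let name="id" value="@xml:id"/>
      <!-- Exactly one element in the document may carry this value -->
      <sch:assert test="count(//*[@xml:id = $id]) = 1">
        The xml:id "<sch:value-of select="$id"/>" is used more than once.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

A check like this replaces only the uniqueness half of what DTD compatibility mode provides; verifying that IDREF-style pointers resolve would need further rules.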
Yes, the current definition for macro.anyXML excludes the <teix:egXML> element itself, and any element from the TEI namespace. I cannot remember why egXMLs cannot nest, except that it makes my head hurt thinking about how you'd validate that, and my proposed <anyXML> element wouldn't address this issue. It does however address the other requirement rather tidily. As for the alternative solutions proposed: my recollection is that we did discuss both of them at the FTF, and rejected them on the grounds that they introduced too much reliance on schematron, and removed something (ID/IDREF validation) clearly of importance to many TEI users.
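The exclusions described at the start of this comment can be pictured as a RELAX NG pattern along these lines. It is a reconstruction from the descriptions in this thread (an <nsName> for the TEI namespace, a <name> for egXML), not a copy of the actual definition in the TEI source, and the treatment of attributes and text is an assumption.

```xml
<define name="macro.anyXML" xmlns="http://relaxng.org/ns/structure/1.0">
  <element>
    <anyName>
      <except>
        <!-- every element in the TEI namespace -->
        <nsName ns="http://www.tei-c.org/ns/1.0"/>
        <!-- only the egXML element in the Examples namespace -->
        <name ns="http://www.tei-c.org/ns/Examples">egXML</name>
      </except>
    </anyName>
    <zeroOrMore>
      <choice>
        <!-- assumption: any attribute is allowed on the matched element -->
        <attribute>
          <anyName/>
        </attribute>
        <text/>
        <!-- recursion: the same restrictions apply at every depth -->
        <ref name="macro.anyXML"/>
      </choice>
    </zeroOrMore>
  </element>
</define>
```

The distinction between <nsName> (a whole namespace) and <name> (one element) is what the later exchange about "line 20 of the current definition" turns on.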
[Editing for clarity, as @sydb is correct]
Syd is right: only the TEI namespace and <egXML> itself in the example namespace are excluded.
It may have been the majority opinion, but it is hard to call that a consensus when I was squealing as loudly as I could that excluding namespaces is a terrible solution. (And there is a good one: turn off DTD compatibility mode and use Schematron to check for ID uniqueness.)
The unfortunate problem with your solution is that this mode is on by default. So things like Oxygen will tell you your schema is invalid. I’m afraid that makes it a less-good solution. People are going to assume their schema (or the TEI) is wrong and I’m not sure how we could communicate effectively to everyone who uses TEI and RNG that they need to do this whenever they validate a TEI document. I agree it sucks, but a solution that requires all of our users to do something extra isn’t practical.
Sorry Hugh, I think you are mistaken about what's being excluded by line 20 of the current definition. It's a <name> element, not a <nsName> .. as I noted above. But this doesn't affect the issue about what is the least worst solution to this mess, where I think you are right.
Ah, fair enough. Wasn’t looking closely enough.
Starting to think about implementing this, and decided it would be better to name the element anyElement. Wondering whether to make a new branch just to add it, or not.
anyElement suggests that the content must be a well-formed tree with a single element, doesn't it? But doesn't macro.anyXML allow well-formed fragments? Or am I misunderstanding?
Presumably your content model would be text or anyElement, any number of times, right?
I see; Lou's proposing an element that would form part of the anyXML structure, and would represent in it any XML element at all. He's not proposing a single element which would have the function of macro.anyXML.
Actually, I'm not sure. ANY (which is what macro.anyXML translates to in DTD land) means as many as you like of textnode or any-element-defined-in-the-dtd*, and that's what the current macro.anyXML tries to reproduce (except that it allows any element at all, modulo namespace constraints of various kinds).
Proposed changes to implement this are described at http://teic.github.io/TCW/anyXMLproposal.html
Stylesheets implementation is needed to close this, see TEIC/Stylesheets#170
I'm working on implementing TEIC/Stylesheets#170 and have a question for @lb42 and any RelaxNG gurus: the @ns attribute on <anyName> doesn't seem to have the effect of requiring content be in that namespace, though it does have the nice side effect of making Oxygen suggest elements from the ns. But unless I'm mistaken @include on <anyElement> is intended to require content be in that namespace. Right?
Second question: are we keeping macro.anyXML under this new régime? I'm going to look into auto-generating some schematron to handle the @include constraint.
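On the first question, the behaviour described matches how RELAX NG name classes work: <anyName> matches every name regardless of any @ns it carries, whereas <nsName> is the name class that restricts matching to one namespace. The fragment below contrasts the two. It is an illustrative sketch only and says nothing about how the Stylesheets should implement @include.

```xml
<choice xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- @ns here does not narrow the name class: anyName still matches
       an element with any name in any namespace -->
  <element>
    <anyName ns="http://www.tei-c.org/ns/Examples"/>
    <text/>
  </element>

  <!-- nsName matches only elements whose name is in the given namespace -->
  <element>
    <nsName ns="http://www.tei-c.org/ns/Examples"/>
    <text/>
  </element>
</choice>
```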
Think I've got it working. No surprise, it turns out to be more complicated than expected. I've got to flesh it out a little more and then run it through the tests to see what breaks tomorrow, but I think we're ok.
One note for @lb42: we can't ditch <name ns="http://www.tei-c.org/ns/Examples">egXML</name> from <except> because of the xml:id thing. I can make it a default, try to do something smart when it's used in the egXML element, or we can add new syntax for the exclusion of particular elements. Thoughts?
OK, running tests now. Given a content model for egXML like:
<content>
  <alternate minOccurs="0" maxOccurs="unbounded">
    <textNode/>
    <anyElement except="http://www.tei-c.org/ns/1.0 teix:egXML"
                include="http://www.tei-c.org/ns/Examples"/>
  </alternate>
</content>
teitorelaxng will generate a define for the anyElement, which will be referenced in egXML's RNG content model. The define will exclude the TEI namespace and the egXML element (teix:egXML is "magic", but maybe better than hard-coding the egXML exception). @include is implemented as a Schematron rule if the anyElement is a descendant of an elementSpec (otherwise I can't see how to get the context).
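A Schematron rule of the kind described for @include might look something like this. It is a guess at the shape of the generated rule, not the output of teitorelaxng: the context is taken from the enclosing elementSpec (egXML in this example), and checking only child elements rather than all descendants is an assumption.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:ns prefix="teix" uri="http://www.tei-c.org/ns/Examples"/>
  <sch:pattern>
    <!-- Context comes from the elementSpec that contains the anyElement -->
    <sch:rule context="teix:egXML">
      <!-- Passes only if no child element lies outside the included namespace -->
      <sch:assert test="not(*[namespace-uri() != 'http://www.tei-c.org/ns/Examples'])">
        Child elements of egXML must be in the namespace
        http://www.tei-c.org/ns/Examples.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

This also shows why a macroSpec is harder: without an enclosing elementSpec there is no element name to put in the rule's context.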
Would it be less magic if we gave <anyElement> an attribute such as @recursable of datatype teidata.truthValue? Then we could say <anyElement recursable="false"> means "any element except my parent". I can't think of any other reason why you'd want to exclude specific elements.
I'm not sure it's an improvement, really. What about this: I can modify the XSLT so that if you've set a namespace prefix on the element|macroSpec ancestor, e.g. xmlns:teix="http://www.tei-c.org/ns/Examples", then when it encounters a URI in the form teix:egXML, it'll do the right thing for it. No more magic then.
This doesn't seem to address the issue I thought we were discussing.
Why not? It means you can use a standard reference scheme to exclude a particular element. I get that egXML might be the only case where you'd want to do this, but the problem with an attribute is that I can't figure out how we'd make it work inside a macro (where there's no context to grab the parent's name from). I presume we'll keep macro.anyXML around for at least a while for backwards compatibility, no?
You've lost me completely now. Maybe it's the heat. What does a "standard reference scheme" have to do with the price of fish?
This way, we can generate the proper <name ns="http://www.tei-c.org/ns/Examples">egXML</name> in rng:except by putting teix:egXML (a standard way of referencing a namespaced resource, an element in this case) in the @except attribute on anyElement, without having to invent anything special to support it. You just have to define that prefix in the usual way on the ancestor macroSpec or elementSpec (or above).
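The prefix resolution described here could be sketched in XSLT 2.0 as below. This is not the code in TEIC/Stylesheets; the mode name is hypothetical, and the rule for telling a prefixed element name from a namespace URI (QName-shaped, with a prefix that is in scope) is an assumption made for the illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns:rng="http://relaxng.org/ns/structure/1.0"
    exclude-result-prefixes="tei">

  <!-- Hypothetical mode: builds the rng:except for one anyElement -->
  <xsl:template match="tei:anyElement[@except]" mode="sketch-except">
    <xsl:variable name="here" select="."/>
    <rng:except>
      <xsl:for-each select="tokenize(normalize-space(@except), ' ')">
        <xsl:choose>
          <!-- Assumption: a token shaped like prefix:local whose prefix is
               declared on anyElement or an ancestor names a single element -->
          <xsl:when test="matches(., '^[\i-[:]][\c-[:]]*:[\i-[:]][\c-[:]]*$')
                          and substring-before(., ':') = in-scope-prefixes($here)">
            <xsl:variable name="qname" select="resolve-QName(., $here)"/>
            <rng:name ns="{namespace-uri-from-QName($qname)}">
              <xsl:value-of select="local-name-from-QName($qname)"/>
            </rng:name>
          </xsl:when>
          <!-- Anything else is treated as a namespace URI -->
          <xsl:otherwise>
            <rng:nsName ns="{.}"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </rng:except>
  </xsl:template>
</xsl:stylesheet>
```

Given except="http://www.tei-c.org/ns/1.0 teix:egXML" and the teix prefix bound to http://www.tei-c.org/ns/Examples, this would be expected to yield an <nsName> for the TEI namespace and a <name> for egXML, the pair quoted earlier in the thread.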
So your proposal is to permit a namespaced element name as one of the possible values for @except? That seems plausible tho I think the spec shd be adjusted to make clear what that means.
If I'm understanding then the magic in this question would be in the processing that if you have:
<elementSpec ident="egXML" mode="change" xmlns:teix="http://www.tei-c.org/ns/Examples">
  <!-- .... further down in content ... -->
  <anyElement except="teix:egXML" include="http://www.tei-c.org/ns/Examples"/>
  <!-- ... -->
</elementSpec>
That the teix: local namespace prefix will magically be known about and used in processing to RNG.
Am I understanding that correctly? If so, that seems sensible enough to me.
We may need to invent a new datatype if @except is to permit as values one or more namespaces or namespace-prefixed-element-name.
I think it is a fairly simple one using xsd:anyURI, isn't it? That would allow teix:foo foo http://www.example.com/ns/
Does that allow something we don't want, then?
@jamescummings that's it. I think it's not magic at all if the prefix is defined. @except is defined as teidata.namespace, which is xsd:anyURI (see http://teic.github.io/TEI/ref-teidata.namespace.html). I'm not sure we can do better than that.
Do we need a similar solution on the @include side? I.e. should we allow teix:egXML in @include? My feeling is no, because that would entail some validation of the content of anyElement, and if you wanted to do that, you ought to be actually validating the content, and there are already good ways to do that.
But the data type is explicitly teidata.namespace, rather than teidata.pointer for example. So it needs changing unless you think teix:egXML is a valid namespace. The fact that we map to anyURI is not relevant. I am pretty sure that e.g. #foo is not a valid namespace even though it's also anyURI.
And no we don't want to allow explicit elements on @include thank you. That would be crazy talk.
Fair point. teix:egXML could be a namespace, as could any URI. But it isn't one. Maybe it's not a great datatype, as there's nothing that distinguishes a namespace from any other URI other than how it's used...
Looking at @lb42's proposed content model for content... I don't think it can actually work that way, unless you mean to ban RelaxNG content models. Was that what you were going for? We could do something like:
<alternate>
  <anyElement include="http://relaxng.org/ns/structure/1.0"
              except="http://www.tei-c.org/ns/1.0 http://www.tei-c.org/ns/Examples"/>
  <classRef minOccurs="0" maxOccurs="unbounded" key="model.contentPart"/>
</alternate>
to permit only RelaxNG or Pure content models.
Since <anyElement> is a member of model.contentPart, I don't understand why this alternation is not ambiguous. Also, don't forget that <content> must be allowed to be empty.
As I just posted on Council list, yes, that was what I intended. But to avoid ambiguity, I think we need a different element, i.e. <rngContent>
@lb42 Do you think we should use teidata.pointer instead of teidata.namespace to support URIs like teix:egXML? Or ought we to create a new type? Teidata.uri?
teix:egXML is a teidata.name, so far as I can see.
Only in the sense that it matches the production for Name. Semantically it's a URI, and teidata.name isn't compatible with URIs
In what sense is teidata.name "incompatible"? It is a particular form of URI which is usable as a TEI Name. Yes, there are other kinds of URI of which this is not true, but that's why the datatype is teidata.name rather than teidata.anyuri. I don't see what you're getting at.
Unless I'm totally confused, teidata.name must be an xsd:Name, and those don't permit most ASCII symbols and punctuation marks, crucially including '/'. See https://www.w3.org/TR/REC-xml/#dt-name. So, yeah, you could have teix:egXML, but not http://www.tei-c.org/ns/1.0. I'm leaning towards just defining them as anyURI for now...
xsd:Name does permit colons however, which is fine for our purposes. I cannot imagine what it would mean to have a name which used http://whatever. If by "they" you mean @exclude and @require, then I suggest we need a datatype which permits either teidata.name or teidata.namespace, definitely not anyURI. What does require="#wibble" mean?
So create a new teidata.nameOrNamespace type? Unfortunately, require="#wibble" will be 100% legal no matter what we do :-). It means someone, for reasons I prefer not to contemplate, declared xmlns="#wibble" on some of the elements in their schema. Which you're allowed to do, I fear.
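As a sketch of what such a datatype might look like in Pure ODD, assuming the name teidata.nameOrNamespace from this comment is adopted (nothing in this thread confirms that such a datatype was actually created), the specification could simply alternate between the two existing datatypes:

```xml
<dataSpec xmlns="http://www.tei-c.org/ns/1.0"
          ident="teidata.nameOrNamespace" module="tei">
  <desc>defines attribute values which are either a namespace name
    or a (possibly prefixed) element name. Hypothetical datatype,
    shown for illustration only.</desc>
  <content>
    <alternate>
      <!-- a namespace name; teidata.namespace maps to xsd:anyURI -->
      <dataRef key="teidata.namespace"/>
      <!-- an XML name such as teix:egXML -->
      <dataRef key="teidata.name"/>
    </alternate>
  </content>
</dataSpec>
```

Because teidata.namespace maps to xsd:anyURI, the first branch already accepts nearly any string, so the alternation documents intent more than it tightens validation, which is the point being made about require="#wibble".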
Think this is done.
The content for macro.anyXML is at present expressed in RELAX NG. If we want pure ODD to be really pure, we need a new element to express the concept. This would also enable us to add attributes to control which namespaces should be excluded from its content.