TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

Need for some way to represent macro.anyXML in Pure ODD #1373

Closed: lb42 closed this issue 8 years ago

lb42 commented 9 years ago

The content for macro.anyXML is at present expressed in RELAX NG. If we want pure ODD to be really pure, we need a new element to express the concept. This would also enable us to add attributes to control which namespaces should be excluded from its content.

lb42 commented 9 years ago

Looking at our current practice, there seem to be two variants on the way macro.anyXML is used. In one, we permit any XML element from any XML namespace except those specified; in the other, we permit only elements from a specified namespace or namespaces. This suggests that we could implement it in pure ODD as an empty element <anyXML>, with two attributes, @except and @include. If neither attribute is used, the implication would be to permit any element from any namespace, which opens up the DTD duplicate-IDs problem again, so I am not sure how that would sensibly be implemented. Maybe we should require the presence of one or other of the two attributes. It would be easy enough to add a new element to the Pure ODD branch along those lines, though implementing the necessary transformation in XSLT might be a tad trickier.
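A minimal Python sketch of the semantics proposed above, treating the hypothetical empty <anyXML> element as a test on a child element's namespace URI. The function any_xml_allows and its parameter names are illustrative assumptions, not TEI code. Raising an error when neither attribute is given follows the suggestion that one of the two be required.

```python
from typing import Optional, Set


def any_xml_allows(namespace: str,
                   include: Optional[Set[str]] = None,
                   except_: Optional[Set[str]] = None) -> bool:
    """Hypothetical semantics for <anyXML include="..." except="...">.

    include: if given, only elements from these namespaces are permitted.
    except_: if given, elements from these namespaces are forbidden.
    """
    if include is None and except_ is None:
        # Require one attribute or the other: an unrestricted wildcard
        # reopens the DTD-compatibility duplicate-ID problem.
        raise ValueError("specify @include or @except")
    if include is not None and namespace not in include:
        return False
    if except_ is not None and namespace in except_:
        return False
    return True


TEI = "http://www.tei-c.org/ns/1.0"
SVG = "http://www.w3.org/2000/svg"

print(any_xml_allows(SVG, except_={TEI}))  # True
print(any_xml_allows(TEI, except_={TEI}))  # False
print(any_xml_allows(SVG, include={TEI}))  # False
```

The two variants in current practice correspond to the two keyword arguments. Supplying both narrows the permitted set to the included namespaces minus the excluded ones.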

sydb commented 8 years ago

Lou — Not sure I get what you mean, here. We refer to macro.anyXML in 5 places (not counting itself):

  • constraint
  • content
  • egXML
  • macro.schemaPattern
  • xenoData

At least in the 'dev' branch, it looks like child elements can be from any namespace in all cases.

While it seems to me having @include and @except could be useful features for ODD writers, I don't see where we would use them ourselves. So my instinct is they should not hold up publication of PureODD.

lb42 commented 8 years ago

You may want to refer back to Hugh's mail of 27/10/15 at 22:01 to the council list.

sydb commented 8 years ago

Sorry, I still don't get it. We always use macro.anyXML without further constraint. (I.e., in none of the 5 cases listed previously do we say “and it should all be in X namespace” or “and it should not be in Y namespace”.) All 5 are treated the same: any content from any namespace is allowed, except that neither <teix:egXML> elements nor any elements from the TEI namespace are allowed. And while I still think @include and @except for namespaces of the new <anyXML> are intriguing and probably useful, they are not at all a good solution to the “conflicting ID-types for attribute "id"” problem. It’s just not OK to say “you can define an element to have any content easily, but then you can’t use it to have content from your own namespace”.

sydb commented 8 years ago

(BTW, I don't understand why <teix:egXML> is not allowed in macro.anyXML. Not even sure we need to prevent <teix:egXML> from ever being inside <teix:egXML>, but if we do we should do that somewhere other than the macro for any XML. E.g., a <constraintSpec> in the definition of <teix:egXML>.)

hcayless commented 8 years ago

But macro.anyXML comes with built-in constraints, namely that the TEI and TEI Example namespaces are excluded. If you want to include any other namespaces in your TEI that use xml:id, then you must also exclude them from the content of macro.anyXML. It’s really macro.anyOtherXML :-).

We talked about this extensively at the F2F and the consensus was that excluding namespaces in this way was the least bad solution. Given that, and the fact that you’ll need to be able to customize on the fly if your ODD has extra namespaces that use xml:id, I think Lou’s suggestion has merit.

For our new members, to whom I expect this sounds like utter gibberish: it’s an obscure bug that bit us recently. I attempted to explain it in the email Lou referred to: http://lists.tei-c.org/pipermail/tei-council/2015/022174.html

sydb commented 8 years ago

With all due respect, Mr. Chairman, that turns out not to be the case (on almost all counts).

> the TEI and TEI Example namespaces are excluded.

No, only the TEI namespace is excluded. Not only is the TEI Example namespace permitted, it is what macro.anyXML was originally made for!

> If you want to include any other namespaces in your TEI that use xml:id, then you must also exclude them from the content of macro.anyXML.

No, there are other solutions, most notably turning off RELAX NG DTD compatibility mode, as it is not intended to be used for this anyway. (It is intended for when all features of DTDs are being emulated, not just the ones we like :-))

> F2F … consensus was that excluding namespaces in this way was the least bad solution

It may have been the majority opinion, but it is hard to call that a consensus when I was squealing as loudly as I could that excluding namespaces is a terrible solution. (And there is a good one: turn off DTD compatibility mode and use Schematron to check for ID uniqueness.)
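To illustrate what a Schematron ID-uniqueness check would have to do once DTD compatibility is off, here is a small Python sketch using only the standard library. It is not a Schematron rule and not TEI code. It collects every xml:id value, whatever namespace the carrying element is in, and reports the values that occur more than once. The function name duplicate_xml_ids is a hypothetical choice.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List

# ElementTree reports xml:id in Clark notation under the XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def duplicate_xml_ids(xml_text: str) -> List[str]:
    """Return xml:id values that occur more than once, in any namespace."""
    root = ET.fromstring(xml_text)
    counts = Counter(el.get(XML_ID) for el in root.iter() if el.get(XML_ID))
    return sorted(value for value, n in counts.items() if n > 1)


doc = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <p xml:id="a"/>
  <svg xmlns="http://www.w3.org/2000/svg" xml:id="a"/>
  <p xml:id="b"/>
</TEI>"""

print(duplicate_xml_ids(doc))  # ['a']
```

The check does not care that the second "a" sits on an SVG element. That namespace-blind behaviour is what the grammar-level ID typing was providing.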

> I think Lou’s suggestion has merit.

I do too, but not for this.

There is another explanation, complete with a solution that does not tell users they can't use elements from TEI or their own namespace in <anyXML>.

lb42 commented 8 years ago

Yes, the current definition for macro.anyXML excludes the <teix:egXML> element itself, and any element from the TEI namespace. I cannot remember why egXMLs cannot nest, except that it makes my head hurt thinking about how you'd validate that, and my proposed <anyXML> element wouldn't address this issue. It does, however, address the other requirement rather tidily. As for the alternative solutions proposed: my recollection is that we did discuss both of them at the FTF, and rejected them on the grounds that they introduced too much reliance on Schematron, and removed something (ID/IDREF validation) clearly of importance to many TEI users.

hcayless commented 8 years ago

[Editing for clarity, as @sydb is correct]

Syd is right: only the TEI namespace and <egXML> itself in the example namespace are excluded.

> It may have been the majority opinion, but it is hard to call that a consensus when I was squealing as loudly as I could that excluding namespaces is a terrible solution. (And there is a good one: turn off DTD compatibility mode and use Schematron to check for ID uniqueness.)

The unfortunate problem with your solution is that this mode is on by default. So things like Oxygen will tell you your schema is invalid. I’m afraid that makes it a less-good solution. People are going to assume their schema (or the TEI) is wrong and I’m not sure how we could communicate effectively to everyone who uses TEI and RNG that they need to do this whenever they validate a TEI document. I agree it sucks, but a solution that requires all of our users to do something extra isn’t practical.

lb42 commented 8 years ago

Sorry Hugh, I think you are mistaken about what's being excluded by line 20 of the current definition. It's a <name> element, not an <nsName>, as I noted above. But this doesn't affect the issue about what is the least worst solution to this mess, where I think you are right.
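The difference between <nsName> and <name> matters here. An nsName excludes a whole namespace, while a name excludes exactly one element. The following Python model is a hypothetical sketch of the exclusions as this thread describes them, with the TEI namespace excluded by nsName and egXML by name. It is not generated from the actual RELAX NG.

```python
from typing import Tuple

TEI = "http://www.tei-c.org/ns/1.0"
TEIX = "http://www.tei-c.org/ns/Examples"

# Modelled on the exclusions discussed in this thread:
#   <nsName ns="TEI"/>            excludes every element in that namespace
#   <name ns="TEIX">egXML</name>  excludes exactly one element
EXCLUDED_NAMESPACES = {TEI}
EXCLUDED_NAMES = {(TEIX, "egXML")}


def any_xml_permits(qname: Tuple[str, str]) -> bool:
    """Return True if (namespace, local-name) is allowed by the wildcard."""
    namespace, _local = qname
    if namespace in EXCLUDED_NAMESPACES:
        return False
    return qname not in EXCLUDED_NAMES


for qname in [(TEI, "p"), (TEIX, "egXML"), (TEIX, "p"),
              ("http://www.w3.org/2000/svg", "svg")]:
    print(qname[1], any_xml_permits(qname))
# prints: p False / egXML False / p True / svg True
```

Because of that, other elements in the TEI Examples namespace remain permitted, which is what an example-encoding wildcard needs.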

hcayless commented 8 years ago

Ah, fair enough. Wasn’t looking closely enough.

lb42 commented 8 years ago

Starting to think about implementing this, and decided it would be better to name the element anyElement. Wondering whether to make a new branch just to add it, or not.

martindholmes commented 8 years ago

anyElement suggests that the content must be a well-formed tree with a single root element, doesn't it? But doesn't macro.anyXML allow well-formed fragments? Or am I misunderstanding?

hcayless commented 8 years ago

Presumably your content model would be text or anyElement, any number of times, right?

martindholmes commented 8 years ago

I see; Lou's proposing an element that would form part of the anyXML structure, and would represent in it any XML element at all. He's not proposing a single element which would have the function of macro.anyXML.

lb42 commented 8 years ago

Actually, I'm not sure. ANY (which is what macro.anyXML translates to in DTD land) means as many as you like of text nodes or elements defined in the DTD, in any order, and that's what the current macro.anyXML tries to reproduce (except that it allows any element at all, modulo namespace constraints of various kinds).

lb42 commented 8 years ago

Proposed changes to implement this are described at http://teic.github.io/TCW/anyXMLproposal.html

hcayless commented 8 years ago

Stylesheets implementation is needed to close this, see TEIC/Stylesheets#170

hcayless commented 8 years ago

I'm working on implementing TEIC/Stylesheets#170 and have a question for @lb42 and any RelaxNG gurus: the @ns attribute on <anyName> doesn't seem to have the effect of requiring content be in that namespace, though it does have the nice side effect of making Oxygen suggest elements from the ns. But unless I'm mistaken @include on <anyElement> is intended to require content be in that namespace. Right?

hcayless commented 8 years ago

Second question: are we keeping macro.anyXML under this new régime? I'm going to look into auto-generating some schematron to handle the @include constraint.

hcayless commented 8 years ago

Think I've got it working. No surprise, it turns out to be more complicated than expected. I've got to flesh it out a little more and then, tomorrow, run it through the tests to see what breaks, but I think we're ok.

One note for @lb42: we can't ditch <name ns="http://www.tei-c.org/ns/Examples">egXML</name> from <except> because of the xml:id thing. I can make it a default, try to do something smart when it's used in the egXML element, or we can add new syntax for the exclusion of particular elements. Thoughts?

hcayless commented 8 years ago

OK, running tests now. Given a content model for egXML like:

<content>
  <alternate minOccurs="0" maxOccurs="unbounded">
    <textNode/>
    <anyElement except="http://www.tei-c.org/ns/1.0 teix:egXML"
                include="http://www.tei-c.org/ns/Examples"/>
  </alternate>
</content>

teitorelaxng will generate a define for the anyElement, which will be referenced in egXML's RNG content model. The define will exclude the TEI namespace and the egXML element (tei:egXML is "magic", but maybe better than hard-coding the egXML exception). @include is implemented as a Schematron rule if the anyElement is a descendant of an elementSpec (otherwise I can't see how to get the context).
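The generated define might look roughly like the output of this Python sketch. It is a hypothetical illustration, not the TEI Stylesheets' XSLT. It assumes the @except value has already been split into namespace URIs (which become rng:nsName) and (namespace, local-name) pairs (which become rng:name). The output is an rng:define whose element wildcard excludes them and whose content is mixed and recursive. The define name and the content pattern are assumptions. @include is left out because, as described above, it is checked by Schematron rather than in the grammar.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Tuple

RNG = "http://relaxng.org/ns/structure/1.0"
ET.register_namespace("rng", RNG)


def rng(tag: str, parent: Optional[ET.Element] = None, **attrs) -> ET.Element:
    """Create an element in the RELAX NG namespace, optionally as a child."""
    qname = f"{{{RNG}}}{tag}"
    if parent is None:
        return ET.Element(qname, attrs)
    return ET.SubElement(parent, qname, attrs)


def any_element_define(define_name: str,
                       except_namespaces: Iterable[str],
                       except_names: Iterable[Tuple[str, str]]) -> ET.Element:
    define = rng("define", name=define_name)
    element = rng("element", define)
    # <anyName><except>...</except></anyName> is the element's name class.
    except_ = rng("except", rng("anyName", element))
    for ns in except_namespaces:
        rng("nsName", except_, ns=ns)
    for ns, local in except_names:
        rng("name", except_, ns=ns).text = local
    # Mixed content: any attributes, text, and nested wildcard elements.
    choice = rng("choice", rng("zeroOrMore", element))
    rng("anyName", rng("attribute", choice))
    rng("text", choice)
    rng("ref", choice, name=define_name)
    return define


d = any_element_define("anyElement-egXML",
                       ["http://www.tei-c.org/ns/1.0"],
                       [("http://www.tei-c.org/ns/Examples", "egXML")])
print(ET.tostring(d, encoding="unicode"))
```

The recursive ref means that nested descendants are subject to the same exclusions, not just direct children.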

lb42 commented 8 years ago

Would it be less magic if we gave <anyElement> an attribute such as @recursable of datatype teidata.truthValue? Then we could say <anyElement recursable="false"> means "any element except my parent". I can't think of any other reason why you'd want to exclude specific elements.

hcayless commented 8 years ago

I'm not sure it's an improvement, really. What about this: I can modify the XSLT so that if you've set a namespace prefix on the elementSpec or macroSpec ancestor, e.g. xmlns:teix="http://www.tei-c.org/ns/Examples", then when it encounters a URI of the form teix:egXML, it'll do the right thing for it. No more magic then.

lb42 commented 8 years ago

This doesn't seem to address the issue I thought we were discussing.

hcayless commented 8 years ago

Why not? It means you can use a standard reference scheme to exclude a particular element. I get that egXML might be the only case where you'd want to do this, but the problem with an attribute is that I can't figure out how we'd make it work inside a macro (where there's no context to grab the parent's name from). I presume we'll keep macro.anyXML around for at least a while for backwards compatibility, no?

lb42 commented 8 years ago

You've lost me completely now. Maybe it's the heat. What does a "standard reference scheme" have to do with the price of fish?

hcayless commented 8 years ago

This way, we can generate the proper <name ns="http://www.tei-c.org/ns/Examples">egXML</name> in rng:except by putting teix:egXML (a standard way of referencing a namespaced resource—an element in this case) in the @except attribute on anyElement, without having to invent anything special to support it—you just have to define that prefix in the usual way on the ancestor macroSpec or elementSpec (or above).
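Here is a sketch of the prefix resolution Hugh describes. It assumes the in-scope prefix bindings from the ancestor elementSpec or macroSpec are already available as a dict (an XSLT implementation would read them from the namespace axis). A token shaped like a prefixed name (an NCName, a colon, then another NCName) becomes a (namespace, local-name) pair destined for rng:name. Any other token is treated as a namespace URI destined for rng:nsName. The function name resolve_except and the simplified NCName pattern are assumptions, not TEI code.

```python
import re
from typing import Dict, List, Tuple

# Simplified NCName pattern; the real production admits more characters.
NCNAME = r"[A-Za-z_][\w.\-]*"
PREFIXED_NAME = re.compile(rf"({NCNAME}):({NCNAME})")


def resolve_except(value: str, in_scope: Dict[str, str]
                   ) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split an @except value into namespace URIs and (ns, local) pairs."""
    namespaces: List[str] = []
    names: List[Tuple[str, str]] = []
    for token in value.split():
        m = PREFIXED_NAME.fullmatch(token)
        if m:
            prefix, local = m.groups()
            if prefix not in in_scope:
                raise ValueError(f"undeclared prefix {prefix!r} in {token!r}")
            names.append((in_scope[prefix], local))
        else:
            # Contains '/' or other non-name characters: a namespace URI.
            namespaces.append(token)
    return namespaces, names


bindings = {"teix": "http://www.tei-c.org/ns/Examples"}
print(resolve_except("http://www.tei-c.org/ns/1.0 teix:egXML", bindings))
# (['http://www.tei-c.org/ns/1.0'], [('http://www.tei-c.org/ns/Examples', 'egXML')])
```

This approach has a limitation: a slash-free namespace URI such as urn:example also has the shape of a prefixed name. Syntax alone cannot tell the two apart, which foreshadows the datatype discussion later in this thread.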

lb42 commented 8 years ago

So your proposal is to permit a namespaced element name as one of the possible values for @except? That seems plausible tho I think the spec shd be adjusted to make clear what that means.

jamescummings commented 8 years ago

If I'm understanding correctly, the magic here would be in the processing. If you have:

<elementSpec ident="egXML" mode="change" xmlns:teix="http://www.tei-c.org/ns/Examples">
<!-- .... further down in content ... -->

<anyElement except="teix:egXML" include="http://www.tei-c.org/ns/Examples"/>
<!-- ... -->
</elementSpec>

then the locally declared teix: namespace prefix will magically be known about and used in processing to RNG.

Am I understanding that correctly? If so, that seems sensible enough to me.

lb42 commented 8 years ago

We may need to invent a new datatype if @except is to permit as values one or more namespaces or namespace-prefixed-element-name.

jamescummings commented 8 years ago

I think it is a fairly simple one using xsd:anyURI, isn't it? That would allow teix:foo, foo, and http://www.example.com/ns/.

Does that allow something we don't want, then?

hcayless commented 8 years ago

@jamescummings that's it. I think it's not magic at all if the prefix is defined. @except is defined as teidata.namespace, which is xsd:anyURI (see http://teic.github.io/TEI/ref-teidata.namespace.html). I'm not sure we can do better than that.

hcayless commented 8 years ago

Do we need a similar solution on the @include side? I.e. should we allow teix:egXML in @include? My feeling is no, because that would entail some validation of the content of anyElement, and if you wanted to do that, you ought to be actually validating the content—and there are already good ways to do that.

lb42 commented 8 years ago

But the datatype is explicitly teidata.namespace, rather than teidata.pointer, for example. So it needs changing unless you think teix:egXML is a valid namespace. The fact that we map to anyURI is not relevant. I am pretty sure that e.g. #foo is not a valid namespace even though it's also an anyURI.

lb42 commented 8 years ago

And no we don't want to allow explicit elements on @include thank you. That would be crazy talk.

hcayless commented 8 years ago

Fair point. teix:egXML could be a namespace, as could any URI. But it isn't one. Maybe it's not a great datatype, as there's nothing that distinguishes a namespace from any other URI other than how it's used...

hcayless commented 8 years ago

Looking at @lb42's proposed content model for content...I don't think it can actually work that way, unless you mean to ban RelaxNG content models. Was that what you were going for? We could do something like:

<alternate>  
  <anyElement include="http://relaxng.org/ns/structure/1.0" 
          except="http://www.tei-c.org/ns/1.0 http://www.tei-c.org/ns/Examples"/>
  <classRef minOccurs="0" maxOccurs="unbounded" key="model.contentPart" />
</alternate>

to permit only RelaxNG or Pure content models.

lb42 commented 8 years ago

Since <anyElement> is a member of model.contentPart, I don't understand why this alternation is not ambiguous. Also, don't forget that <content> must be allowed to be empty.

lb42 commented 8 years ago

As I just posted on Council list, yes, that was what I intended. But to avoid ambiguity, I think we need a different element, i.e. <rngContent>

hcayless commented 8 years ago

@lb42 Do you think we should use teidata.pointer instead of teidata.namespace to support URIs like teix:egXML? Or ought we to create a new type? teidata.uri?

lb42 commented 8 years ago

teix:egXML is a teidata.name, so far as I can see.

hcayless commented 8 years ago

Only in the sense that it matches the production for Name. Semantically it's a URI, and teidata.name isn't compatible with URIs.

lb42 commented 8 years ago

In what sense is teidata.name "incompatible"? It is a particular form of URI which is usable as a TEI name. Yes, there are other kinds of URI of which this is not true, but that's why the datatype is teidata.name rather than teidata.anyuri. I don't see what you're getting at.

hcayless commented 8 years ago

Unless I'm totally confused, teidata.name must be an xsd:Name, and those don't permit most ASCII symbols and punctuation marks, crucially including '/'. See https://www.w3.org/TR/REC-xml/#dt-name. So, yeah, you could have teix:egXML, but not http://www.tei-c.org/ns/1.0. I'm leaning towards just defining them as anyURI for now...

lb42 commented 8 years ago

xsd:Name does permit colons, however, which is fine for our purposes. I cannot imagine what it would mean to have a name which used http://whatever. If by "they" you mean @exclude and @require, then I suggest we need a datatype which permits either teidata.name or teidata.namespace, definitely not anyURI. What does require="#wibble" mean?

hcayless commented 8 years ago

So create a new teidata.nameOrNamespace type? Unfortunately, require="#wibble" will be 100% legal no matter what we do :-). It means someone, for reasons I prefer not to contemplate, declared xmlns="#wibble" on some of the elements in their schema. Which you're allowed to do, I fear.
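Here is a rough Python sketch of how a hypothetical teidata.nameOrNamespace might classify values. It uses a simplified xsd:Name pattern (colons allowed, '/' and '#' not) and a deliberately crude stand-in for xsd:anyURI (any non-empty token without whitespace). Neither approximation is the real XML Schema production, and the function names are assumptions. The sketch shows both points made above: http://www.tei-c.org/ns/1.0 is not a Name, and #wibble still passes as a namespace.

```python
import re

# Simplified xsd:Name: colons allowed, '/' and '#' not. The real production
# admits many more Unicode characters.
XSD_NAME = re.compile(r"[A-Za-z_:][\w.:\-]*")


def is_xsd_name(token: str) -> bool:
    return XSD_NAME.fullmatch(token) is not None


def is_any_uri(token: str) -> bool:
    # xsd:anyURI is very permissive; as a rough approximation, accept any
    # non-empty token without whitespace (relative references included).
    return bool(token) and not any(c.isspace() for c in token)


def name_or_namespace(token: str) -> str:
    """Classify a token for a hypothetical teidata.nameOrNamespace."""
    if is_xsd_name(token):
        return "name"
    if is_any_uri(token):
        return "namespace"
    raise ValueError(f"not a name or namespace: {token!r}")


for t in ["teix:egXML", "http://www.tei-c.org/ns/1.0", "#wibble"]:
    print(t, name_or_namespace(t))
```

Trying Name first is a design choice. Every Name-shaped token is also acceptable as an anyURI, so the order decides which reading wins.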

hcayless commented 8 years ago

Think this is done.