TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

Domain of concept of cleanliness #1586

Closed cmsmcq closed 3 years ago

cmsmcq commented 7 years ago

In 23.3.1 Kinds of Modification, the text says "Cleanliness can only be assessed with reference to elements in the TEI namespace."

It's not clear to me whether this means

(a) Only elements in the TEI namespace can be said to be cleanly or uncleanly modified, in a particular customization of the TEI schema; the concept does not apply to elements in other namespaces.

or

(b) In assessing the cleanliness or uncleanliness of a given customization of the TEI schema, only elements in the TEI namespace are considered. In effect this means that before considering whether a given document is valid against the base schema, the modified schema, or both, all elements in non-TEI namespaces must be deleted.

Neither of these appears satisfactory. (a) appears to make the property of cleanliness apply to elements in the TEI namespace, instead of to schemas, but the text of the section defines cleanliness only for schemas (= sets of modifications), not for elements individually. So (a) would entail an appeal to an undefined concept.
(b) appears to render the entire concept of cleanliness both more complex than it would otherwise be, and less informative.

I'd suggest deleting the sentence entirely, only it seems likely that it is trying to say something meaningful, and deletion might lead to damage elsewhere. I'd suggest alternative wording, only I don't know what it's trying to say.

lb42 commented 7 years ago

It is trying to say (b)

cmsmcq commented 7 years ago

Thank you. If it's trying to say (b), then I think there is a documentation problem (it's not succeeding very well, and (b) contradicts what is said in the actual definition of 'clean modification') and there may be a design problem (it is not clear to this reader what could motivate such a definition of cleanliness, or such a shift in the definition from that given in P3).

cmsmcq commented 7 years ago

Note that if (b) is adopted as a rule, then it may have some unexpected consequences.

1 Consider an unclean modification M which abuses various elements which can only occur within the body (not in the header) by redefining them in ways that change their syntax radically. By hypothesis, M is unclean.

Now consider modification M', which includes all the changes of M, plus one which replaces the body element with my:newBody. If (b) applies, then the my:newBody element is deleted when considering cleanliness, so none of the abused elements is presented for validation, and the modification qualifies as clean. (I'm relying here on Lou's remark that deleting the body element no longer causes invalidity.)

Hiding unclean changes to elements in the TEI namespace by wrapping them in a non-TEI element should perhaps not be a way to make them clean.

2 If a set of modifications renames a mandatory element (e.g. titleStmt) and puts it in a different namespace, but makes no further change, then no element valid against the modification will, after deletion of non-TEI elements (here: my:titleStmt), be valid. So is renaming an element to a non-TEI namespace clean if the element is optional, and unclean if it's mandatory?

I'll stop for now.

lb42 commented 7 years ago

But surely the unclean modifications constituting M must also be in a non-TEI namespace already?

cmsmcq commented 7 years ago

Why? Is it not possible to make unclean modifications by redefining, say, tei:list to contain a sequence of tei:div elements? If I have understood you correctly, it may be rather difficult to make unclean modifications using elements from non-TEI namespaces. But the premise of the example is that there are unclean modifications involving elements which can validly occur in the body but not the header (so: not tei:list). It doesn't matter for the example whether they are in the TEI namespace or not.

lb42 commented 7 years ago

Um, it seems to me that if you want a "list" which contains a sequence of tei:divs, that's not a tei:list, that's someWeirdShit:list ... maybe I am being too draconian here, but I think any uncleanly modified TEI element has to be in a non-TEI namespace. Otherwise we're all doomed.

cmsmcq commented 7 years ago

How good or bad, motivated or unmotivated the unclean changes are is beside the point. The point of the example is the observation that on interpretation (b), a set of unclean changes can become clean by renaming an ancestor.

Perhaps my describing the case so abstractly is causing trouble. Let me take a moment to find a way to make it more concrete. I think I need an element which is legal only in the text, and not in the header ... OK. I see that text is no longer legal as a child of p (how do I encode an embedded haiku in this paragraph, then, if I suddenly am moved to say

Summer grasses --
all that's left
of warriors' dreams?

in the middle of this discussion? Ah, well), so text won't be allowed to occur inside a teiHeader. And since div can only occur inside of text but not inside of teiHeader, I'll use div. Get some coffee; take a seat. This will take a few minutes.

Customizations T and M

Let T (for unmodified TEI) be the TEI customization consisting of the following modification of TEI:

Let M be the following modification:

(You will perhaps object that modification M is unmotivated; I don't care. The definition of 'clean modification' says nothing about psychological states.)

(You will perhaps object -- indeed, you already have -- that tei:div ought to be placed in a different namespace, given that it now appears to have nothing to do with tei:div as the Guidelines define it. I don't care. I think your claim amounts to saying that modifications ought to be clean. I'm concerned about the definition of clean and unclean here, not about good practice in TEI customization. If the Guidelines don't have coherent definitions of concepts, it will be hard to say coherently what kinds of modifications ought and ought not to be made. [I'm also a bit surprised by the shift of the term 'clean' from a technical term meaning, essentially 'simply describable', to what looks very much like a term of approbation. I'm not sure I think it is a good idea.])

Some basic terms: S(X), L(X), purify(D), L'(X)

For any customization or modification X, let S(X) be the schema corresponding to this modification (or the schema generated by applying X to the TEI schema, if one prefers that form of words).

Let L(X), the language defined by X, be the set of documents valid against S(X). (So L(T) is the set of TEI documents using only elements from the four basic modules and iso-fs, and L(M) is the set of documents accepted by modification M.)

For any XML document D rooted in a TEI element, let purify(D) be the document resulting from deleting all non-TEI elements in D. (This is not at all a good formal definition, but I trust it's clear enough for present purposes.) For any XML document D rooted in a non-TEI element, let purify(D) be undefined. (The result of deleting the root element of an XML document is not an XML document.)

Let L'(X), the purified language defined by X, be the smallest set of documents such that for all D in L(X), purify(D) is in L'(X) if purify(D) is defined.

I note in passing that unless there is a wildcard allowing non-TEI elements somewhere in the TEI schema, then for modification M described above L'(M) will be the same as L(M), since M doesn't add any non-TEI elements to strip.
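
A minimal sketch of purify(D) and of the purified language, for readers who prefer code to prose. It rests on several assumptions that are not stated in the thread: that deleting a non-TEI element deletes its whole subtree (which matches the later argument, where stripping whazzat:thud also removes the tei:div inside it), that text following a deleted element is kept, and that L(X) is represented by a finite sample of parsed documents. The function names are hypothetical; the namespace URI is the TEI P5 one.

```python
import copy
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # the TEI P5 namespace URI


def is_tei(elem):
    """True if elem is an element in the TEI namespace."""
    return isinstance(elem.tag, str) and elem.tag.startswith("{" + TEI_NS + "}")


def purify(root):
    """Return a copy of the tree without non-TEI elements, or None if undefined."""
    if not is_tei(root):
        return None  # deleting the root would not leave an XML document
    result = copy.deepcopy(root)
    _strip(result)
    return result


def _strip(elem):
    """Remove non-TEI children (with their subtrees), keeping their tail text."""
    previous = None
    for child in list(elem):
        if is_tei(child):
            _strip(child)
            previous = child
            continue
        # Assumption: text after the deleted element stays in the document.
        tail = child.tail or ""
        if previous is not None:
            previous.tail = (previous.tail or "") + tail
        else:
            elem.text = (elem.text or "") + tail
        elem.remove(child)


def purified_language(documents):
    """L'(X) for a finite sample of L(X): purify(D) wherever it is defined."""
    purified = (purify(d) for d in documents)
    return [p for p in purified if p is not None]
```

Because purify works on a copy, the original document is left untouched, so the same sample can be checked against both L(X) and L'(X).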

Restatement of interpretation (b)

Interpretation (b) appears to say that any modification X is a clean modification of TEI if and only if L'(X) is a subset of L(Y), where Y is a customization which selects the same modules as X and makes no changes.
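
Written compactly, and only as a transcription of the definitions in this comment (Y is the unmodified customization selecting the same modules as X), the restatement and the literal reading of the P5 definition differ in a single symbol:

```latex
\begin{align*}
L(X)  &= \{\, D \mid D \text{ is valid against } S(X) \,\} \\
L'(X) &= \{\, \mathrm{purify}(D) \mid D \in L(X),\ \mathrm{purify}(D) \text{ is defined} \,\} \\
X \text{ is clean under interpretation (b)} &\iff L'(X) \subseteq L(Y) \\
X \text{ is clean under the literal definition} &\iff L(X) \subseteq L(Y)
\end{align*}
```

Whenever X adds no non-TEI elements, purify changes nothing, so L'(X) = L(X) and the two conditions coincide.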

Uncleanliness of modification M

I believe that it's clear that L(M) includes at least one document D with a tei:div element containing a single tei:equipment element as child. I believe it's also clear that D is not in L(T). So the set of documents accepted as valid by M (i.e. L(M)) is not a subset of the documents accepted as valid by a schema consisting of the four basic modules. The definition of clean modification is, as you will recall:

We use the term clean modification to describe a modification which regards as valid a subset of the documents considered valid by the same combination of TEI modules unmodified.

So: M is not a clean modification. It is an unclean modification. (That is, it's unclean if the use of disjoint in the current definition of unclean modifications can be safely ignored.)

Modification M'

Now consider modification M'.

Cleanliness of modification M'

L(M') includes at least one document D with a tei:div element containing a single tei:equipment element as child. Since in unmodified TEI (and therefore also in S(T)), tei:div cannot contain tei:equipment as a child, document D is not in L(T). The definition of clean modifications quoted above seems at first glance therefore to exclude M' from the set of clean modifications.

But interpretation (b) says that for cleanliness of M' it's not L(M') but L'(M') that counts. In D, the only place tei:div can occur is within whazzat:thud. In purify(D), the whazzat:thud element is stripped out. So purify(D) contains no tei:div elements. It contains a tei:TEI element whose children are a tei:teiHeader and a tei:fsdDecl. Both of these will be valid against S(T), since M' makes no changes to any elements that can occur within teiHeader or fsdDecl. Since the sequence of tei:teiHeader followed by tei:fsdDecl is accepted by the content model of tei:TEI, purify(D) as a whole will be valid against S(T), and thus in L(T). The same argument applies to every document in L(M') containing a tei:div element: the stripped version of that document will contain no tei:div elements and will be valid against S(T). L(M') will also contain documents which contain no tei:div elements; I think it should be evident that these too will be valid against S(T). Since every member of L(M') either contains at least one tei:div element or contains no tei:div elements, we have established that every member of L'(M') is valid against S(T). It follows that L'(M') is a subset of L(T).

By interpretation (b), therefore, M' is a clean modification, despite its redefinition of tei:div.
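
The argument can be replayed on a toy model. Everything in the sketch below is hypothetical: the three schemas are drastically simplified stand-ins, not the actual customizations T, M and M' (whose definitions are not reproduced in this thread), documents are (name, children) tuples rather than XML, and each content model is a regular expression over the space-separated names of an element's children.

```python
import re

# Hypothetical toy schemas: element name -> regex over its children's names.
BASE = {
    "tei:TEI": r"tei:teiHeader( tei:text| tei:fsdDecl)+",
    "tei:teiHeader": r"",
    "tei:fsdDecl": r"",
    "tei:text": r"tei:div( tei:div)*",
    "tei:div": r"(tei:p( tei:p)*)?",
    "tei:p": r"",
    "tei:equipment": r"",
}
# Toy M: tei:div is redefined to hold tei:equipment.
M = dict(BASE, **{"tei:div": r"tei:equipment( tei:equipment)*"})
# Toy M': as M, but tei:div is reachable only through whazzat:thud.
M_PRIME = dict(M, **{
    "tei:TEI": r"tei:teiHeader whazzat:thud tei:fsdDecl",
    "whazzat:thud": r"tei:div( tei:div)*",
})


def valid(tree, schema):
    """True if every element's child sequence matches its content model."""
    name, children = tree
    model = schema.get(name)
    if model is None:
        return False  # element not declared in this schema
    child_names = " ".join(child[0] for child in children)
    if re.fullmatch(model, child_names) is None:
        return False
    return all(valid(child, schema) for child in children)


def purify(tree):
    """Drop non-TEI elements with their subtrees; None if the root is non-TEI."""
    name, children = tree
    if not name.startswith("tei:"):
        return None
    kept = [purify(child) for child in children]
    return (name, [k for k in kept if k is not None])


equipment_div = ("tei:div", [("tei:equipment", [])])
doc_m = ("tei:TEI", [("tei:teiHeader", []), ("tei:text", [equipment_div])])
doc_m_prime = ("tei:TEI", [
    ("tei:teiHeader", []),
    ("whazzat:thud", [equipment_div]),
    ("tei:fsdDecl", []),
])

# M: accepted by M, rejected by the base even after purification.
print(valid(doc_m, M), valid(purify(doc_m), BASE))  # True False
# M': accepted by M', rejected by the base as is, accepted once purified.
print(valid(doc_m_prime, M_PRIME),
      valid(doc_m_prime, BASE),
      valid(purify(doc_m_prime), BASE))  # True False True
```

In this toy setting the same abused tei:div counts against M but disappears from view under M', which is the effect described above. A single document can only illustrate the point; it does not prove anything about the full languages.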

The broad summary is thus:

It was this pair of sentences that I had in mind when I described case 1 above of interpretation (b) possibly having unexpected consequences.

[Typo corrected 18 Feb; s/strip(D)/purify(D)/]

lb42 commented 7 years ago

Encoding the haiku is easy : use <floatingText> http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-floatingText.html I'll come back to the rest later, unless someone else wants to dive in.

lb42 commented 7 years ago

Rereading this, I feel that a key point is your assertion that "Hiding unclean changes to elements in the TEI namespace by wrapping them in a non-TEI element should perhaps not be a way to make them clean." But I see no way round this. We can't legislate for non-TEI elements: hence we can't stop them abusing any TEI children they may have. We can be sniffy about it, I suppose. Are you suggesting that our current definition of clean/unclean modification can be applied to modifications which result in nonTEI elements? I don't see how. If we define "clean" as meaning something like "restriction of the existing TEI model", then any nonTEI element is irredeemably unclean. This seems too obvious, so I must be missing something.

cmsmcq commented 7 years ago

On Mar 23, 2017, at 3:33 PM, Lou notifications@github.com wrote:

Rereading this, I feel that a key point is your assertion that "Hiding unclean changes to elements in the TEI namespace by wrapping them in a non-TEI element should perhaps not be a way to make them clean." But I see no way round this. We can't legislate for non-TEI elements:

True. Or rather, not true — you can legislate for anything you like, for those who wish to listen. What is apparently true is that you don’t wish to do so, and I agree that it’s desirable to leave a good deal of freedom to those who define elements in other namespaces intended to intermingle with TEI elements. I don’t agree, however, that it’s undesirable to impose any constraints at all on their use of TEI elements, and I don’t agree that it’s impossible to impose constraints on their use of TEI elements.

But note that we are talking about a technical term here, and not about TEI conformance: no legislating is involved, regardless. I don’t agree that it’s undesirable to define a term which distinguishes non-TEI elements which use TEI elements in one way from non-TEI elements which use TEI elements in another way. (You can supply “abusive” and “non-abusive” if you like, but the principle is broader.) And it’s definitely not impossible.

hence we can't stop them abusing any TEI children they may have.

Why not?

Of course you cannot stop them in a literal sense, unless the TEI finally starts getting serious about this invisible government business and acquires methods for punishing people who do things the TEI thinks they should not do. Get with the program! What good is it being an invisible government if you can’t reward your friends and punish your enemies?!

I suspect you mean “we cannot define TEI conformance (or cleanliness) to exclude the possibility that TEI children of a non-TEI element are abused”.

But if you can define “abuse” coherently (a condition I am not convinced you can currently meet), why should you not be able to say “conforming (or: clean) TEI documents MUST refrain from abuse of TEI elements”?

I can think of two rationales for not wishing to define the concept (cleanness, conformance, whatever) to exclude abuse within non-TEI elements:

Neither is compelling, I think.

We can be sniffy about it, I suppose. Are you suggesting that our current definition of clean/unclean modification can be applied to modifications which result in nonTEI elements? I don't see how.

Short answer: yes, I am suggesting something very like that (though not in those words), and please remember that not knowing how to do something is not the same as knowing that it cannot be done.

Longer answer: There are several things apparently going wrong in these sentences.

On the face of it, it seems obvious that the current definitions of clean and unclean modifications can be applied to modifications which define (or result in, if that means what I think it means) non-TEI elements — if you can’t apply the definition to such a modification, how can you possibly tell whether the modification fits the definition or not? is clean or unclean or (in the current state of P5, counter-intuitively) neither?

But I think you mean something else. I think you’re asking if the concept of clean modification can be formulated to constrain changes to TEI elements even when they have a non-TEI ancestor. That won’t be the current definition of clean modification, since it excludes non-TEI elements entirely. It may or may not be a definition that comes close to matching what you (LB) and others mean when you use the terms “clean” and “unclean”, but since I have spent several weeks trying without success to understand what you and others mean by the terms, I don’t want to speculate.

Do I believe that it’s possible to define properties of uses or customizations of a vocabulary that apply to uses of that vocabulary even within parents from another namespace? Sure. Lots of specs define elements intended for embedding within other vocabularies, from the CALS and SGML Open table models down to MathML, ARIA, and ITS. Fewer define elements intended both to be embedded in elements from other namespaces, and to contain elements from other namespaces (and some WGs clearly start to experience vertigo at this point, and start talking gibberish), but there are examples: the SGML Open table model does it nicely, as does SVG. Or for that matter XSLT: the rules for use (and validity) of an XSL element don’t cease to apply just because that element is embedded within a literal result element in another namespace.

If we define "clean" as meaning something like "restriction of the existing TEI model", then any nonTEI element is irredeemably unclean.

Well, that is how the term is currently defined. It’s not the way you use it, but it is what the definition actually present in P5 actually means in English.

You may not want it to mean that, in which case you might wish to redefine the term. But I think that you are in any case falling prey to the value judgement clean = good, unclean = bad which various comments on this issue have already deplored.

This seems too obvious, so I must be missing something.

Well, at least one of us is missing something. I am sorry that this is proving so difficult — if I understood where the problem was, I would try to fix it. It’s tiring to be trying to fix it without knowing where the problem is.

lb42 commented 7 years ago

Do we have agreement that "the concept currently known as clean" == "restriction on current TEI syntactic and semantic model as expressed by the TEI schema and associated schematron rules" ? That is at least clear. We can then think up names for it, and for the associated also-rans.

martindholmes commented 7 years ago

I'm OK with that, but for me it's identical with "conformant". :-)

cmsmcq commented 7 years ago

On Mar 27, 2017, at 9:14 AM, Martin Holmes notifications@github.com wrote:

I'm OK with that, but for me it's identical with "conformant". :-)

Is that a statement of what you understand the current text to mean? Or a statement of what you think the current text should be changed to say?

In the former case, how do you get from the current text to this interpretation?

In the latter case, are there no concerns about backwards compatibility for projects which have in good faith undertaken to produce TEI-conformant documents, on the understanding that TEI is designed to be extended and that extensions are (if properly declared and documented) conforming TEI?


C. M. Sperberg-McQueen Black Mesa Technologies LLC cmsmcq@blackmesatech.com http://www.blackmesatech.com


martindholmes commented 7 years ago

Sorry for the vagueness. What I mean is: I believe "conformance" should mean "being against [version x of] tei_all.rng, including its Schematron rules", and I believe that a "clean" customization should generate a schema which validates only documents which form a subset of those which validate against tei_all. In that sense, a "clean" customization is a "conformant" customization. I don't like the term "clean" because there's an implied judgement in its semantics. I would make a contrast between a "contraction" of the P5 (only additional constraints and the removal of non-essential things) and an "extension" (involving the addition of new things, or loosening of existing constraints).
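
The contraction/extension contrast can be sketched for a single content model. This is an illustration under stated assumptions, not a TEI tool: content models are written as regular expressions over space-separated child names, the function names and the example models are hypothetical, and the comparison only enumerates child sequences up to a fixed length, so it can refute "contraction" but never prove it in general.

```python
import re
from itertools import product


def accepted(model, alphabet, max_len):
    """All child sequences of length <= max_len that the content model accepts."""
    result = set()
    for n in range(max_len + 1):
        for seq in product(alphabet, repeat=n):
            if re.fullmatch(model, " ".join(seq)):
                result.add(seq)
    return result


def classify(base_model, custom_model, alphabet, max_len=4):
    """'contraction' if no accepted sequence (up to the bound) is new, else 'extension'."""
    base = accepted(base_model, alphabet, max_len)
    custom = accepted(custom_model, alphabet, max_len)
    if custom <= base:
        return "contraction"  # only holds up to max_len
    return "extension"


# Hypothetical models for a list-like element.
BASE_LIST = r"item( item)*"                # one or more items
SHORT_LIST = r"item( item){0,2}"           # at most three items
DIV_LIST = r"(item|div)( (item|div))*"     # items or divs

print(classify(BASE_LIST, SHORT_LIST, ["item", "div"]))  # contraction
print(classify(BASE_LIST, DIV_LIST, ["item", "div"]))    # extension
```

A customization that tightens one model and loosens another would come out as an extension under this two-way split, since at least one new document becomes valid.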

cmsmcq commented 7 years ago

On Mar 27, 2017, at 12:32 PM, Martin Holmes notifications@github.com wrote:

... I would make a contrast between a "contraction" of the P5 (only additional constraints and the removal of non-essential things) and an "extension" (involving the addition of new things, or loosening of existing constraints).

Such a distinction is almost certain to be worthwhile. But why do you propose to deny the term “conformance” to extensions of TEI?

As the technical person in a project committed to producing TEI-conformant documents, I would object strenuously to such a change. It would pull the rug out from under us with our funding agency in a very unpleasant way.




cmsmcq commented 7 years ago

On Mar 27, 2017, at 12:32 PM, Martin Holmes notifications@github.com wrote:

Sorry for the vagueness. What I mean is: I believe "conformance" should mean "being against [version x of] tei_all.rng, including its Schematron rules”,

In terms of the question I asked, I think that this means you do not believe the current definition of conformance has this meaning, and that you believe the definition of conformance should be changed so that it acquires the meaning you give. (I am assuming the word ‘valid’ between ‘being’ and ‘against’.)

That answer leaves the follow-on question unanswered: does it not bother you to change the definition of conformance in such a way as to change current projects which extend TEI from “conforming” to “non-conforming” uses of TEI? Is backward compatibility not an issue?




martindholmes commented 7 years ago

I don't believe we've ever had a clear definition of conformance in a formal sense. But if you prove me wrong, and demonstrate that we have clearly said that documents which do not validate against tei_all are "conformant", then I would change my objection to say that in that case the word "conformant" is not useful (although anyone is of course then welcome to use it for grant-getting box-ticking brownie-point purposes), and we need another simple terminology to distinguish between schemas that validate only documents that also validate against tei_all and schemas that are more permissive or include supplementary features.

cmsmcq commented 7 years ago

On Mar 27, 2017, at 3:19 PM, Martin Holmes notifications@github.com wrote:

I don't believe we've ever had a clear definition of conformance in a formal sense. But if you prove me wrong, and demonstrate that we have clearly said that documents which do not validate against tei_all are "conformant", then I would change my objection to say that in that case the word "conformant" is not useful (although anyone is of course then welcome to use it for grant-getting box-ticking brownie-point purposes), and we need another simple terminology to distinguish between schemas that validate only documents that also validate against tei_all and schemas that are more permissive or include supplementary features.

Thank you, but I decline your kind invitation to place the burden of proof on my own back. You can put it on yours, if you like.

I don't want to undertake any proofs at all about the current definition of conformance in P5, and who would take such proofs seriously if I did? As this issue and some of the others I raised about the same time show, I do not understand or claim to understand the current text of the TEI Guidelines on the topic, a bad foundation on which to build proofs. I have been hoping for some time that someone would either explain what they think the text means, or explain what they think the text was intended to mean (and what error causes it not to express that meaning clearly), or say explicitly that they think the text is so broken that it cannot be interpreted. I am grateful to you for being the first participant in the discussion willing to take a public stand on one of these possibilities (even though I think in most standards organizations it is thought to be better if declarations for the third position are accompanied by an argument tending to show that the status-quo text is irreparable).

Before reading these chapters recently, my most recent encounter with the definition of TEI conformance was in the early 1990s. In TEI P3, I think the intent of the discussion of conformance is clearly to allow extension of the tag set. The definition in P3 of ‘clean modifications’ encompasses both subsets and supersets of the TEI scheme, and neither form of cleanliness is demanded of conforming uses of TEI. Rereading the conformance sections of P3 at the distance of a couple of decades, I see a number of things I wish had been clearer and better worked out, and so I won’t undertake to show that in P3 the TEI had a “clear” statement about conformance in any sense, still less a “formal” one. Indeed, because there were fears that the notion of TEI conformance could easily be misused in ways that would harm scholarship, there was no particular interest (at least on my part) in making the definition of conformance clear or easy to follow: we (or at least I) wanted it to be as hard as I could make it to prove a particular use of TEI non-conforming. And I wanted the definition of conformance to be as capacious as possible, even though that would make it harder for developers of software to claim legitimately that they supported all varieties of TEI documents, and would make the term "TEI conformant" into a not very useful term.

I believe that at the time, I summarized the essential requirements of conformance roughly as follows:

A set of changes that gutted the TEI DTD and replaced it with the document grammar of ISO 12083, or a document grammar modeled on LaTeX, or (pick your favorite unlikely example) would be perfectly legitimate by these rules, as long as (a) the correct modification mechanisms were used (so it would be easy to see that the elements deleted and/or redefined included TEI.2, teiHeader, text, …) and (b) all the new elements and attributes were documented in a TSD. This didn’t come up very often, and I didn’t volunteer this information very often because I thought it would alarm some people who would require time-consuming explanations. But I do have a clear recollection that a corpus linguistic project asked for a way in which they could declare a document encoded in LaTeX as a legal customization of TEI, without translating it into SGML; Lou and I replied firmly that there needed to be a document element containing it, and a header with a title, and otherwise it would not be TEI conformant. (My recollection is that they were disappointed that the standard of TEI conformance was set so unreasonably high.)

Since I haven’t taken the trouble to find any document from the early 1990s to document this summary, it is of course possible that I have projected later views back into the early 1990s. But (brief pause to blow dust off some stacks of old offprints) I think LLC 6.1 (1991): 34-46 is reasonably clear that the expectation at that time was to allow extensions: "In order to be generally useful a markup language must be extensible" (in italics), p. 36. And CHum 29 (1995): 17-39 describes "easy introduction of user-specified extensions to the DTD" as a technical problem whose solution in P3 is outlined. So no, I don't think I'm projecting later views back onto TEI P3.

There is no particular reason that the TEI Consortium should be bound, now, by the design goals that guided the development of TEI P1, P2, and P3. But I think the record as a whole is reasonably clear that extension mechanisms of TEI P3 were intended for use by conforming documents and/or projects.

For that matter, I think the same is true of P5. The ODD vocabulary of P5 provides hooks for adding new elements and attributes, and the primary point of using ODD is (as far as I can see) to provide a conforming definition of a TEI customization. What document designer in their right mind would bother using ODD to define a document grammar as a set of selections from and changes to TEI, if they weren't striving for TEI conformance?

As for customizations that accept only a subset of documents accepted by TEI All, I think formal language theory has already provided a useful term: subset.

cmsmcq commented 7 years ago

On whether extensions were clearly intended to be conformant in P3, two more examples occur to me. The editors of the TEI Guidelines developed TEI Lite as an example of a conforming customization of TEI; it includes a number of extensions vis-a-vis P3. And TEI P3 itself used not vanilla P3, but a customization of P3. (It seems clear in retrospect that the two editors had different mental models for explaining this; my view was that it was important to demonstrate that extending the Guidelines with new elements was a normal and acceptable thing to do, and that we could do that by making P3 itself extend the Guidelines.)

martindholmes commented 7 years ago

The reason I think that a notion of conformance that allows extension is problematic is simply this:

You can use an ODD file to delete all the existing TEI elements and attributes and construct an entirely new set in a different namespace. If you do this, nothing of TEI will remain. Nevertheless, such a schema would have been generated using a TEI ODD file and built against P5. How could it be conformant?

At the other end of the scale, you can leave all of P5 as it is, and add a single new element in your own namespace. This is obviously much closer to P5.

My problem is that I don't see a way to draw a line between the one extreme and the other, except in a completely arbitrary way -- how much of P5 has to still be there? how many new things are you allowed to add? A notion of conformance that says you can do anything you like as long as you start with an ODD file seems unhelpful to me.

So I fall back to the notion that conformance = validity against tei_all, because that's practical, testable and useful. I'm not making any value judgements about whether extensions are good or bad. We do have a mechanism for building in processing models such that an extension could provide in its own ODD file a processable transformation which would convert it to pure TEI; perhaps that's conformant, although I'd rather judge the output to be conformant than the input. I prefer a model which says:

I don't see why you couldn't have multiple schemas against which conformance could be tested, such that anything that validated against one of them would be conformant; if P3's TEI Lite included extensions that were not part of the main P3, then that situation apparently pertained back then. But I'd still be in favour of there being a single encompassing schema to bind them all; and that's what we have in tei_all, really. Nobody should use it unmodified, but everybody should be able to validate against it.

But this is a discussion about what we think conformance means, or what we think it should mean. I'm just giving my opinion, and you're welcome to reject it completely of course. One reason we're free to differ is that the Guidelines do not contain a helpful explanation or definition; I hope a resolution to that will emerge out of the discussions, and I'll happily bow to it when it comes.

lb42 commented 7 years ago

Being valid against TEI All might be a necessary condition, but it's not in my view a sufficient one. A document in which the <l> element (say) is used to mark up typographic lines instead of verse lines would be valid against TEI All. A document in which the <body> contains a single <p> containing the whole of a Project Gutenberg plain text would be valid against TEI All.

martindholmes commented 7 years ago

@lb42 That's exactly the point I made on the Council list:

http://lists.tei-c.org/pipermail/tei-council/2017/024178.html

We need another word for a sort of semantic conformance which can't be mechanically checked.

hcayless commented 7 years ago

There are a bunch of different threads here, and I'm going to try to unpack them:

1) Martin and I are both worried about a standard for "conformance" that isn't machine-checkable.
2) Michael is worried that perfectly reasonable extensions to TEI will be disallowed by a restrictive definition of conformance. If I'm understanding correctly, he actually has a project that needs to be able to say it's doing TEI, but which is also extending TEI.
3) I think we'd all agree there's nothing wrong with extending TEI. We want people to do it. We don't want to prevent people from saying they're using TEI just because they're also going beyond what TEI does.
4) There's more than one sort of conformance, and schema validation can only check structural conformance, not semantic conformance.
5) We can imagine, and there already exist in the world, formats that are semantically conformant with TEI, but are not at all valid TEI.
6) This all rather hijacks the original point of the ticket, which was about what "clean" means in reference to modifications.

If (4) is true, then there's no possibility of a strictly machine-checkable standard for conformance. So either we should stop worrying so much about (1) or we should give up on the idea of conformance. Elsewhere I've argued that perhaps "compatibility" is a better standard (i.e. you can prove compatibility by producing valid TEI as one output). So possibly we shouldn't be talking about conformance at all. BUT, this may be trumped by the needs of our users to be able to legitimately say they're doing things the TEI way, so that their funders will not get annoyed with them. This seems like an overriding concern to me. I'm also increasingly concerned about (5), and I very much want any definition of conformance not to exclude it.

So where are we? The target I would like to hit is something that makes it clear what TEI cares about but encourages experimentation and extension and does not pull any rugs out from under anyone.

cmsmcq commented 7 years ago

On Mar 27, 2017, at 6:08 PM, Martin Holmes notifications@github.com wrote:

… You can use an ODD file to delete all the existing TEI elements and attributes and construct an entirely new set in a different namespace. If you do this, nothing of TEI will remain. Nevertheless, such a schema would have been generated using a TEI ODD file and built against P5. How could it be conformant?

I’m not sure I understand the question. Would an answer involve a lesson in exegesis? a discussion of techniques for drafting conformance clauses in specs so that they say what one wants them to say? an explanation of the rationale for the conformance rules of TEI P3?

martindholmes commented 7 years ago

@cmsmcq I guess the answer would involve your explaining your own proposal for a definition of conformance that is machine-testable, allows extension, but at the same time would disallow (as I presume we'd want to do) a "customization" which contains no TEI elements or attributes at all. I don't know how such a definition could work.

cmsmcq commented 7 years ago

On Mar 28, 2017, at 10:04 AM, Martin Holmes notifications@github.com wrote:

@cmsmcq I guess the answer would involve your explaining your own proposal for a definition of conformance that is machine-testable, allows extension, but at the same time would disallow (as I presume we'd want to do) a "customization" which contains no TEI elements or attributes at all. I don't know how such a definition could work.

Designing a concept of conformance and writing up a description of it seems a steep penalty to pay for having asked a question about what a sentence in P5 is intended to mean — especially given that on the face of it the sentence does not say anything about conformance.

If you and others are in fact interested in my views, I could perhaps be persuaded to think about it and try to describe what I think is possible, but my first instinct is to doubt that my views are or should be of any interest to those now responsible for the TEI. (And in fact the phrasing of your comment makes me think you are just saying, in different words, that you do not think a definition of conformance that satisfies the constraints you specify is possible, and you dare anyone to prove you wrong. Pass.)

martindholmes commented 7 years ago

@cmsmcq

my first instinct is to doubt that my views are or should be of any interest to those now responsible for the TEI

The scale of traffic on the tickets you've raised surely shows that the reverse is the case.

those now responsible for the TEI

The TEI is a community-maintained standard, and I'm no more responsible for it than you are. People currently elected to the Council or the Board (I'm not one of them) have more responsibility for taking action, but actions typically arise out of debate and consultation with the community of users, which includes both of us.

jamescummings commented 7 years ago

I can't say that I've internalised all of this discussion, but going back to the initial issue and the first few responses, I would agree that 'b' was what was intended by that phrase "Cleanliness can only be assessed with reference to elements in the TEI namespace." That is, I agree that "In assessing the cleanliness or uncleanliness of a given customization of the TEI schema, only elements in the TEI namespace are considered"... however I do not agree that to test this "all elements in non-TEI namespaces must be deleted." I would assume that it would be satisfactory for TEI checking merely to ignore them (i.e. just allow elements in any other namespace). I don't see how the TEI can be responsible for checking the validity of use of non-TEI elements.

I like @hcayless's summary at https://github.com/TEIC/TEI/issues/1586#issuecomment-289770709 and would reiterate that the most common use of the notion of TEI Conformance in my experience has been in funding applications where someone wants to indicate that they are using the TEI (and usually meaning nothing but a pure subset of tei_all). I agree we want to encourage users to extend the TEI. I'm worried by the notion of 'compatibility' since that is like that old chestnut of 'Conformable' that I don't think we should reinvent. Can you show me a structurally encoded information system that is not 'compatible' through a conversion or export process? (I'm sure we could write one if we really really wanted to...)

All in all, I'd rather keep the notion of TEI Conformance fairly vague, and maybe delete the notion of 'Cleanliness'. (Though I use it all the time in saying 'Pure' subset I notice, by which I mean one whose valid documents would also validate against tei_all.)
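To make the 'Pure' subset idea above concrete (every document valid against the customization also validates against tei_all), here is a minimal Python sketch. Everything in it is a hypothetical stand-in: a "document" is just a tuple of element names, `allows` builds a toy validator, and `counterexamples` is a name invented for illustration. A real check would call an actual schema validator instead. Note also that testing a finite corpus can only refute purity, never prove it.

```python
from typing import Callable, Iterable, List, Tuple

# Toy model (an assumption, not TEI machinery): a document is a tuple of
# element names, and a validator is any function from document to bool.
Doc = Tuple[str, ...]
Validator = Callable[[Doc], bool]


def allows(names: Iterable[str]) -> Validator:
    """Build a toy validator that accepts documents using only `names`."""
    allowed = frozenset(names)
    return lambda doc: all(name in allowed for name in doc)


def counterexamples(corpus: Iterable[Doc],
                    custom: Validator,
                    base: Validator) -> List[Doc]:
    """Return documents valid against `custom` but invalid against `base`.

    An empty result means no document in this corpus shows the
    customization to be anything other than a pure subset of the base.
    """
    return [doc for doc in corpus if custom(doc) and not base(doc)]


# Hypothetical schemas: the base, a subset of it, and an extension of it.
base = allows({"p", "l", "lg"})
subset = allows({"p", "l"})
extended = allows({"p", "myNote"})

corpus = [("p",), ("p", "l"), ("p", "myNote")]

pure_hits = counterexamples(corpus, subset, base)
extended_hits = counterexamples(corpus, extended, base)
```

With this corpus, `pure_hits` is empty, while `extended_hits` contains the one document that uses the added `myNote` element.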

hcayless commented 7 years ago

My point about "compatibility" is not meant to be theoretical at all:

you can prove compatibility by producing valid TEI as one output

That is, if you want to say you're compatible with TEI, then TEI should be one of your outputs. Yes, theoretically pretty much any data model could be crosswalked to TEI, and that's no more helpful than conformability. The question is not "could you do it?" but "are you doing it?". I think that's the only way such a notion would be helpful. (And it would make for more TEI in the world.)

cmsmcq commented 7 years ago

On Apr 4, 2017, at 9:51 AM, James Cummings notifications@github.com wrote:

I can't say that I've internalised all of this discussion, but going back to the initial issue and the first few responses, I would agree that 'b' was what was intended by that phrase "Cleanliness can only be assessed with reference to elements in the TEI namespace." That is, I agree that "In assessing the cleanliness or uncleanliness of a given customization of the TEI schema, only elements in the TEI namespace are considered"…

Can you explain what “consider” means in this context?

I understand how to validate documents against schemas written in DTD, Relax NG, or XSD.

By contrast, I don’t know how to validate an input document against a schema while ignoring parts of the input document. (Other than in elements which match a ‘skip’ wildcard in XSD.)

How does the process you have in mind differ from validation against a schema? How does it work?

Is the normal theory of automata and formal languages relevant here, or is TEI inventing its own automata theory here?

however I do not agree that to test this "all elements in non-TEI namespaces must be deleted." I would assume that it would be satisfactory for TEI checking merely to ignore them (i.e. just allow elements in any other namespace).

It sounds as though what you want is to validate the document against a schema we might call TEI All Plus, which interleaves each content model of TEI All with a content model consisting of zero or more elements which are not in the TEI namespace. Is that roughly what you have in mind? If not, how does it differ? If so, why do you think it has a different effect from deleting all non-TEI elements in the input? (I see some ways in which it’s different, but I do not know which of them matter to you.)
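For comparison, the "delete all non-TEI elements" reading can be sketched in a few lines of Python using the standard library's `xml.etree.ElementTree`. This is only an illustration under stated assumptions, not anything the Guidelines prescribe: the function name `strip_foreign` and the `http://example.org/ns` namespace are made up, and the sketch makes two choices that the thread leaves open, namely that a foreign element is dropped together with its whole subtree (including any TEI elements nested inside it) and that the text following it is kept.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def in_tei_namespace(elem: ET.Element) -> bool:
    """True if the element's expanded name is in the TEI namespace."""
    return isinstance(elem.tag, str) and elem.tag.startswith("{" + TEI_NS + "}")


def strip_foreign(elem: ET.Element) -> ET.Element:
    """Remove, in place, every descendant that is not a TEI element.

    Assumption: a removed element takes its whole subtree with it, but the
    text that followed it (its tail) is preserved in the parent.
    """
    previous = None  # last TEI child kept so far
    for child in list(elem):
        if in_tei_namespace(child):
            strip_foreign(child)
            previous = child
        else:
            tail = child.tail or ""
            if previous is None:
                # No kept sibling before it: tail joins the parent's text.
                elem.text = (elem.text or "") + tail
            else:
                previous.tail = (previous.tail or "") + tail
            elem.remove(child)
    return elem


# Hypothetical input mixing TEI elements with one foreign element.
doc = ET.fromstring(
    '<p xmlns="http://www.tei-c.org/ns/1.0" xmlns:x="http://example.org/ns">'
    "before <x:note>foreign <hi>nested TEI</hi></x:note> after <hi>kept</hi></p>"
)
strip_foreign(doc)
remaining_tags = [child.tag for child in doc]
remaining_text = "".join(doc.itertext())
```

Here the nested `hi` disappears along with its foreign parent, which is one way deletion can differ from a schema that merely tolerates foreign elements; a different policy (for example, unwrapping the foreign element and keeping its TEI children) would give a different document to validate.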

How do you reconcile your view that assessing ‘cleanliness’ requires ignoring non-TEI elements with the fact that the definition of clean modification doesn’t mention anything about namespace usage? I don’t see anything at all in the definition of ‘clean modification’ that licenses the claim that it can be assessed only with respect to elements in the TEI namespace.

I don't see how the TEI can be responsible for checking the validity of use of non-TEI elements.

Several people have expressed variants of this utterance, which I continue to find bizarre and unhelpful.

What does this mean? What kind of “responsibility” are you reluctant to assume here, and who is trying to impose it on you? In what way would taking seriously the existing wording of the definition of the technical term 'clean modification’ involve taking “responsibility” for anything?

All in all, I'd rather keep the notion of TEI Conformance fairly vague,

The instinct to keep rules of conformance vague, because one doesn’t want people to take them too seriously or care too much about them, and one doesn't want to spend a lot of time on them, appears to be fairly widespread; I have seen it in a wide variety of contexts (sometimes involving conformance, sometimes other topics).

It has never, in my experience, worked to minimize the time spent on the unwelcome topic. The usual effect is to make things worse and to produce longer and more difficult discussions of the topic.

Sometimes the desire for vagueness comes from conflicting desires; shall we define this concept C as meaning X or as meaning Y?

I have come to believe that in such cases, the best approach is almost always to define both a term for X and a term for Y. This makes it possible to have coherent discussions of whether X or Y is needed in a particular case, instead of exchanges in which everyone uses the term C, some of them using it to mean X, some to mean Y, and some without any grasp of the difference.

In the case of modifications of the TEI, and individual documents, I think this means defining a number of terms for varying properties. That could make possible a coherent discussion of what the TEI's definition of conformance should be.

With respect to the topic of this issue, what has become clear is that you, and Lou, are trying to make the term "clean" mean both what it is defined as meaning, and something else which neither of you has yet succeeded in defining clearly.

hcayless commented 7 years ago

I will check the status of any modifications with this, and if needed, Council will try to schedule a followup meeting with @cmsmcq and @lb42 to discuss possible revisions to the Guidelines.

hcayless commented 3 years ago

The language that @cmsmcq was concerned with in this issue has been removed, so I'm going to close this ticket. That said, the discussion was valuable, and I remain worried about the ways we talk about "cleanliness" in chapter 23 and about some of the implications of the discussion there.