Open hcayless opened 4 years ago
In addition to object isolation as @hcayless mentions above, the new system might:
That's what occurs to me off the top of my head.
Noting from our F2F discussion that this will likely involve some Schematron and XSLT work, and that we might want to organize a group of us to work on this, ideally before the next Roma release, though this work is separate from Roma itself.
I overall like James’ list, with a few concerns.
- warn if not well formed and valid ODD
There is no point in checking well-formedness: Roma, the tool we are discussing, OxGarage, the Stylesheets … nothing will work if the ODD is not well-formed, and anything you do will tell you if it is not. Checking validity … sure, but against what?
- warn where Schematron rules are no longer applicable but still included
How in BLEEP are we supposed to check whether a rule is applicable or not? (We can test if it is outdated or not by looking at @validUntil, if it has one. But applicable? What does that even mean?)
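The @validUntil test mentioned above could be sketched roughly as follows. This is only an illustration, not a proposed implementation: the function name `expired_specs` and the sample ODD fragment are hypothetical, and it assumes every @validUntil value is a full `YYYY-MM-DD` date.

```python
import xml.etree.ElementTree as ET
from datetime import date


def expired_specs(odd_xml: str, today: date) -> list[tuple[str, str]]:
    """Return (ident-or-tag, validUntil) for each element whose @validUntil is before `today`.

    Hypothetical helper; assumes @validUntil always holds a full YYYY-MM-DD date.
    """
    root = ET.fromstring(odd_xml)
    expired = []
    for el in root.iter():
        until = el.get("validUntil")
        if until is None:
            continue  # no expiry date recorded, so nothing to test
        if date.fromisoformat(until) < today:
            expired.append((el.get("ident") or el.tag, until))
    return expired


# Made-up fragment for illustration only.
SAMPLE = """<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="demo">
  <constraintSpec ident="old-rule" scheme="schematron" validUntil="2020-01-01"/>
  <constraintSpec ident="new-rule" scheme="schematron" validUntil="2999-12-31"/>
  <constraintSpec ident="plain-rule" scheme="schematron"/>
</schemaSpec>"""

print(expired_specs(SAMPLE, date(2024, 6, 1)))
```

This only answers the "outdated" question. Whether a rule is still "applicable" is not something this sketch attempts to decide.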
And add:
Note that quite a few of the items listed are checked by the tei_customization schemas.
A further note on James's list:
warn if required TEI elements aren't available like titleStmt
Will be difficult absent some intrinsic attribute of "requiredness" on TEI elements. That is, it would require that we add some property to the formal definition of the TEI or that the sanity checker know things about TEI that TEI does not know about itself.
This starts to get into notions of "clean" (or not) customizations which are problematic.
Well, yes in theory, but in practice you can just list the absolute requirements. (That is what tei_customization does.) They are <TEI>, <teiHeader>, <fileDesc>, <titleStmt>, <title>, <publicationStmt>, and <sourceDesc>. That list is not likely to change much in any rapid way at all.
That leaves out conditional requirements, of course. E.g. that you have something that can go inside <sourceDesc>. Or e.g. that if you have a <text> in your schema (which you might not — you might only be interested in <sourceDoc> or <standOff>), then you must have a <body>. (The opposite, BTW, is not true: you can have a <body> without a <text>, as <body> can go inside <floatingText> which might occur in a <note> somewhere inside <sourceDoc> or <standOff>.)
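As a rough sketch of the "just list the absolute requirements" approach plus the <text>/<body> condition: the function `check_required` is hypothetical, and it assumes some earlier step has already worked out the set of element names the customization makes available (resolving that set from the ODD is the hard part and is not shown). The "something that can go inside <sourceDesc>" condition is left out.

```python
# Absolute requirements as listed in the thread.
REQUIRED = {
    "TEI", "teiHeader", "fileDesc", "titleStmt",
    "title", "publicationStmt", "sourceDesc",
}


def check_required(available: set[str]) -> list[str]:
    """Return warnings for missing required elements (hypothetical helper).

    `available` is assumed to be the set of element names the customization
    ends up defining; computing it from an ODD is out of scope here.
    """
    warnings = [
        f"required element <{name}> is not available"
        for name in sorted(REQUIRED - available)
    ]
    # Conditional requirement: <text> needs <body>, but not the other way round.
    if "text" in available and "body" not in available:
        warnings.append("<text> is available but <body> is not")
    return warnings


demo = {"TEI", "teiHeader", "fileDesc", "titleStmt", "title",
        "publicationStmt", "text"}
for warning in check_required(demo):
    print(warning)
```

A schema with <body> but no <text> produces no warning from the conditional test, matching the <floatingText> case described above.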
This is reminding me of a case in point: one of my former students was writing an ODD customization and inadvertently left out the textstructure module, so she had no TEI root element. It took us a while to realize how she'd done that. I think this sanity checker should be useful to anyone writing ODDs (whether using Roma or not).
Agreed, @ebeshero , but to be fair, tei_customization would have caught that.
@sydb This was years ago...not sure if tei_customization was around at that point, but anyway she wasn't using it.
More to the point, do we expect people generating an ODD with Roma not to have access to tei_customization?
Sorry, @ebeshero , my head got spun by the double negated expectation. But we expect everyone in the entire world to have access to tei_customization, at least in RelaxNG.
@sydb Okay, let's turn that into a positive question: Should the first stage of sanity checking simply be to associate the tei_customization RNG, so then any further sanity checking we design should be complementary to this? I see that there's a handy WWP blog post about how TEI members can access this, as well as your article from 2019 (and I think I remember your presentation), but I'm not at all sure it's widely known as yet. Can we make it more prominent as part of this effort?
@ebeshero you be taking words out of my mouth! Seriously, I would not be surprised if there were a few constraints in tei_customization that we did not want, but something similar to it is probably right on target.
I guess you keep in mind that the sanity checker should not be too aggressive with specifications aiming at defining non-TEI vocabularies, or even worse those reusing only some specific TEI crystals in other vocabularies. Thus, checking the presence of specific TEI header element definitions should only occur when these elements are actually used.
I agree with @laurentromary: where I've said above to check this or that TEI-specific thing, that should only happen in files in the TEI namespace.
Actually, no, @laurentromary. I did not imagine performing a sanity check on anything other than a TEI customization file. Thus I think it should be quite aggressive. If you are writing your own language in ODD, you would need your own sanity checking, no?
If you guarantee that the sanity checker would not fire in a place where I would use a non-TEI-intended ODD, then I don't care. Probably a roadmap with a test implementation would allow us to see if we open cans of worms anywhere.
I think there are two levels of checking here:
Is the internal organization of an ODD consistent? E.g. plucking a few from @jamescummings 's list above:
ONLY for TEI customizations:
For the latter case, we could even just rely on using OxGarage to validate via tei_customization and show a report.
Roma used to have a feature that would allow you to check the "sanity" of a constructed schema. This is a desirable thing and we want to figure out how to re-implement it rather than attempt to resurrect the ancient and long-broken PHP implementation that used to do this. This ticket is intended to capture requirements for a new sanity checker.
The tool should at a minimum check that included features in a schema are reachable (e.g. no elements that are not referenced in any content models).
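One way to read the reachability requirement is as a graph walk from the schema's start element(s). The sketch below is only that, a sketch: the `content_models` mapping (element name to the element names its content model references) and the function `unreachable` are hypothetical, and building that mapping from a compiled ODD is not shown. Note that this is slightly stricter than "not referenced in any content model", since an element referenced only by other unreachable elements is also reported.

```python
from collections import deque


def unreachable(content_models: dict[str, set[str]], roots: set[str]) -> set[str]:
    """Return defined elements that cannot be reached from any root (hypothetical helper)."""
    seen = {r for r in roots if r in content_models}
    queue = deque(seen)
    while queue:
        current = queue.popleft()
        for child in content_models[current]:
            # Ignore references to elements the schema does not define.
            if child in content_models and child not in seen:
                seen.add(child)
                queue.append(child)
    return set(content_models) - seen


# Made-up content models for illustration only.
models = {
    "TEI": {"teiHeader", "text"},
    "teiHeader": {"fileDesc"},
    "fileDesc": set(),
    "text": {"body"},
    "body": {"p"},
    "p": set(),
    "orphan": {"p"},  # defined, but nothing reachable points at it
}

print(unreachable(models, {"TEI"}))
```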