DILCISBoard / E-ARK-CSIP

E-ARK Common Specification for Information Packages
http://earkcsip.dilcis.eu
Creative Commons Attribution 4.0 International

Create named requirements for mets root xmlns and schemaLocation #408

Status: Open. koit opened this issue 5 years ago.

koit commented 5 years ago

The current version of section 5.3.1, "Use of the METS root element (element mets)", states:

In addition to its attributes the METS root element mets MUST define all relevant namespaces and XML schema locations used in the package employing the @xmlns and @xsi:schemaLocation attributes.

When implementing and using XML schemas the physical location of any schemas needs to be considered accounting for potential unavailability of any resources required for validation that are hosted externally.

In case XML schemas have been included into the package (i.e. placed into the schemas folder) it is recommended to link to the schemas using the relative path of the schema file (i.e. schemas/mets.xsd).

These are concrete MUST rules, so they should have their own named requirements (and the lines above should be removed from the introduction to section 5.3.1):

ID: CSIP7
Name & Location: Namespace declarations, mets/@xmlns
Description & usage: All XML namespaces used in the METS.xml document MUST be declared in mets/@xmlns attributes. A valid CSIP METS.xml document needs at least the declarations for METS, CSIPExtensionMETS and XMLSchema-instance, and in most cases also xlink.
Cardinality & Level: 3..n, MUST

ID: CSIP8
Name & Location: Schema locations, mets/@xsi:schemaLocation
Description & usage: The actual locations of the XSD files for all XML namespaces used MUST be declared in the mets/@xsi:schemaLocation attribute.
The schema files are needed at the time of validation, so they should be stored at a location that remains accessible throughout the lifetime of the IP, or included in the schemas folder. In the latter case, it is recommended to link to the schemas using the relative path (e.g. schemas/mets.xsd).
Note 1: It is assumed here that xsi has been declared as the prefix for the XMLSchema-instance namespace, but this choice is not mandatory.
Note 2: The XSD file location for XMLSchema-instance does not need to be given because of its special, built-in status.
Cardinality & Level: 1..1, MUST
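A minimal sketch of how the proposed CSIP8 could be checked, using Python's standard-library ElementTree. It collects the namespace URIs actually used by elements and attributes, splits mets/@xsi:schemaLocation into namespace/location pairs, and reports used namespaces that have no location. XMLSchema-instance and the built-in xml namespace are exempted, following Note 2. The function names and the sample document are hypothetical; the sample is deliberately trimmed and is not a valid CSIP METS document (it omits CSIPExtensionMETS, for instance).

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def _uri(qname):
    """Return the namespace URI of a Clark-notation name like '{uri}local', or None."""
    return qname[1:].split("}", 1)[0] if qname.startswith("{") else None


def namespaces_in_use(root):
    """Collect the namespace URIs of every element and attribute in the tree."""
    used = set()
    for el in root.iter():
        for name in [el.tag, *el.attrib]:
            uri = _uri(name)
            if uri:
                used.add(uri)
    return used


def missing_schema_locations(xml_text):
    """Hypothetical CSIP8 check: used namespaces with no entry in xsi:schemaLocation."""
    root = ET.fromstring(xml_text)
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    if len(tokens) % 2:
        raise ValueError("xsi:schemaLocation must hold namespace/location pairs")
    # Even-indexed tokens are namespace URIs, odd-indexed tokens their XSD locations.
    locations = dict(zip(tokens[0::2], tokens[1::2]))
    # XMLSchema-instance and the xml: namespace are built in (see Note 2).
    exempt = {XSI, XML_NS}
    return sorted(ns for ns in namespaces_in_use(root) - exempt if ns not in locations)


# Trimmed, hypothetical example: xlink is used but has no schema location.
sample = """<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.loc.gov/METS/ schemas/mets.xsd">
  <fileSec><fileGrp><file ID="f1"><FLocat xlink:href="data.txt"/></file></fileGrp></fileSec>
</mets>"""

print(missing_schema_locations(sample))  # ['http://www.w3.org/1999/xlink']
```

Because ElementTree resolves prefixes while parsing, this checks the namespaces in use rather than the xmlns declarations themselves; a namespace that is declared but never used is invisible to it.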

Side effect: the numbers of the current CSIP7 to CSIP112 will need to be incremented by 2.

Related to #228.

PhillipTommerholt commented 5 years ago

I really like this solution of making the namespace and schema location requirements stand out alongside the other requirements.

I am not sure about renumbering the other requirements, though. I think the requirement IDs need to be stable.

karinbredenberg commented 5 years ago

We don't rename IDs for the requirements at this stage.

karinbredenberg commented 5 years ago

I'll look into this. But if we go down the route of adding the namespaces as requirements, the whole XML header also needs to be a requirement.

carlwilson commented 5 years ago

Just to say that I agree with @PhillipTommerholt & @karinbredenberg: we now need to resist the temptation to renumber requirements. Permanent IDs (and by extension URLs) for requirements are more important than sequentially numbered requirements. As soon as anything new arose, we were always going to lose neat numbering.

koit commented 5 years ago

Sorry, I didn't know it was too late. My assumptions were:

There will always be unpredictable additions, so we should come up with a long-term solution. We are wrestling with the classic problem of semantically loaded (or natural) IDs, which can be solved with surrogate (or synthetic) IDs, or with something in between. The semantic load in our case is the logical order of the rules.

I see three solutions:

  1. Natural IDs: Numbers have logical order, new requirements are inserted in the proper place, so the numbering will differ between versions;
  2. Semi-natural IDs: Numbers are grouped to chapters (e.g. CSIP1.1, CSIP1.2, CSIP1.3, CSIP2.1), new requirements are added to the end of the appropriate chapter. An ID once assigned stays the same permanently (unless the chapters are restructured);
  3. Surrogate IDs: IDs are completely meaningless. All IDs are permanent. A possible format is CSIP-xx, where x is [A-Z0-9] (e.g. CSIP-N2, CSIP-05, CSIP-AA, CSIP-Z9). Keeping it case-insensitive would give us a pool of 36^2 = 1296 IDs.
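The pool size in option 3 is simple arithmetic: 26 letters plus 10 digits give 36 symbols per position, so two positions give 36^2 = 1296 distinct IDs when case is ignored. Below is a minimal Python sketch of the format check, assuming the CSIP-xx pattern exactly as proposed; the helper names are hypothetical. Every such ID starts with the letters CSIP, so it also respects the XML rule that an ID value must not begin with a digit.

```python
import re
import string

# 26 upper-case letters + 10 digits = 36 symbols per position.
ALPHABET = string.ascii_uppercase + string.digits

# Proposed surrogate format: "CSIP-" plus exactly two symbols, ASCII-only and case-insensitive.
SURROGATE_ID = re.compile(r"CSIP-[A-Z0-9]{2}", re.IGNORECASE | re.ASCII)


def pool_size(positions=2):
    """Number of distinct IDs when each position takes one of the 36 symbols."""
    return len(ALPHABET) ** positions


def canonical_id(candidate):
    """Validate a surrogate ID and return it upper-cased, so 'csip-n2' equals 'CSIP-N2'."""
    if not SURROGATE_ID.fullmatch(candidate):
        raise ValueError(f"not a CSIP-xx surrogate ID: {candidate!r}")
    return candidate.upper()


print(pool_size())              # 1296
print(canonical_id("csip-n2"))  # CSIP-N2
```

Upper-casing on input is what makes the scheme case-insensitive in practice: "csip-n2" and "CSIP-N2" collapse to one ID, which is why the pool is 36^2 rather than 62^2.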

karinbredenberg commented 5 years ago

It's a decision for the DILCIS Board. We have made decisions on this before and will look up the result and post it here.

(The IDs are XML IDs, which also restricts how they can be named.)

PhillipTommerholt commented 5 years ago

Nice summary of the discussion we had in A3 last time, Koit 👍

carlwilson commented 5 years ago

Discussed both points with @karinbredenberg.

Regarding additional requirements: we're in danger of writing our own version of the W3C XML requirements here. Would a better solution (or compromise, depending on your POV) be to make explicit recommendations, with references to authoritative documentation, to address this issue?

Regarding requirement numbering: under consideration; I can see the strengths of keeping related requirements together. Time is against us here, but that has to be balanced against the reality that this is our only chance to make such a change. :+1: from me for changing, but it will depend on pragmatic reality.

carlwilson commented 5 years ago

Do take into account the presence of CSIPExtensionMETS, which isn't mandatory but IS required for a valid IP.

carlwilson commented 5 years ago

I am now wondering whether we should be explicit in the text that XML schema validation is part of any validation process. This would at least imply inclusion of the minimal schema set required for a "schema valid" XML document. It wouldn't necessarily require a new requirement, just better explanatory text that highlights use of the main schema and vocabulary documents.

carlwilson commented 4 years ago

Section 5.3.1 is pretty explicit about namespacing, and schema validation will take care of the other parts. I'm not against adding specific requirements, but only if they can be tested via schema/Schematron. I'd suggest we bump this forward, with the specific aim of developing automated tests. If we can, then we add the requirements; this won't break backward compatibility, as the requirement already exists, it's just not explicitly stated or enforced. A test may not be as easy to come up with as it appears: https://stackoverflow.com/questions/35467330/xpath-in-schematron-how-to-determine-if-an-xmlns-attribute-is-present-on-a-node
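As the linked question discusses, XPath does not expose namespace declarations as attribute nodes, so a Schematron rule cannot simply test mets/@xmlns. A streaming parser can see the declarations, though. A minimal Python sketch, assuming the standard-library ElementTree: iterparse reports each declaration as a "start-ns" event just before the "start" event of the element that carries it, so the declarations seen before the first "start" are exactly those written on the root. The function names and the required set are hypothetical; a real CSIP7 check would also need the CSIPExtensionMETS namespace.

```python
import io
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical subset of the namespaces CSIP7 would require on the root.
REQUIRED_ON_ROOT = {METS, XSI}


def root_namespace_declarations(xml_bytes):
    """Return {prefix: uri} for the xmlns attributes written on the root element.

    The default namespace is reported with the empty prefix ''.
    """
    declared = {}
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns", "start"))
    for event, item in events:
        if event == "start":
            break  # root element reached; its own declarations were reported just before it
        prefix, uri = item
        declared[prefix] = uri
    return declared


def missing_root_declarations(xml_bytes, required=REQUIRED_ON_ROOT):
    """Required namespace URIs that are not declared on the root element."""
    return sorted(required - set(root_namespace_declarations(xml_bytes).values()))


# xsi is declared, but on a child element rather than on mets itself.
sample = b"""<mets xmlns="http://www.loc.gov/METS/">
  <metsHdr xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
</mets>"""

print(root_namespace_declarations(sample))  # {'': 'http://www.loc.gov/METS/'}
print(missing_root_declarations(sample))    # ['http://www.w3.org/2001/XMLSchema-instance']
```

The sample shows the case a schema alone would accept: xsi is in scope where it is used, yet it is not declared on mets as the proposed CSIP7 asks.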

carlwilson commented 3 years ago

This needs moving to the validator issue list as it's now about testing rather than the specification.