ADES-attic / xsd

XML Schema (W3C)
The version issue #4

Open skeys opened 9 years ago

schastel commented 9 years ago

I suggest we write down pros and cons in this thread for one week (less? more?), i.e. until 2015-09-09. Then we vote using SurveyMonkey?

skeys commented 9 years ago

A number of documents on the web present a variety of techniques for schema versioning. One frequently cited document is XML Schema Versioning, which I'll refer to as XSV. Serge's branch at https://github.com/IAU-ADES/xsd/blob/sch/submit.xsd is, I think, Option 2, Usage A from XSV. Quoting,

Advantages:

  • The schemaVersion attribute is an enforceable constraint. Instances would not validate without the same version number.

Disadvantages:

  • The schemaVersion number in the instance must match exactly. This does not allow an instance to indicate that it is valid using multiple versions of a schema.

I'll make a proposal here that is closer to Option 2, Usage B of XSV. Unlike with Usage A, here the version number is not validated, but is more informational. This usage includes Option 1, which is to simply specify the schema version attribute. It means one of our xsd files that currently begins

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

might begin instead

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  version="1.1">

where that 1.1 refers to our version 1.1 of the schema document. The W3C specification says (in the paragraph after the box) "The other attributes (id and version) are for user convenience, and this specification defines no semantics for them." XSV says of this technique:

Advantages:

  • Easy. Part of the schema specification.
  • Instance documents would not have to change if they remain valid with the new version of the schema (case 2 above).
  • The schema contains information that informs applications that it has changed. An application could interrogate the version attribute, recognize that this is a new version of the schema, and take appropriate action.

Disadvantages:

  • The validator ignores the version attribute. Therefore, it is not an enforceable constraint.
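To illustrate how an application could "interrogate the version attribute", here is a minimal Python sketch using only the standard library. The function name schema_version and the inline sample schema are assumptions made for illustration; nothing like this exists in the repository.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

def schema_version(xsd_text):
    """Return the version attribute of the xsd:schema root, or None if absent.

    Hypothetical helper: the validator ignores this attribute, so any
    checking has to happen in application code like this.
    """
    root = ET.fromstring(xsd_text)
    if root.tag != f"{{{XSD_NS}}}schema":
        raise ValueError("not an XML Schema document")
    return root.get("version")

# Sample schema header, matching the form proposed above.
SAMPLE = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  version="1.1">
</xsd:schema>"""

print(schema_version(SAMPLE))
```

Because the attribute is only informational, a missing version comes back as None rather than raising an error; what to do in that case is left to the caller.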

I find XSV a little unclear in places, but Usage B goes on to describe embedding version information in the "instance" document -- one of our data files. It uses attributes:

<Example schemaVersion="1.2"
  xmlns="http://www.example"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.example MyLocation\Example.xsd">

and it uses that xsi namespace. The value of following this convention is not clear to me. For consistency with the rest of our schema so far, we might use elements instead of attributes. My proposal is that the instance/data file would contain something like

  <xsd-1.0>
    <schema>ades.xsd</schema>
    <version>1.1</version>
  </xsd-1.0>
  <xsd-1.0>
    <schema>submit.xsd</schema>
    <version>1.7</version>
  </xsd-1.0>

which would render to PSV as

# xsd-1.0
! schema ades.xsd
! version 1.1
# xsd-1.0
! schema submit.xsd
! version 1.7
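The XML-to-PSV mapping can be sketched in a few lines of Python. This assumes each block renders as a "# tag" line followed by one "! name value" line per child element, which is what the first block above shows. The wrapping <data> root element and the function name render_psv are hypothetical, added only so the fragment parses as a complete document.

```python
import xml.etree.ElementTree as ET

def render_psv(xml_text, block_tag="xsd-1.0"):
    """Render each block_tag element as PSV header lines (illustrative only)."""
    root = ET.fromstring(xml_text)
    lines = []
    for block in root.iter(block_tag):
        lines.append(f"# {block.tag}")
        for child in block:
            value = (child.text or "").strip()
            lines.append(f"! {child.tag} {value}")
    return lines

# <data> is a hypothetical root; the proposal only shows the inner blocks.
SAMPLE = """<data>
  <xsd-1.0>
    <schema>ades.xsd</schema>
    <version>1.1</version>
  </xsd-1.0>
  <xsd-1.0>
    <schema>submit.xsd</schema>
    <version>1.7</version>
  </xsd-1.0>
</data>"""

print("\n".join(render_psv(SAMPLE)))
```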

With this proposal, schema version numbers are not validated, but are present as a way of understanding how a data file was prepared, what might be required of certain software (such as the MPC submission process), and how to understand what might be wrong if something is failing to validate. Schema version elements might not even be strictly required. Similar to Steve's suggestion on field order that we always write a consistent field order but still allow other orders, here all of our software would always write schema version information, but not require it to be present. We would strongly encourage other software writers to include it.
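A reader that treats the version elements as optional might look roughly like the following Python sketch. The function name declared_schema_versions and the <data> root element are assumptions for illustration; the point is only that a file with no version blocks is accepted and yields an empty result.

```python
import xml.etree.ElementTree as ET

def declared_schema_versions(xml_text):
    """Map schema file name -> declared version; empty dict if none declared."""
    root = ET.fromstring(xml_text)
    versions = {}
    for block in root.iter("xsd-1.0"):
        schema = block.findtext("schema")
        version = block.findtext("version")
        # Skip incomplete blocks instead of failing: the information is
        # advisory, not a validation requirement.
        if schema is not None and version is not None:
            versions[schema.strip()] = version.strip()
    return versions

# <data> is a hypothetical root element used only to make the samples parse.
WITH_VERSIONS = """<data>
  <xsd-1.0><schema>ades.xsd</schema><version>1.1</version></xsd-1.0>
  <xsd-1.0><schema>submit.xsd</schema><version>1.7</version></xsd-1.0>
</data>"""
WITHOUT_VERSIONS = "<data></data>"

print(declared_schema_versions(WITH_VERSIONS))
print(declared_schema_versions(WITHOUT_VERSIONS))
```

Software such as a submission process could log the returned mapping when validation fails, which is the diagnostic use described above.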

schastel commented 9 years ago

Thanks for this thorough analysis. I had something much simpler in mind.