ebu / ebu-tt-m-xsd

Minimal W3C Schema version required for a schema processor to support #30

Open andreastai opened 5 years ago

andreastai commented 5 years ago

Some of the schema documents set xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" vc:minVersion="1.1" . What is the reasoning to set this to 1.1 and not 1.0? What are the consequences? I quickly checked the standards and it seems that all definitions enclosed by an element with minVersion = 1.1 will be ignored unless the processor supports XML Schema 1.1.
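
To illustrate that reading of the versioning rule, here is a minimal Python sketch that mimics the pre-processing step: any element whose vc:minVersion is greater than the processor's version is dropped, together with everything inside it. The schema fragment and the element names "always" and "gated" are hypothetical, and the functions are illustrative helpers, not part of any schema library. A processor that predates the versioning namespace may not apply this rule at all, so this only models a processor that does.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

VC_NS = "http://www.w3.org/2007/XMLSchema-versioning"
MIN_VERSION = f"{{{VC_NS}}}minVersion"

# Hypothetical schema fragment: only the second declaration is version-gated.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning">
  <xs:element name="always" type="xs:string"/>
  <xs:element name="gated" type="xs:string" vc:minVersion="1.1"/>
</xs:schema>
"""


def is_gated_out(element, processor_version):
    """True if the element asks for a newer schema version than the processor has."""
    min_version = element.get(MIN_VERSION)
    return min_version is not None and Decimal(min_version) > Decimal(processor_version)


def prune(element, processor_version):
    """Remove, in place, every gated-out descendant (and its whole subtree)."""
    for child in list(element):
        if is_gated_out(child, processor_version):
            element.remove(child)
        else:
            prune(child, processor_version)


def visible_names(processor_version, schema_text=SCHEMA):
    """Names of the top-level declarations that survive for this processor version."""
    root = ET.fromstring(schema_text)
    if is_gated_out(root, processor_version):
        return []  # the attribute on xs:schema itself hides the whole document
    prune(root, processor_version)
    return [el.get("name") for el in root]
```

With this fragment, visible_names("1.0") keeps only "always", while visible_names("1.1") keeps both declarations. If the attribute sits on the xs:schema element, as in some of the schema documents here, nothing is left for a 1.0 processor.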

nigelmegitt commented 5 years ago

Hi @tairt I found two places where I had to use v1.1:

Firstly, the definition of headMetadata_type that is used by ebu-tt-xsd enforces the constraint that any other contents that are permitted are in neither the tt nor the ebuttm namespace using the notNamespace attribute, for which XML Schema v1.1 is required: v1.0 does not support it.

Secondly, I also had to add v1.1 support to ebu-tt-datatypes.xsd, because I found that otherwise ebu-tt-xsd could not successfully include and import the datatypes defined in the same namespace both locally and in ebu-tt-m-xsd.

andreastai commented 5 years ago

@nigelmegitt wrote:

Firstly, the definition of headMetadata_type that is used by ebu-tt-xsd enforces the constraint that any other contents that are permitted are in neither the tt nor the ebuttm namespace using the notNamespace attribute, for which XML Schema v1.1 is required: v1.0 does not support it.

@nigelmegitt I think you refer to this schema construct:

<xs:any minOccurs="0"
        maxOccurs="unbounded"
        processContents="lax"
        notNamespace="http://www.w3.org/ns/ttml urn:ebu:tt:metadata" />
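
To make the effect of that wildcard concrete, here is a minimal Python sketch of the namespace test alone: a child element is admitted by the wildcard only if its namespace is not one of the two listed in notNamespace. The sample document, the "urn:example:foreign" namespace and the element names are hypothetical. The sketch models only the wildcard, not the rest of the content model or the lax validation that follows.

```python
import xml.etree.ElementTree as ET

# The two namespaces excluded by the notNamespace attribute quoted above.
NOT_NAMESPACE = {"http://www.w3.org/ns/ttml", "urn:ebu:tt:metadata"}

# Hypothetical instance fragment used only for illustration.
SAMPLE = """\
<tt:metadata xmlns:tt="http://www.w3.org/ns/ttml"
             xmlns:ebuttm="urn:ebu:tt:metadata"
             xmlns:ex="urn:example:foreign">
  <ebuttm:notDefinedElement/>
  <tt:notDefinedElement/>
  <ex:vendorData/>
</tt:metadata>
"""


def namespace_of(element):
    """Extract the namespace URI from ElementTree's '{uri}local' tag form."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None


def admitted_by_wildcard(element):
    """True if the element's namespace is not excluded by notNamespace."""
    return namespace_of(element) not in NOT_NAMESPACE


def report(xml_text=SAMPLE):
    """Map each child of the root to whether the wildcard would admit it."""
    root = ET.fromstring(xml_text)
    return {child.tag: admitted_by_wildcard(child) for child in root}
```

For the sample above, only the ex:vendorData child is admitted; the undefined tt: and ebuttm: children are not.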

Can you explain which SHALL constraint you want to validate with this construct and can you point me to the part in EBU-TT Part 1 where this is documented?

andreastai commented 5 years ago

@nigelmegitt wrote

Secondly, I also had to add v1.1 support to ebu-tt-datatypes.xsd, because I found that otherwise ebu-tt-xsd could not successfully include and import the datatypes defined in the same namespace both locally and in ebu-tt-m-xsd.

By "locally", do you mean the schema that imports the EBU-TT M schema? So, for example, a data type "foo" that is defined in the schemas for both Part 1 and Part M? If so, didn't we conclude that a datatype should be defined in only one of the schemas (in this case only in Part 1 or in Part M)?

nigelmegitt commented 5 years ago

@tairt re https://github.com/ebu/ebu-tt-m-xsd/issues/30#issuecomment-511708977 :

Tech3350 v1.2 §2.2 specifies the extensibility rule:

Attributes in a namespace not defined by an EBU-TT specification and not defined by TTML 1 may appear on any element defined by EBU-TT or TTML1.

nigelmegitt commented 5 years ago

@tairt re https://github.com/ebu/ebu-tt-m-xsd/issues/30#issuecomment-511742311 :

By "locally", do you mean the schema that imports the EBU-TT M schema?

Right, yes.

So, for example, a data type "foo" that is defined in the schemas for both Part 1 and Part M? If so, didn't we conclude that a datatype should be defined in only one of the schemas (in this case only in Part 1 or in Part M)?

Yes, the data types are now uniquely defined either in ebu-tt-m-xsd or the referring/including schema repository, but my recollection is that even so, the inclusion did not work properly unless using schema v1.1. (apologies I can't add further data to this right now because I don't have my usual machine available today to double-check this)

andreastai commented 5 years ago

Reply to https://github.com/ebu/ebu-tt-m-xsd/issues/30#issuecomment-511750212:

The constraint refers to attributes. Your construct applies to elements.

andreastai commented 5 years ago

Reply to https://github.com/ebu/ebu-tt-m-xsd/issues/30#issuecomment-511751440

Thanks @nigelmegitt for the info. I will check it.

nigelmegitt commented 5 years ago

Reply to #30 (comment):

The constraint refers to attributes. Your construct applies to elements.

Sorry, you're right. I should have picked out this wording from the same section:

Arbitrary foreign namespace elements may be added as child elements. A foreign namespace is any XML namespace not defined by an EBU-TT specification and not defined by TTML 1.

andreastai commented 5 years ago

Reply to https://github.com/ebu/ebu-tt-m-xsd/issues/30#issuecomment-511768128 @nigelmegitt

Allowing foreign namespace elements does not mean disallowing "undefined" EBU-TT or TTML metadata. Further down, the spec says that we have the same content model for tt:metadata as defined in TTML (unless further constrained by this spec).

nigelmegitt commented 5 years ago

@tairt OK, let's discuss - I thought in the past we decided that the document defines the structure and that no other elements than those explicitly listed are permitted. By that logic other elements not defined by EBU-TT or TTML 1 are indeed prohibited.

nigelmegitt commented 5 years ago

We discussed this yesterday but did not get to a clear decision - some further digging is needed to understand a) the original intention and b) if the wording accurately reflects that intention.

andreastai commented 5 years ago

I investigated the issue and tested with XML Schema parsers for XML Schema versions 1.0 and 1.1. I checked Part 1 in combination with Part M.

These schemas could not be used by schema parsers that only support XML Schema version 1.0.

There is the possibility to set the vc:minVersion="1.1" attribute just on the schema constructs that require version 1.1. But even then it would only be a partly functional schema with very limited application scope.

It would be good to have schemas that could be used in as many system contexts as possible. From my experience a lot of schema validation contexts still only support schema version 1.0. One reason for this is that widely used libraries (e.g. for C) have not been upgraded to schema version 1.1.

To enable more people to use the schemas I would favor taking out the two places that require schema version 1.1. For the schema import of data types this may result in a duplication of data type definitions (in different namespaces). But in my view the benefit that more people could use the schema would justify it.

andreastai commented 5 years ago

There would also be the option to have two schemas: one that requires Schema 1.1 processing capabilities and another one that is compatible with Schema 1.0 parsers.

nigelmegitt commented 5 years ago

That option to have two schemas would mean that the 1.0 schema would be less functional and match the specification less closely than the 1.1 schema.

andreastai commented 5 years ago

The XML Schema is informative because it cannot cover all of the mandatory constraints. In that sense, I would say that the XML Schema 1.1 version adds one more constraint that can be covered with the new validation vocabulary. The XML Schema 1.0 version covers a scope that is also covered by the current EBU-TT and TTML XSDs.

I think it would also be fine to have just an XML Schema 1.0 XSD. But if we want to use XSD 1.1 features it should come in a complementary schema.

nigelmegitt commented 4 years ago

Following the discussion about #31 I investigated what is possible in XML Schema 1.0:

  1. It is not possible just to remove the namespace="##other" attribute from the metadata element type definitions' <xs:any> elements. Doing so means that the default value of "##any" applies, and there is then a Unique Particle Attribution error because in XML Schema 1.0 the validator cannot make a decision about which group an element belongs in, either the specified optional ones or the "any" ones.
  2. The namespace="##other" as defined in XML Schema 1.0 means:

a pair of not and the ·actual value· of the targetNamespace [attribute] of the <schema> ancestor element information item if present, otherwise ·absent·.

In this case we are applying it to the metadata element in the http://www.w3.org/ns/ttml namespace, so ##other here effectively means "anything else not in the TT namespace". It does not matter what namespace the type is being defined in. For example, defining ebuttm:metadata_type and creating an element in the TT namespace with that type does not mean that ##other is relative to ebuttm: namespace, it means it is relative to the TT namespace.

  3. If we change the metadata element definition to allow any number of <xs:any namespace="##other" processContents="lax" ... /> children then the validation checks that occur for each child element C are:
     a. C's namespace must be declared in the document, unless it has no namespace.
     b. If C is defined in the schema, then C must conform to its schema type requirements.
     c. C must not be in the TT namespace.
     d. If C is not defined in the schema, then it is not checked further except for generic parsing validity. This is true even if some elements in C's namespace are defined in the schema. For example, if C is an ebuttm:notDefinedElement element, which is not defined in the schema, it still passes validation.

This means that the compromise we have to accept, if we make this particular change to revert to XML Schema 1.0, is that the schema will not validate any undefined elements in the EBU-TT namespace within a metadata element. It would not catch, for example, misspelled element names.
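
The per-child checks can be sketched in a few lines of Python to show where the misspelling slips through. This is a simplified model under stated assumptions: DECLARED is an illustrative stand-in for the schema's global element declarations, check_child is a hypothetical helper rather than a real validator API, and unqualified (no-namespace) children and the XML well-formedness check are deliberately left out.

```python
TT_NS = "http://www.w3.org/ns/ttml"
EBUTTM_NS = "urn:ebu:tt:metadata"

# Illustrative stand-in for the element declarations known to the schema.
DECLARED = {(EBUTTM_NS, "documentMetadata")}


def check_child(namespace, local_name, target_namespace=TT_NS, declared=DECLARED):
    """Classify one namespace-qualified child of tt:metadata under
    namespace="##other" with processContents="lax"."""
    if namespace == target_namespace:
        # ##other is relative to the target namespace of the schema document.
        return "rejected"
    if (namespace, local_name) in declared:
        # A declaration exists, so lax processing validates against it.
        return "validated"
    # No declaration found: lax processing accepts the element unchecked.
    return "skipped"
```

Under this model, check_child(EBUTTM_NS, "documentMetadata") is validated, whereas a misspelled check_child(EBUTTM_NS, "documentMetdata") is skipped and so accepted, which is exactly the loss of precision described above.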

An alternative compromise is for the schema not to permit any extension content at all, which is also bad.

@tairt I think your position is that we should make the change proposed in 3. above and trade off the precision of XML Schema 1.1 for the availability of XML Schema 1.0, at least as the main or primary schema in the repository. Is that correct?

andreastai commented 4 years ago

Thanks for the investigation @nigelmegitt. Indeed, I prefer to apply the XML Schema 1.0 compatible version. We chose the solution you sketch for EBU-TT-D (and I think also in the first version of the EBU-TT Part 1 schema):

<xs:element name="metadata">
    <xs:complexType>
        <xs:sequence>
            <xs:any namespace="##other" processContents="lax" minOccurs="0"
                maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>

nigelmegitt commented 4 years ago

OK I will fix #31 to do that then. We are missing out on so much potential validation that I think we could consider changing approach - I've raised #32 to explain another route we could take in this repo.