Schematron / schema-2016

Unofficial copy of schema(s) for Schematron 2016.

SVRL: id attribute attlist.assert-and-report and fired-rule should not be xsd:ID #2

Closed fbuettner-hb closed 6 years ago

fbuettner-hb commented 6 years ago

The id attribute in attlist.assert-and-report and fired-rule has datatype xsd:ID. However, in SVRL, both attributes are many-to-one references to elements in the Schematron file (at least this is how the skeleton implementation uses these attributes). Hence, any SVRL instance that records two or more applications of the same rule, or two or more violations of the same assertion, is invalid.

Fix: Change datatype for both attributes to xsd:NCName.

I am aware that this problem is already in the ISO specification. Nevertheless, I suggest applying this fix to this repository and highlighting the changes in README.md.
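As a minimal sketch of the proposed change, the fragment below declares the id attribute of fired-rule as NCName instead of ID. It is written in RELAX NG XML syntax and is a simplified, hypothetical grammar for illustration only; it is not copied from svrl.rnc, and the real schema declares more attributes and elements.

```xml
<!-- Hypothetical, simplified grammar; not the actual SVRL schema -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://purl.oclc.org/dsdl/svrl"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="schematron-output">
      <zeroOrMore>
        <ref name="fired-rule"/>
      </zeroOrMore>
    </element>
  </start>
  <define name="fired-rule">
    <element name="fired-rule">
      <optional>
        <attribute name="id">
          <!-- previously <data type="ID"/>; NCName keeps the lexical
               check but carries no uniqueness implication -->
          <data type="NCName"/>
        </attribute>
      </optional>
      <attribute name="context"/>
    </element>
  </define>
</grammar>
```

With a declaration like this, two fired-rule elements carrying the same id value should no longer trigger a "repeated id" report in validators that add ID uniqueness checks.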

Example Schematron file:

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2"
    xmlns:sqf="http://www.schematron-quickfix.com/validator/process">
    <sch:pattern>
        <sch:rule context="a" id="somerule">
            <sch:assert id="someassertion" test="count(b) ge 2">Some assertion</sch:assert>
        </sch:rule>
    </sch:pattern>
</sch:schema>

Example instance:

<root>
   <a>
      <b/>
   </a>
   <a>
      <b/>
   </a>
</root>

SVRL result:

<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl" ...>
    ...
    <svrl:fired-rule context="a" id="somerule"/>
    <svrl:failed-assert test="count(b) ge 2" id="someassertion" location="/root[1]/a[1]">
        <svrl:text>Some assertion</svrl:text>
    </svrl:failed-assert>
    <svrl:fired-rule context="a" id="somerule"/>
    <svrl:failed-assert test="count(b) ge 2" id="someassertion" location="/root[1]/a[2]">
        <svrl:text>Some assertion</svrl:text>
    </svrl:failed-assert>
</svrl:schematron-output>

Validation against svrl.rnc yields two errors:

  1. Repeated id somerule
  2. Repeated id someassertion.

rjelliffe commented 6 years ago

Hmmm. I am not sure why a conforming RELAX NG engine would report a repeated ID value, since ISO RELAX NG does not support validation of that kind of non-regular constraint, as I understand it. Altering the RELAX NG schema for SVRL to use xsd:Name in those cases would indeed be the correct workaround.

ISO SVRL is an informative annex to the ISO Schematron standard, so adopters may certainly correct or tweak the SVRL schema for a practical purpose (without changing the syntax or semantics of the language): doing this in no way violates the standard.

rjelliffe commented 6 years ago

Some technical details on why using xsd:ID shouldn't matter, but why it is a mistake anyway (especially if there is RELAX NG software with value-adds to validate ID uniqueness).

Yes, you are correct that the RELAX NG schema for ISO SVRL uses datatype xsd:ID in several patterns that might repeat, in particular fired-rule, failed-assert and successful-report.

However, the ISO RELAX NG standard does not define datatypes, so the use of XML Schema datatypes is governed by two OASIS documents (which I believe have not been superseded by anything from ISO):

  1. "Guidelines for using W3C XML Schema Datatypes with RELAX NG" at http://relaxng.org/xsd-20010907.html (section 4 is quoted below).
  2. "RELAX NG DTD Compatibility" at https://www.oasis-open.org/committees/relax-ng/compatibility.html#id, which limits when IDs can be used and is not relevant here.

The semantics defined by [W3C XML Schema Datatypes] for the ID, IDREF and IDREFS datatypes are purely lexical and do not include the cross-reference semantics of the corresponding [XML 1.0] datatypes. The cross-reference semantics of these datatypes in XML Schema comes from XML Schema Part 1. Furthermore, the [XML 1.0] cross-reference semantics of these datatypes do not fit into the RELAX NG model of what a datatype is. Therefore, RELAX NG validation will only validate the lexical aspects of these datatypes as defined in [W3C XML Schema Datatypes].

This has the somewhat surprising effect that the xsd:ID datatype in RELAX NG is only tested for lexical properties, not for uniqueness. In other words, for RELAX NG, xsd:ID is merely an alias for xsd:Name. Uniqueness testing is not a practical capability for a pure regular grammar like RELAX NG. (Perhaps the reason is clearer if I say that if you want to validate that an ID is unique using the ISO DSDL schema languages, you do not use ISO RELAX NG, you use ISO Schematron.)
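To illustrate the last point, here is a minimal sketch of a Schematron schema that checks id uniqueness. The rule context `*[@id]` and the message wording are assumptions chosen for the example; they do not come from any schema discussed in this thread.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <sch:pattern>
        <!-- Fires on every element that carries an id attribute -->
        <sch:rule context="*[@id]">
            <!-- Passes only if exactly one element in the document
                 has the same id value as the current element -->
            <sch:assert test="count(//*[@id = current()/@id]) = 1">
                The id "<sch:value-of select="@id"/>" is used more than once.
            </sch:assert>
        </sch:rule>
    </sch:pattern>
</sch:schema>
```

The test compares every id in the document with the current one, which is quadratic in the number of elements; for large documents an xsl:key based lookup would probably be a better fit.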

The use of xsd:ID was deliberate, in the knowledge of this characteristic of RELAX NG's datatypes. So it should not prevent you in any way from using Schematron, and is not a bug.

All that being said, I agree that it is unnecessarily confusing and actively misleading, and I would support a change in a future revision of ISO Schematron to use xsd:Name instead of xsd:ID for those repeated references. If someone uses xsd:ID, the implication that values should be unique (even if not checked by RELAX NG) is reasonable. Perplexingly, you can always translate a DTD ID or XML Schema ID to RELAX NG's xsd:ID, but not necessarily the reverse, if the schema was written relying on the fact that RELAX NG only does lexical validation.

Furthermore, if your organization needs to change the schema for SVRL to use xsd:Name instead for any practical reason, in particular for translation to XSD, then please go ahead and document it in the schema you distribute: conversion to xsd:Name would be the appropriate way, I think. Please note that you would not be altering the SVRL language in any way, merely the formality of the declaration: all SVRL documents that are valid using RELAX NG's restricted version of xsd:ID should also be valid using xsd:Name.

Please also note that the grammar for ISO SVRL is an "informative" annex to ISO Schematron, so you are not in any sense violating the standard by making this change, if you need the change.

fbuettner-hb commented 6 years ago

Thanks a lot for your detailed response. I was not aware that the semantics of W3C XML Schema datatypes are purely lexical in RELAX NG.

Just for the record, we use Jing (bundled with Oxygen 19.1 - build "20140903-saxon") for validation. This engine reports the identity constraint violations:

message "ID "somerule" has already been defined

I have tried the same with MSV (using an XML version of svrl.rnc) - similar result:

Error at line:-1, column:-1 of null
"somerule" is used as an ID value more than once.

And with libxml (xmllint):

Invalid attribute id for element fired-rule
example-svrl.xml fails to validate

For our customers that use JAXB we derive an XML Schema using Trang. That schema yields the same unwanted validation results (it generates xsd:ID, not xsd:NCName).
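For the XSD side, a declaration along the following lines would avoid the uniqueness check. This is a hand-written, simplified sketch that assumes the id attribute is typed as xs:NCName; it is not actual Trang output, and the real SVRL content model has more attributes.

```xml
<!-- Hypothetical, simplified schema; not generated by Trang -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://purl.oclc.org/dsdl/svrl"
           elementFormDefault="qualified">
  <xs:element name="fired-rule">
    <xs:complexType>
      <!-- xs:NCName instead of xs:ID: same lexical space,
           but no document-wide uniqueness constraint -->
      <xs:attribute name="id" type="xs:NCName"/>
      <xs:attribute name="context" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```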

Given that Jing, Trang, MSV and libxml are the de facto standard tools for validating RELAX NG, I think it could be helpful for other users, too, to add a modified version of svrl.rnc (say, svrl-noidtypes.rnc) and a note to the schematron/schema repository.

Would you support that proposal?

rjelliffe commented 6 years ago

definitely!

fbuettner-hb commented 6 years ago

Thanks! I have added a new pull request #4 that does exactly that.

rjelliffe commented 6 years ago

Updated the schema to change @id to NMTOKEN in every case.

There was a further case of the same issue: sch:pattern/@id should also be an NMTOKEN when transferred to SVRL, because sch:pattern/@documents may reference multiple documents, each of which may produce a separate svrl:active-pattern element. (If sch:pattern/@documents is not used, there will only be one svrl:active-pattern, so the @id will be unique by accident.)
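A minimal sketch of a schema where this situation can arise is shown below. The pattern id, the book/chapter vocabulary and the `@href` attribute are hypothetical names invented for the example.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <!-- @documents selects the IRIs of subordinate documents;
         the pattern is then applied to each of those documents -->
    <sch:pattern id="chapter-checks" documents="/book/chapter/@href">
        <sch:rule context="/chapter">
            <sch:assert test="title">A chapter should have a title.</sch:assert>
        </sch:rule>
    </sch:pattern>
</sch:schema>
```

If the book references three chapter files, the resulting SVRL may contain three svrl:active-pattern elements that all carry id="chapter-checks", which is why a type without uniqueness implications fits better than xsd:ID.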