Closed fbuettner-hb closed 6 years ago
Hmmm. I am not sure why a conforming RELAX NG engine would report a repeated ID value when, as I understand it, ISO RELAX NG does not support validation of that kind of non-regular constraint. Altering the RELAX NG schema for SVRL to use xsd:Name in those cases would indeed be the correct workaround.
ISO SVRL is an informative annex to the ISO Schematron standard, so adopters may certainly correct or tweak the SVRL schema for a practical purpose (without changing the syntax or semantics of the language); doing this in no way violates the standard.
Some technical details on why using xsd:ID shouldn't matter, but why it is a mistake anyway (especially if there is RELAX NG software with value-adds to validate ID uniqueness).
Yes, you are correct that the RELAX NG schema for ISO SVRL uses datatype xsd:ID in several patterns that might repeat, in particular fired-rule, failed-assert and successful-report.
However, the ISO RELAX NG standard does not define datatypes, so the use of XML Schema datatypes is governed by two OASIS documents (which I believe have not been superseded by anything from ISO):
"Guidelines for using XML Datatypes with RELAX NG" at http://relaxng.org/xsd-20010907.html (s4 quoted)
"RELAX NG DTD Compatibility" at https://www.oasis-open.org/committees/relax-ng/compatibility.html#id which limits when IDs can be used, and is not relevant.
The semantics defined by [W3C XML Schema Datatypes] for the ID, IDREF and IDREFS datatypes are purely lexical and do not include the cross-reference semantics of the corresponding [XML 1.0] datatypes. The cross-reference semantics of these datatypes in XML Schema comes from XML Schema Part 1. Furthermore, the [XML 1.0] cross-reference semantics of these datatypes do not fit into the RELAX NG model of what a datatype is. Therefore, RELAX NG validation will only validate the lexical aspects of these datatypes as defined in [W3C XML Schema Datatypes].
This has the somewhat surprising effect that the xsd:ID datatype in RELAX NG is only tested for lexical properties, not for uniqueness. In other words, for RELAX NG, xsd:ID is merely an alias for xsd:NCName (the type it is derived from, itself a restriction of xsd:Name). Uniqueness testing is not a practical capability for a pure regular grammar like RELAX NG. (Perhaps the reason is clearer if I say that if you want to validate that an ID is unique using the ISO DSDL schema languages, you do not use ISO RELAX NG; you use ISO Schematron.)
The use of xsd:ID was deliberate, made in the knowledge of this characteristic of RELAX NG's datatypes. So it should not prevent you in any way from using Schematron, and is not a bug.
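To make the distinction between a lexical check and a uniqueness check concrete, here is a minimal Python sketch. It is an illustration only: the regular expression is an ASCII-only approximation of the NCName lexical form (an assumption made for brevity), and the function names are hypothetical, not part of any validator.

```python
import re
from collections import Counter

# ASCII-only approximation of the NCName lexical space (an assumption for
# brevity; the real production also allows many non-ASCII characters).
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def lexically_valid(value: str) -> bool:
    """What a datatype check alone can decide: one value at a time."""
    return NCNAME_ASCII.fullmatch(value) is not None


def repeated_ids(values: list[str]) -> list[str]:
    """A whole-document check: which values occur more than once."""
    counts = Counter(values)
    return [value for value, n in counts.items() if n > 1]


ids = ["somerule", "otherrule", "somerule"]
print(all(lexically_valid(v) for v in ids))  # each value passes on its own
print(repeated_ids(ids))                     # repeats only show up across values
```

The first function needs only the value in front of it, which is all a RELAX NG datatype is asked to judge. The second needs every value in the document at once, which is the cross-reference semantics the quoted guideline leaves out.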
All that being said, I agree that it is unnecessarily confusing and actively misleading, and I would support a change in a future revision of ISO Schematron to use xsd:Name instead of xsd:ID for those repeated references. If someone uses xsd:ID, the implication that values should be unique (even if not checked by RELAX NG) is reasonable. Perplexingly, you can always translate a DTD ID or XML Schema ID to RELAX NG's xsd:ID, but not necessarily the reverse, if the schema was written relying on RELAX NG doing only lexical validation.
Furthermore, if your organization needs to change the schema for SVRL to use xsd:Name instead for any practical reason, in particular for translation to XSD, then please go ahead and document it in the schema you distribute: conversion to xsd:Name would be the appropriate way, I think. Please note that you would not be altering the language SVRL in any way, merely the formality of the declaration: all SVRL documents that are valid using RELAX NG's restricted version of xsd:ID should also be valid using xsd:Name.
Please also note that the grammar for ISO SVRL is an "informative" annex to ISO Schematron, so you are not in any sense violating the standard by making this change, if you need the change.
Thanks a lot for your detailed response. I was not aware that the semantics of W3C XML Schema datatypes are purely lexical in RELAX NG.
Just for the record, we use Jing (bundled with Oxygen 19.1 - build "20140903-saxon") for validation. This engine reports the identity-constraint violations:
message "ID "somerule" has already been defined
I have tried the same with MSV (using an XML version of svrl.rnc) - similar result:
Error at line:-1, column:-1 of null
"somerule" is used as an ID value more than once.
And with libxml (xmllint):
Invalid attribute id for element fired-rule
example-svrl.xml fails to validate
For our customers that use JAXB we derive an XML Schema using Trang. That schema yields the same unwanted validation results (it generates xsd:ID, not xsd:NCName).
Given that Jing, Trang, MSV and libxml are the de facto standard for validating RELAX NG, I think it could be helpful for other users, too, to add a modified version of svrl.rnc and a note in the schematron/schema repository (say, svrl-noidtypes.rnc).
Would you support that proposal?
definitely!
(Sent by email in reply to Fabian Büttner's comment of 18 Dec 2017, 7:01 PM, quoted above.)
Thanks! I have added a new pull request #4 that does exactly that.
Updated the schema to change @id to NMTOKEN in every case.
There was a further case of the same issue: the sch:pattern/@id should be an NMTOKEN when transferred to SVRL too, because sch:pattern/@documents may reference multiple documents, and they all may cause separate svrl:active-pattern elements (if sch:pattern/@documents is not used, there will only be one svrl:active-pattern and so the @id will be unique accidentally.)
The id attribute in attlist.assert-and-report and fired-rule has datatype xsd:ID. However, in SVRL, both attributes are many-to-one references to elements in the Schematron file (at least this is how the skeleton implementation uses these attributes). Hence, any SVRL instance that has two or more applications of the same rule or two or more violations of the same assertion is invalid.
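As a rough illustration of how such repeats arise, the following Python sketch counts id values on svrl:fired-rule elements using only the standard library. The sample document is hypothetical (hand-written for this sketch, not output from any Schematron implementation), and the helper name is an assumption.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"

# Hypothetical SVRL output: one rule with id "somerule" fired on two elements.
SAMPLE = f"""\
<svrl:schematron-output xmlns:svrl="{SVRL_NS}">
  <svrl:fired-rule id="somerule" context="item"/>
  <svrl:fired-rule id="somerule" context="item"/>
</svrl:schematron-output>"""


def repeated_rule_ids(svrl_xml: str) -> dict[str, int]:
    """Return each fired-rule id that occurs more than once, with its count."""
    root = ET.fromstring(svrl_xml)
    counts = Counter(
        el.get("id")
        for el in root.iter(f"{{{SVRL_NS}}}fired-rule")
        if el.get("id") is not None
    )
    return {rule_id: n for rule_id, n in counts.items() if n > 1}


print(repeated_rule_ids(SAMPLE))
```

Any id reported here would trip a validator that enforces ID uniqueness, even though the SVRL document is exactly what one would expect when a rule matches more than one context node.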
Fix: Change datatype for both attributes to xsd:NCName.
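One way to derive such a modified schema mechanically is a plain textual substitution, sketched below in Python. This is a crude approach offered only as an illustration: it would also rewrite occurrences inside comments, the schema fragment is hypothetical rather than copied from svrl.rnc, and the function name is an assumption.

```python
import re

# Matches the datatype name xsd:ID but not xsd:IDREF or xsd:IDREFS,
# because \b requires a non-word character (or end of text) after "ID".
XSD_ID = re.compile(r"\bxsd:ID\b")


def relax_id_types(rnc_text: str, replacement: str = "xsd:NCName") -> str:
    """Replace every xsd:ID datatype reference with a purely lexical type."""
    return XSD_ID.sub(replacement, rnc_text)


# Hypothetical schema fragment for illustration; not copied from svrl.rnc.
fragment = "attribute id { xsd:ID }?, attribute ref { xsd:IDREF }?"
print(relax_id_types(fragment))
```

The replacement type is a parameter so that the same helper could target xsd:NMTOKEN or xsd:Name instead, depending on which lexical space the maintainers settle on.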
I am aware that this problem is already in the ISO specification. Nevertheless, I suggest applying this fix to this repository and highlighting the changes in README.md.
Example Schematron file:
Example instance:
SVRL result:
Validation against svrl.rnc yields errors: