Schematron / schematron-enhancement-proposals

This repository collects proposals to enhance Schematron beyond the ISO specification

ISO Schematron iso-schematron.rng license issue #65

Open hjoukl opened 1 year ago

hjoukl commented 1 year ago

Hi,

pardon me if this isn't the proper place to report such an issue - what would be?

The renowned lxml Python XML library has used the "skeleton" Schematron implementation to provide ISO Schematron support since ~2009.

Lately, Fedora^1 and RHEL^2, and probably soon SUSE, have been stripping lxml's ISO Schematron parts due to the license in iso-schematron.rng (https://github.com/lxml/lxml/blob/4bfab2c821961fb4c5ed8a04e329778c9b09a1df/src/lxml/isoschematron/resources/rng/iso-schematron.rng) "being unclear and potentially non-Free"^3 (https://gitlab.com/fedora/legal/fedora-license-data/-/issues/154).

Note: This is lxml's vendored copy of the formerly available RelaxNG schema for schematron, added to lxml years ago. Looks like it's still available in its compact form in this repo linked on schematron.com: https://github.com/Schematron/schema/blob/655a641bb8fe21ec4fa7b1a82498c43bc70a3bf0/schematron.rnc, with the same license header.

See the lxml mailing list (https://mail.python.org/archives/list/lxml@python.org/message/XZZAANG3Y2EMTVTQ66AH7WKB7N4VILUP/) and issue tracker (https://bugs.launchpad.net/lxml/+bug/2024343) reports on this situation.

lxml tries to mitigate that by optionally running without support for RelaxNG-validation of the schematron schema in use. I.e. you can now remove the iso-schematron.rng file with the "offending" license and still run lxml.isoschematron functionality, albeit without validating the schematron schema itself.

Is there any chance of getting this dependency properly re-licensed (or the license text reworded unambiguously), i.e. under a license acceptable to Fedora and RHEL as Linux distributors who ship lxml in their distributions?

I wouldn't even know who'd be in a position to do this, if it is possible at all. The original author? The ISO org?

Could one reimplement the schematron schema from scratch (if one had access to the standards documents, which aren't publicly available any more, without buying from ISO)? Or maybe there's an alternative open source schema-for-schematron out there somewhere, e.g. an XSD?

Any insights appreciated.

Best regards, Holger

EDIT: Correct formatting to properly show footnotes with links.

tgraham-antenna commented 1 year ago

This probably is the best place for you to report an issue with the license text.

I'm not involved with ISO, but I think there's zero chance that ISO will revise the version before last of the standard to update the license text. You might, however, influence the license wording in the next version, if the WG so decides.

https://github.com/Schematron/schema/blob/655a641bb8fe21ec4fa7b1a82498c43bc70a3bf0/schematron.rnc is a copy of the 2020 version of the schema, with a correction added by the ISO editor. The Schematron organisation hosts unofficial copies of the Schematron schemas so that people don't have to copy-and-paste from their PDFs of the Schematron standard (from an idea by @susi-wunsch at https://github.com/Schematron/schematron/pull/15#issuecomment-168993260). After all, permission is granted to "distribute free of charge".


You could generate a schema by using a utility such as trang to generate a schema from a bunch of Schematron documents and then clean that up a little based on your understanding of the standard.


The comment at https://gitlab.com/fedora/legal/fedora-license-data/-/issues/154#note_1273444092 includes "while the license does not require inclusion of the license in copies", but the license includes "The following permission notice and disclaimer shall be included in all copies of this XML schema ("the Schema"), and derivations of the Schema:"

tgraham-antenna commented 1 year ago

> You could generate a schema by using a utility such as trang to generate a schema from a bunch of Schematron documents and then clean that up a little based on your understanding of the standard.

There are also utilities (Oxygen XML Editor has one) that can generate sample documents from a schema. To make sure that you have all of the elements and attributes, you could generate documents from the schema, then generate a schema from the documents.

Generated schemas tend to have loose content models and have lots of CDATA attributes, so they tend to need fix-up to be more useful.
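To illustrate the inference idea only (this is not trang, which emits real RELAX NG), here is a rough standard-library Python sketch that inventories which child elements and attributes each element carries across some sample Schematron documents. The function names and the two sample documents are hypothetical:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample documents standing in for "a bunch of Schematron documents".
SAMPLES = [
    '<schema xmlns="http://purl.oclc.org/dsdl/schematron">'
    '<pattern id="p1"><rule context="a">'
    '<assert test="@x">x is required</assert>'
    "</rule></pattern></schema>",
    '<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">'
    '<pattern><rule context="b">'
    '<report test="c" role="warn">c found</report>'
    "</rule></pattern></schema>",
]


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def infer_inventory(documents):
    """Map each element name to the child elements and attributes seen in the samples."""
    children = defaultdict(set)
    attributes = defaultdict(set)
    for text in documents:
        for element in ET.fromstring(text).iter():
            name = local_name(element.tag)
            attributes[name].update(element.attrib)
            children[name].update(local_name(child.tag) for child in element)
    return {
        name: {
            "children": sorted(children[name]),
            "attributes": sorted(attributes[name]),
        }
        for name in sorted(children)
    }


inventory = infer_inventory(SAMPLES)
for name, model in inventory.items():
    print(name, model)
```

The inventory records neither ordering, cardinality, nor attribute types, which is the kind of looseness that makes a generated schema need manual fix-up against the standard.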

hjoukl commented 1 year ago

Thanks for sharing your valuable insights! Really appreciated.

> I'm not involved with ISO, but I think there's zero chance that ISO will revise the version before last of the standard to update the license text. You might, however, influence the license wording in the next version, if the WG so decides.

I feared so. ;-)

> https://github.com/Schematron/schema/blob/655a641bb8fe21ec4fa7b1a82498c43bc70a3bf0/schematron.rnc is a copy of the 2020 version of the schema, with a correction added by the ISO editor. [...] After all, permission is granted to "distribute free of charge".

That's basically lxml's interpretation too - we can provide iso-schematron.rng since lxml distributes it free of charge. Which seems to be the exact problem for the Linux distros since they regularly (and rightfully) charge for their distribution and support.

Re generating the schema-for-schematron: interesting idea. I do have plenty of Oxygen XML experience from times past, mainly working with XSDs. So that might indeed be a way to kickstart an alternative schema. Still, I suppose you'd need to fine-tune it manually, and you'd need access to the standards PDFs for that.

Thanks again, Holger.

rjelliffe commented 1 year ago

The license is exactly the same as the standard SGML license, as used by public entity sets WITHOUT PROBLEM for 35 years. Are they going to remove all SGML and XML distros for the same reason?

The problem is not the license, but the ignorance of the original reviewer, AFAICS. It would be better to stop the problem at its source.

Regards Rick

rjelliffe commented 1 year ago

The original author of the license was a lawyer called Charles Goldfarb, in around 1985.

Rick

hjoukl commented 1 year ago

Thanks for chiming in @rjelliffe!

> The problem is not the license, but the ignorance of the original reviewer, AFAICS. It would be better to stop the problem at its source. Regards Rick

Probably. I'm not a lawyer though, and common sense does not seem to be generally applicable when it comes to legal matters.

I take it by original reviewer you mean the Fedora reviewer (lawyer?) who qualified the license as being ambiguous ("susceptible of at least four interpretations") and failing Fedora license criteria?

> The license is exactly the same as the standard SGML license, as used by public entity sets WITHOUT PROBLEM for 35 years. Are they going to remove all SGML and XML distros for the same reason?

I just noticed that the iso-schematron.rng originally included in lxml was a different/previous version. The older version carried a different license notice:

"(c) International Organization for Standardization 2005. Permission to copy in any form is granted for use with conforming SGML systems and applications as defined in ISO 8879, provided this notice is included in all copies."

Might that be the standard SGML license you're referring to, probably with an updated copyright year? It looks identical to the one contained in Berners-Lee's HTML IETF RFC (https://www.ietf.org/rfc/rfc1866.txt, e.g. 9.7.2). And who'd have thought I'd have to dig around there. :-)

I take it at some point in time lxml upgraded iso-schematron.rng to a later version, from the 2016 schematron standard (in commit https://github.com/lxml/lxml/commit/92901bd2b2ff9280df4c9d5ae720e390dfb4da18).

So that might mean that the original license was changed for the 2016 schematron version (by ISO?). Makes me wonder if "copy in any form" allows for modification (copy in modified form?). If that were the case, maybe the 2016/2020 schematron changes/upgrades could be layered on top of the older schema.

But I'm very much out of my depth wrt licensing here.

Best regards, Holger

AndrewSales commented 1 year ago

> So that might mean that the original license was changed for the 2016 schematron version (by ISO?).

Just to follow up, I believe the only change was to the date.

hjoukl commented 1 year ago

> Just to follow up, I believe the only change was to the date.

Quoting the different license texts here:

The originally lxml-included iso-schematron.rng (a) carried this license notice:

<!--
         (c) International Organization for Standardization 2005. 
        Permission to copy in any form is granted for use with conforming 
        SGML systems and applications as defined in ISO 8879, 
        provided this notice is included in all copies.
-->

Whereas the current version from the 2016 schematron standard ((b) introduced in commit https://github.com/lxml/lxml/commit/92901bd2b2ff9280df4c9d5ae720e390dfb4da18) has this:

<!-- Copyright © ISO/IEC 2015 -->
<!--
  The following permission notice and disclaimer shall be included in all
  copies of this XML schema ("the Schema"), and derivations of the Schema:

  Permission is hereby granted, free of charge in perpetuity, to any
  person obtaining a copy of the Schema, to use, copy, modify, merge and
  distribute free of charge, copies of the Schema for the purposes of
  developing, implementing, installing and using software based on the
  Schema, and to permit persons to whom the Schema is furnished to do so,
  subject to the following conditions:

  THE SCHEMA IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
  THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
  OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
  ARISING FROM, OUT OF OR IN CONNECTION WITH THE SCHEMA OR THE USE OR
  OTHER DEALINGS IN THE SCHEMA.

  In addition, any modified copy of the Schema shall include the following
  notice:

  "THIS SCHEMA HAS BEEN MODIFIED FROM THE SCHEMA DEFINED IN ISO/IEC 19757-3,
  AND SHOULD NOT BE INTERPRETED AS COMPLYING WITH THAT STANDARD".
-->

Which is very much the same as the license header in https://github.com/Schematron/schema/blob/655a641bb8fe21ec4fa7b1a82498c43bc70a3bf0/schematron.rnc (c):

# Copyright © ISO/IEC 2017
# The following permission notice and disclaimer shall be included in all 
# copies of this XML schema ("the Schema"), and derivations of the Schema: 

# Permission is hereby granted, free of charge in perpetuity, to any 
# person obtaining a copy of the Schema, to use, copy, modify, merge and 
# distribute free of charge, copies of the Schema for the purposes of 
# developing, implementing, installing and using software based on the 
# Schema, and to permit persons to whom the Schema is furnished to do so, 
# subject to the following conditions: 

# THE SCHEMA IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL 
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
# OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
# ARISING FROM, OUT OF OR IN CONNECTION WITH THE SCHEMA OR THE USE OR 
# OTHER DEALINGS IN THE SCHEMA. 

# In addition, any modified copy of the Schema shall include the following 
# notice: 

# "THIS SCHEMA HAS BEEN MODIFIED FROM THE SCHEMA DEFINED IN ISO/IEC 19757 3, 
# AND SHOULD NOT BE INTERPRETED AS COMPLYING WITH THAT STANDARD".

So indeed (b) and (c) have the same license text, apart from the copyright year. But the iso-schematron.rng version included in lxml initially (a) has a different license text, which seems to be the "standard SGML license".
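One way to check such a claim mechanically is to strip the comment syntax, collapse whitespace, and mask the year before comparing. The following is only a sketch on the first lines of headers (b) and (c) as quoted above; the helper names `normalize` and `mask_year` are made up:

```python
import re

# Opening lines of header (b), the XML comment form quoted above.
HEADER_B = """<!-- Copyright © ISO/IEC 2015 -->
<!--
  The following permission notice and disclaimer shall be included in all
  copies of this XML schema ("the Schema"), and derivations of the Schema:
-->"""

# Opening lines of header (c), the RELAX NG compact comment form quoted above.
HEADER_C = """# Copyright © ISO/IEC 2017
# The following permission notice and disclaimer shall be included in all
# copies of this XML schema ("the Schema"), and derivations of the Schema:
"""


def normalize(header):
    """Remove XML and RNC comment markers, then collapse all whitespace."""
    text = re.sub(r"<!--|-->", " ", header)
    text = re.sub(r"^\s*#", " ", text, flags=re.MULTILINE)
    return " ".join(text.split())


def mask_year(text):
    """Replace four-digit years so that only other differences remain."""
    return re.sub(r"\b(19|20)\d{2}\b", "YEAR", text)


same_with_year = normalize(HEADER_B) == normalize(HEADER_C)
same_without_year = mask_year(normalize(HEADER_B)) == mask_year(normalize(HEADER_C))
print(same_with_year, same_without_year)
```

Run over the complete headers as quoted above, a comparison like this would also flag "ISO/IEC 19757-3" in (b) versus "ISO/IEC 19757 3" in (c), so "same apart from the copyright year" holds only up to that hyphen.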

AndrewSales commented 1 year ago

I can see that the 2016 text differs from what appeared in the first edition of the ISO standard.

IIRC, I took over as project editor in 2016 when the text was at FDIS (Final Draft International Standard) stage and the earliest draft I worked on already had this text in place.