ESIPFed / science-on-schema.org

science-on-schema.org - providing guidance for publishing schema.org as JSON-LD for the sciences
Apache License 2.0

add guidance for license to use SPDX URLs #47

Closed mbjones closed 4 years ago

mbjones commented 4 years ago

At the Polar Data Forum meeting, we discussed recommending a standard license URL for the license field. While Creative Commons licenses have canonical URLs, many or most other licenses are ambiguous about what their license URLs should be. The SPDX License List provides a comprehensive list of licenses and a standard set of URLs for referring to them. It is the primary source of license information for operating systems like Debian and languages like Python. We recommend adding guidance that these URLs should be used rather than making up your own for a given license.

mbjones commented 4 years ago

Related to issue #60 for SPDX license lists.

mbjones commented 4 years ago

Discussion at ESIP Winter proposed that we evaluate alternatives, including:

  1. Use the SPDX URL directly
  2. Use the SPDX License shortname directly
  3. Ingest the SPDX structure into an ontology in COR and use the COR URI
  4. For licenses with well-known URIs (like Creative Commons), use the original license URI

Need to discuss and propose an approach.

mbjones commented 4 years ago

Note that there are machine-readable versions of the SPDX in RDF, JSON, and other formats here:

https://github.com/spdx/license-list-data

For example, here are the Apache license details in RDF Turtle:

https://github.com/spdx/license-list-data/blob/master/rdfturtle/Apache-2.0.turtle

mbjones commented 4 years ago

Started a new branch for this work, in feature_47_use_spdx_licenses. See https://github.com/ESIPFed/science-on-schema.org/blob/feature_47_use_spdx_licenses/decisions/47-use-spdx-licenses.md

smrgeoinfo commented 4 years ago

The draft suggests that the CC URIs can be used as an alternative to the SPDX URIs for the CC licenses, but providing two options is not good for interoperability. I suggest that the recommendation be to always provide an SPDX URI and to include any others that might be in common use; the sdo:license element value would then need to be an array: "license": ["http://spdx.org/licenses/CC0-1.0", "https://creativecommons.org/publicdomain/zero/1.0", "CC0-1.0"]
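
A minimal sketch of how that array form might sit inside a full record, written in Python so it can be serialized and inspected. The helper `dataset_with_licenses`, the dataset name, and the `@context` value are assumptions made for illustration; only the three license values come from the comment above.

```python
import json


def dataset_with_licenses(name, licenses):
    """Build a schema.org Dataset dict whose `license` value is a list.

    Hypothetical helper: the surrounding fields are placeholders, not
    part of any agreed guidance.
    """
    return {
        "@context": "https://schema.org/",  # assumed context value
        "@type": "Dataset",
        "name": name,
        "license": list(licenses),  # SPDX URI first, then alternatives
    }


record = dataset_with_licenses(
    "Example dataset",  # hypothetical name
    [
        "http://spdx.org/licenses/CC0-1.0",
        "https://creativecommons.org/publicdomain/zero/1.0",
        "CC0-1.0",
    ],
)

print(json.dumps(record, indent=2))
```

Putting the SPDX URI first is only a convention in this sketch; JSON-LD arrays are unordered unless the context says otherwise, so consumers should not rely on position.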

smrgeoinfo commented 4 years ago

The best way to access SPDX is the web page at https://spdx.org/licenses/, but the problem is that the HTTP URIs are not obvious: you have to pull them out of the links in column 1. The fact that column 2 is labeled 'Identifier' is likely to confuse people into thinking that's what they should use (it certainly looks like that is what SPDX is promoting). It would be really useful if we could provide a tabular view that shows the URIs that should appear in the schema.org instances.
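
One way such a tabular view could be produced is sketched below in Python. This is an illustration under assumptions, not a proposal for the actual tooling: the two sample records are hand-written, the `licenseId` and `name` field names are assumed to mirror the SPDX machine-readable data, and the base URL is assumed from the license list page address.

```python
# Hand-written sample records; the field names are an assumption about
# how the SPDX machine-readable license data is structured.
SAMPLE_LICENSES = [
    {"licenseId": "CC0-1.0", "name": "Creative Commons Zero v1.0 Universal"},
    {"licenseId": "Apache-2.0", "name": "Apache License 2.0"},
]


def uri_rows(records, base="https://spdx.org/licenses/"):
    """Return sorted (identifier, name, URI) tuples, one per license."""
    return sorted(
        (r["licenseId"], r["name"], base + r["licenseId"]) for r in records
    )


def format_table(rows):
    """Render the rows as a plain-text table with padded columns."""
    header = ("Identifier", "Name", "URI for schema.org")
    table = [header] + list(rows)
    widths = [max(len(row[i]) for row in table) for i in range(3)]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in table
    )


rows = uri_rows(SAMPLE_LICENSES)
print(format_table(rows))
```

The point of the third column is to show the URI next to the identifier, so readers do not have to extract it from the links on the web page.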

mbjones commented 4 years ago

@smrgeoinfo I also was thinking that providing multiple URIs for the CC licenses makes sense, although it could introduce inconsistencies. The SPDX records include rdfs:seeAlso links back to the original CC license URI. But it's better to list them both, I think. Are there any objections to the suggestion to recommend always providing an SPDX URI?

mbjones commented 4 years ago

@smrgeoinfo Regarding the accessibility of the URIs, I agree it's unfortunate that the canonical URI for the SPDX license isn't easily accessible and differs from the URI of the web page linked from the table (e.g., https://spdx.org/licenses/Apache-2.0 is the linked data URI that should be used, but the web page is at https://spdx.org/licenses/Apache-2.0.html). In the decision record I suggested that someone could put up a searchable database from the structured data, such as the COR ontology repo, but I think that should be independent of whether we recommend using SPDX in the guidance docs. But I will try to clarify how to find the URI in the docs.
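
To make the difference concrete, here is a small Python sketch that turns the web page address into the linked data URI by dropping the `.html` suffix. The function name `uri_from_page_url` is hypothetical, and the rule is inferred only from the Apache-2.0 example above, so treat it as an assumption rather than a documented SPDX convention.

```python
def uri_from_page_url(page_url):
    """Drop a trailing '.html' to get the linked data URI.

    Assumes the page URL and the URI differ only by that suffix, as in
    the Apache-2.0 example; other inputs are returned unchanged.
    """
    suffix = ".html"
    if page_url.endswith(suffix):
        return page_url[: -len(suffix)]
    return page_url


print(uri_from_page_url("https://spdx.org/licenses/Apache-2.0.html"))
```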

mbjones commented 4 years ago

Oh, and I'll note that SPDX is not pushing for the URIs at all, but rather they recommend using the SPDX LicenseId like this:

// SPDX-License-Identifier: GPL-2.0-or-later

I think the URI approach is probably better in the linked data context of schema.org, but maybe we should also indicate whether people could or should use SPDX LicenseId as well, as @smrgeoinfo did in his example.
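
As a rough illustration of how the two forms relate, the Python sketch below pulls the LicenseId out of a tag line like the one above and pairs it with a URI. The function names and the base URL are assumptions for illustration, and the pattern only handles a single identifier: a compound expression such as `MIT OR Apache-2.0` would yield just its first term.

```python
import re

# Assumed base, taken from the license list page address.
SPDX_BASE = "https://spdx.org/licenses/"

# Matches the tag followed by one identifier made of letters, digits,
# dots, plus signs, and hyphens.
_TAG = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")


def extract_license_id(line):
    """Return the identifier from a tag line, or None if no tag is found."""
    match = _TAG.search(line)
    return match.group(1) if match else None


def license_values(line):
    """Return [URI, LicenseId] for a tagged line, or an empty list."""
    license_id = extract_license_id(line)
    if license_id is None:
        return []
    return [SPDX_BASE + license_id, license_id]


print(license_values("// SPDX-License-Identifier: GPL-2.0-or-later"))
```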

steingod commented 4 years ago

I would suggest always using URIs (avoiding the SPDX LicenseId form shown above). Although I would prefer to avoid multiple URIs, I do see the points raised above. In this context I would suggest that the SPDX URI is always provided and that URIs are used.

rduerr commented 4 years ago

Given that the Creative Commons folks will likely not be able to use mentions of the SPDX URLs or terms in their usage gathering (important for maintaining funding), I strongly support requiring use of canonical URLs for licenses that have them (e.g., Creative Commons) and SPDX if not.

I note that GNU has canonical URLs for their licenses, complete with versions even for outdated licenses. The Apache licenses also have canonical URLs.

You might want to look at http://cor.esipfed.org/ont/earthcube/swl for an idea of how to use an ontology that uses those canonical URLs. Perhaps this could be updated?

mbjones commented 4 years ago

The Apache-2.0 license contains the following URL (which doesn't point specifically at the 2.0 version):

That page provides a link to the Apache-2.0 license page at Apache.org (http://www.apache.org/licenses/LICENSE-2.0), which in turn lists the following two URLs for the license at the very top of the page, along with the SPDX licenseId:

I can't find any location that indicates which of these 4 URLs is canonical and should be used for referencing. None of those pages contain any machine-readable metadata like the SPDX pages do, as far as I can tell. Thoughts and pointers appreciated on how to find the canonical URL.

rduerr commented 4 years ago

http://www.apache.org/licenses/ also points to the Apache 1.1 license (http://www.apache.org/licenses/LICENSE-1.1) and the Apache 1.0 license (http://www.apache.org/licenses/LICENSE-1.0). The naming convention and location for the license is very clear. Apparently my use of the word canonical is throwing you off. I apologize for that.

mbjones commented 4 years ago

Decision marked as accepted in associated ADR. Changes merged into develop and ready for release.