mbjones closed 4 years ago
Related to issue #60 for SPDX license lists.
Discussion at ESIP Winter proposed that we evaluate alternatives, including:
Need to discuss and propose an approach.
Note that there are machine-readable versions of the SPDX in RDF, JSON, and other formats here:
https://github.com/spdx/license-list-data
For example, here’s the Apache license details in RDF Turtle:
https://github.com/spdx/license-list-data/blob/master/rdfturtle/Apache-2.0.turtle
Started a new branch for this work, in feature_47_use_spdx_licenses. See https://github.com/ESIPFed/science-on-schema.org/blob/feature_47_use_spdx_licenses/decisions/47-use-spdx-licenses.md
The draft suggests that the cc URIs can be used as an alternative to the spdx URIs for the cc licenses, but providing two options is not good for interoperability. I suggest that the recommendation is to always provide an spdx URI, and include any others that might be in common use; the sdo:license element value would need to be an array:
"license": [ "http://spdx.org/licenses/CC0-1.0", "https://creativecommons.org/publicdomain/zero/1.0", "CC0-1.0"]
The best way to access spdx is the web page at https://spdx.org/licenses/, but the problem is that the HTTP URIs are not obvious-- you have to pull them out of the links in column 1; the fact that column 2 is labeled 'Identifier' is likely to confuse people thinking that's what they should use (it certainly looks like that is what SPDX is promoting). It would be really useful if we could provide a tabular view that shows the URIs that should appear in the schema.org instances.
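For illustration only, the array form of the license value suggested above could be produced along these lines. This is a minimal sketch: the helper name `dataset_with_licenses` and the dataset name are hypothetical, and only the three license values come from the example in this comment.

```python
import json


def dataset_with_licenses(name, licenses):
    """Return a minimal schema.org Dataset dict with a list-valued license.

    Hypothetical helper for illustration; not taken from the guidance docs.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        # SPDX URI first, then any other identifiers in common use.
        "license": list(licenses),
    }


doc = dataset_with_licenses(
    "Example dataset",  # placeholder name (assumption)
    [
        "http://spdx.org/licenses/CC0-1.0",
        "https://creativecommons.org/publicdomain/zero/1.0",
        "CC0-1.0",
    ],
)
print(json.dumps(doc, indent=2))
```

A consumer that wants the SPDX form can then look for the entry under spdx.org/licenses/ rather than relying on list order.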
@smrgeoinfo I was also thinking that providing multiple URIs for the CC licenses makes sense, although it could introduce inconsistencies. The SPDX records include rdfs:seeAlso links back to the original CC license URI, but it's better to list them both, I think. Are there any objections to the suggestion to recommend always providing an SPDX URI?
@smrgeoinfo Regarding the accessibility of the URIs, I agree it's unfortunate that the canonical URI for the SPDX license isn't easily accessible and differs from the URI of the web page linked from the table (e.g., https://spdx.org/licenses/Apache-2.0 is the linked data URI that should be used, but the web page is at https://spdx.org/licenses/Apache-2.0.html). In the decision record I suggested that someone could put up a searchable database built from the structured data, such as in the COR ontology repo, but I think that should be independent of whether we recommend using SPDX in the guidance docs. I will try to clarify in the docs how to find the URI.
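To make the distinction concrete, here is a rough sketch of how the linked data URI and the web page URL could be derived from an SPDX LicenseId and printed as a small table. The function names `spdx_uris` and `uri_table` are hypothetical, and the URL pattern is assumed from the Apache-2.0 example above rather than from any SPDX specification.

```python
SPDX_BASE = "https://spdx.org/licenses/"  # pattern assumed from the Apache-2.0 example


def spdx_uris(license_id):
    """Return the linked data URI and the web page URL for one LicenseId."""
    uri = SPDX_BASE + license_id
    return {"id": license_id, "uri": uri, "page": uri + ".html"}


def uri_table(license_ids):
    """Format a plain-text table of identifiers and the URIs to use."""
    rows = [spdx_uris(i) for i in license_ids]
    width = max([len("Identifier")] + [len(r["id"]) for r in rows])
    lines = ["Identifier".ljust(width) + "  URI"]
    lines += [r["id"].ljust(width) + "  " + r["uri"] for r in rows]
    return "\n".join(lines)


print(uri_table(["Apache-2.0", "CC0-1.0"]))
```

This does not check that an identifier actually exists in the SPDX list; a real tabular view would be generated from the machine-readable license data instead of a hand-written list.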
Oh, and I'll note that SPDX is not pushing for the URIs at all, but rather they recommend using the SPDX LicenseId like this:
// SPDX-License-Identifier: GPL-2.0-or-later
I think the URI approach is probably better in the linked data context of schema.org, but maybe we should also indicate whether people could or should use the SPDX LicenseId as well, as @smrgeoinfo did in his example.
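If both forms end up in use, converting from the comment-style tag to a URI is mechanical. The sketch below is only an illustration: `license_uri_from_tag` is a hypothetical name, the URL pattern is assumed from the spdx.org examples earlier in this thread, and it deliberately handles only a single LicenseId at the end of the line, not compound license expressions.

```python
import re

# Matches a tag followed by exactly one identifier at the end of the line.
TAG = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)\s*$")


def license_uri_from_tag(line):
    """Return an spdx.org URI for a single-identifier tag line, else None."""
    match = TAG.search(line)
    if match is None:
        return None
    return "https://spdx.org/licenses/" + match.group(1)


print(license_uri_from_tag("// SPDX-License-Identifier: GPL-2.0-or-later"))
```

Lines with compound expressions or trailing comment delimiters return None here, which keeps the sketch from guessing at a URI that SPDX does not define.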
I would suggest always using URIs (avoiding the SPDX LicenseId example above). Although I would prefer to avoid multiple URIs, I do see the points raised above. In this context I would suggest that the SPDX URI is always provided and that all other values are URIs as well.
Given that the Creative Commons folks will likely not be able to use mentions of the SPDX URLs or terms in their usage gathering (important for maintaining funding), I strongly support requiring use of canonical URLs for licenses that have them (e.g., Creative Commons) and SPDX if not.
I note that GNU has canonical URLs for their licenses, complete with versions even for outdated licenses. The Apache licenses also have canonical URLs.
You might want to look at http://cor.esipfed.org/ont/earthcube/swl for one idea of how to use an ontology that uses those canonical URLs. Perhaps this could be updated?
The Apache-2.0 license contains the following URL (which doesn't point specifically at the 2.0 version):
That page provides a link to the Apache-2.0 license page at Apache.org (http://www.apache.org/licenses/LICENSE-2.0), which in turn lists the following two URLs for the license at the very top of the page, along with the SPDX licenseId:
I can't find any location that indicates which of these 4 URLs is canonical and should be used for referencing. None of those pages contain any machine-readable metadata like the SPDX pages do, as far as I can tell. Thoughts and pointers on how to find the canonical URL are appreciated.
http://www.apache.org/licenses/ also points to the Apache 1.1 license (http://www.apache.org/licenses/LICENSE-1.1) and the Apache 1.0 license (http://www.apache.org/licenses/LICENSE-1.0). The naming convention and location for the license is very clear. Apparently my use of the word canonical is throwing you off. I apologize for that.
Decision marked as accepted in associated ADR. Changes merged into develop and ready for release.
At the Polar Data Forum meeting, we discussed recommending a standard license URL for the license field. While Creative Commons licenses have canonical URLs, many or most others are ambiguous about what their license URLs would be. The SPDX License List provides a comprehensive list of licenses and a standard set of URLs for referring to them. It is the primary source of license information for standard operating systems like Debian and languages like Python, and we recommend adding guidance that these URLs should be used rather than making up your own for a given license.