EDIorg / data-package-best-practices

Best practices for data packages: a gh-pages website with sections on metadata concepts and aspects of data packaging
https://ediorg.github.io/data-package-best-practices/

Improve Data Package License Machine Readability with SPDX #86

Open clnsmth opened 7 months ago

clnsmth commented 7 months ago

Background

Data package licensing is crucial for clearly communicating data usage terms to consumers. Currently, EDI assists data authors by presenting a couple of license options aligned with open data principles, along with a third option for custom content. This information is stored in EML's intellectualRights element, which is required by the EML Congruence Checker.

However, the intellectualRights element accepts loosely formed TextType metadata. While human-readable, this format is largely unintelligible to machines and hinders automated license interpretation.

Proposed Changes

To enhance current practices and align with Science-On-Schema.org practices for license interoperability, consider encouraging the use of EML's licensed element. This element accepts a URL to a machine-resolvable, linked-data-compliant license. EDI can offer SPDX license identifiers as linked data URIs alongside the current practice of using the intellectualRights element. Eventually, the free-text intellectualRights element could be phased out (see note below).

The current set of EDI-recommended licenses and their corresponding Creative Commons and SPDX identifiers are:

Creative Commons Zero v1.0 Universal

Attribution 4.0 International

Choosing between Creative Commons and SPDX may have future implications for supporting other licenses. SPDX encompasses all Creative Commons licenses and offers greater flexibility for accommodating additional licenses in the future.
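As a rough illustration of the SPDX option, here is a minimal Python sketch that builds a licensed element for the two recommended licenses. The `SPDX_LICENSES` table and the `build_licensed` helper are hypothetical names made up for this example, not part of any EDI tool. The URL pattern and child element names follow the BLE-LTER snippet quoted later in this thread, and the license names are the ones SPDX lists for these identifiers.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table: SPDX identifier -> SPDX license name.
SPDX_LICENSES = {
    "CC0-1.0": "Creative Commons Zero v1.0 Universal",
    "CC-BY-4.0": "Creative Commons Attribution 4.0 International",
}


def build_licensed(spdx_id: str) -> ET.Element:
    """Return a <licensed> element for an SPDX identifier in the table.

    Raises KeyError for identifiers that are not in SPDX_LICENSES.
    """
    name = SPDX_LICENSES[spdx_id]
    licensed = ET.Element("licensed")
    ET.SubElement(licensed, "licenseName").text = name
    # Assumed URL pattern, matching the CC0-1.0 example in this thread.
    ET.SubElement(licensed, "url").text = f"https://spdx.org/licenses/{spdx_id}.html"
    ET.SubElement(licensed, "identifier").text = spdx_id
    return licensed


if __name__ == "__main__":
    print(ET.tostring(build_licensed("CC-BY-4.0"), encoding="unicode"))
```

Keying the table on the SPDX identifier means supporting another license later would only require adding a row, which is the flexibility argument made above.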

Note: Phasing out the free-text intellectualRights element would eliminate potential contradictions with the licensed element and limit support for arbitrary, non-standardized data use terms. However, it may restrict the ability to express nuanced usage rights not yet formalized by the broader community.

Affected Systems

Some affected systems include:

gremau commented 4 days ago

The upcoming release of the EML Best Practices document encourages use of the <licensed> EML element and will also recommend using the SPDX vocabulary of license URLs in the licensed/url child element. See chapter 6. It does not recommend doing away with <intellectualRights>, since the ECC requires it and it will probably continue to be used by LTER data contributors for the foreseeable future.

We can consider resolving this once the new version of the document goes to production.

clnsmth commented 4 days ago

Thanks for the heads-up @gremau.

I'll raise this issue at the next developer meeting to let everyone know it is moving forward.

srearl commented 3 days ago

@clnsmth do you have handy any examples of data packages that include this element?

clnsmth commented 1 day ago

You bet @srearl! BLE-LTER makes use of this pattern. For example:

Beaufort Lagoon Ecosystems LTER and V. Lougheed. 2020. Carbon flux from aquatic ecosystems of the Arctic Coastal Plain along the Beaufort Sea, Alaska, 2010-2018 ver 7. Environmental Data Initiative. https://doi.org/10.6073/pasta/e6c261fbd143e720af5a46a9a131a616.

where in the EML you'll find:

<licensed>
  <licenseName>Creative Commons Zero v1.0 Universal</licenseName>
  <url>https://spdx.org/licenses/CC0-1.0.html</url>
  <identifier>CC0-1.0</identifier>
</licensed>

listed alongside the <intellectualRights>, and which displays on the data package summary page as:

[Image: Screenshot 2024-11-11 at 6 57 26 AM]
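To show what machine readability buys a consumer, here is a minimal Python sketch that pulls the license out of an EML document with the standard library. The `read_licenses` helper, the wrapping `<dataset>` fragment, and the placeholder intellectualRights text are assumptions made for this example. Only the `<licensed>` block is copied from the package above.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment; only the <licensed> block comes from the real package.
EML_FRAGMENT = """
<dataset>
  <intellectualRights><para>Free-text usage terms go here.</para></intellectualRights>
  <licensed>
    <licenseName>Creative Commons Zero v1.0 Universal</licenseName>
    <url>https://spdx.org/licenses/CC0-1.0.html</url>
    <identifier>CC0-1.0</identifier>
  </licensed>
</dataset>
"""


def read_licenses(eml_text: str) -> list[dict]:
    """Collect name, url, and identifier from every <licensed> element."""
    root = ET.fromstring(eml_text.strip())
    return [
        {
            "name": lic.findtext("licenseName"),
            "url": lic.findtext("url"),
            "identifier": lic.findtext("identifier"),
        }
        for lic in root.iter("licensed")
    ]


if __name__ == "__main__":
    for lic in read_licenses(EML_FRAGMENT):
        print(lic["identifier"], lic["url"])
```

The free-text intellectualRights content is ignored here on purpose: it has no fixed structure to parse, which is the gap the licensed element is meant to close.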