Open chad-earthscope opened 3 years ago
This seems like the kind of issue others have encountered, and stationxml should follow any existing solutions. See for example: https://spdx.dev/ids/
Is there a compelling use case for a new element vs embedding the license in an XML comment, like:
<Network code="IU" startDate="1988-01-01T00:00:00" restrictedStatus="open">
<!--
Licensed under Creative Commons - No Rights Reserved, CC0
https://creativecommons.org/share-your-work/public-domain/cc0/
-->
or even just:
<Network code="IU" startDate="1988-01-01T00:00:00" restrictedStatus="open">
<!-- SPDX-License-Identifier: CC0-1.0 -->
In other words, if there is a need for machine parsing of the license, then the syntax should probably be locked down even more than the given example. For example requiring the abbreviation to be from https://spdx.org/licenses/ as opposed to each user making up their own abbreviations. If machine parsing is not needed then the comment idea would work today with no schema changes, which is of course an advantage.
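To make the trade-off concrete, here is a minimal Python sketch of what machine parsing of the comment form might look like. The function name `spdx_id_from_comment` and the small `KNOWN_SPDX_IDS` set are assumptions for illustration only; a real tool would check against the full list at https://spdx.org/licenses/.

```python
import re

# Tiny illustrative subset of SPDX identifiers (assumption: a real tool
# would load the full list published at https://spdx.org/licenses/).
KNOWN_SPDX_IDS = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0"}

# Matches an SPDX tag written inside an XML comment.
SPDX_COMMENT = re.compile(r"<!--\s*SPDX-License-Identifier:\s*(\S+)\s*-->")


def spdx_id_from_comment(xml_text):
    """Return the identifier from the first SPDX comment, or None if the
    comment is absent or the identifier is not in the known list."""
    match = SPDX_COMMENT.search(xml_text)
    if match is None:
        return None
    identifier = match.group(1)
    return identifier if identifier in KNOWN_SPDX_IDS else None


doc = """<Network code="IU" startDate="1988-01-01T00:00:00" restrictedStatus="open">
<!-- SPDX-License-Identifier: CC0-1.0 -->
</Network>"""
print(spdx_id_from_comment(doc))
```

A free-form comment, or a made-up abbreviation, yields `None` here, which is the practical argument for locking the syntax down if machines are expected to read it.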
Is there a compelling use case for a channel having a different license from its enclosing station and/or network? I can see a different network perhaps requiring a different license, but we should avoid "xml bloat" where every channel for every station in a network repeats the same license. Even just repeating the abbreviation adds a lot of noise to the xml.
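One way to avoid the repetition would be for readers to inherit the license from the nearest enclosing level. The sketch below assumes the `DataLicense` element and `abbreviation` attribute proposed in this issue, drops the XML namespace for brevity, and uses a hypothetical helper named `effective_license`; none of this is part of a released schema.

```python
import xml.etree.ElementTree as ET

# Namespace-free fragment for brevity (assumption: real StationXML documents
# use the FDSN namespace, and DataLicense is only a proposed element).
FRAGMENT = """
<Network code="IU">
  <DataLicense abbreviation="CC0-1.0"/>
  <Station code="ANMO">
    <Channel code="BHZ" locationCode="00"/>
    <Channel code="LHZ" locationCode="00">
      <DataLicense abbreviation="CC-BY-4.0"/>
    </Channel>
  </Station>
</Network>
"""


def effective_license(ancestors):
    """Return the abbreviation of the nearest DataLicense, searching from the
    innermost element outward. `ancestors` is ordered outermost first."""
    for node in reversed(ancestors):
        declared = node.find("DataLicense")  # direct children only
        if declared is not None:
            return declared.get("abbreviation")
    return None


network = ET.fromstring(FRAGMENT)
station = network.find("Station")
for channel in station.findall("Channel"):
    print(channel.get("code"), effective_license([network, station, channel]))
```

With this rule only the channel that differs needs its own declaration; every other channel picks up the network-level value.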
If incorporating a license is needed, it might be good to consider whether also incorporating a copyright is wise. Copyright and license are orthogonal concepts, but sometimes knowing one without the other is a problem.
Lastly, of course, this sort of thing goes beyond seismologists and software developers and starts to get lawyers involved, so tread carefully.
> In other words, if there is a need for machine parsing of the license, then the syntax should probably be locked down even more than the given example. For example requiring the abbreviation to be from https://spdx.org/licenses/ as opposed to each user making up their own abbreviations. If machine parsing is not needed then the comment idea would work today with no schema changes, which is of course an advantage.
I believe machine parsable is the primary goal of this format and design should be targeting that use. Also, details should be describable in the schema, and I don't think that's possible for comments. For these reasons the XML comment option is less desirable in my opinion.
I completely agree that if we can find a list of abbreviations and/or other definitions and examples to draw from we should.
> Lastly, of course, this sort of thing goes beyond seismologists and software developers and starts to get lawyers involved, so tread carefully.
I do not believe licensing data is controversial. In my non-legal opinion, at this point we risk doing more harm than good by not having the ability to declare a license in standardized metadata.
An issue that should also be considered is whether the declared license covers the metadata in addition to the data it describes. Traditionally we have treated metadata as "public domain" in the sense that it can be freely used with no restrictions (or requirements of citation). We should be clear on the scope of any declaration.
> An issue that should also be considered is whether the declared license covers the metadata in addition to the data it describes.
I missed this, so it is worth documenting carefully. I thought you were talking about the stationxml, not the miniseed it relates to. "Data" is a pretty generic term, perhaps there is a way to make it clearer. The recommendation makes more sense now, and I agree a comment likely is not the right answer.
Regardless, if we are licensing the waveforms, we might need to license the stationxml metadata too. Just because it is "meta" doesn't mean it isn't someone's property.
> I do not believe licensing data is controversial. In my non-legal opinion, at this point we risk doing more harm than good by not having the ability to declare a license in standardized metadata.
What I mean by this is that the details may matter a lot. For example, if there are 2 data license elements, does that mean both have to be satisfied, or that the user can pick between the two (and vs or)? Maybe better to have only one to avoid this ambiguity. Other seemingly small details can have outsized effects.
This may be useful: https://wiki.creativecommons.org/wiki/Marking_Works_Technical
The licence on the stationXML document should be explicit. An attribute in
The licences markup on the waveform data should have starttime/endtime attributes. We should look at DataCite's way of describing the licences to make sure we can describe complex licencing correctly.
There will be duplication between the DOI's metadata and stationXML metadata on this matter. So maybe we should write in the documentation which one is authoritative (DOI or stationXML?). From a datacenter point of view, the licence fields in stationXML could be filled from the same sources as the DataCite fields to ensure consistency.
@jschaeff do you have a link for how DataCite does this that would be helpful?
If there was a starttime/endtime on a license, that could get complicated quickly. For example, you could think of the standard PASSCAL data policy as a license. It is proprietary for 2 years after collection, but CC-0 after 2 years, so the license changes with time. Guess I am wondering if the right answer is to provide a way to link to the actual license policy instead of trying to embed it directly, so keep only the url and not have the abbreviation or any text? Then complex cases can be handled by the license holder instead of by stationxml? That would mean that responsibility for dealing with any conflicts or confusion is totally on the license holder, all we provide is a place to put the URL. Common license types, like CC-0 would all use the creative commons url, so it is easy to tell.
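To illustrate why a time-dependent policy is hard to embed directly, here is a small Python sketch of the two-year case described above. The function names, the fixed `EMBARGO_YEARS` value, and the two labels are assumptions for illustration, not a statement of any actual policy text.

```python
from datetime import date

# Assumptions for illustration: a fixed two-year embargo measured from the
# collection date, and these two labels for the before/after states.
EMBARGO_YEARS = 2


def add_years(day, years):
    """Shift a date by whole years, mapping 29 February to 28 February
    when the target year is not a leap year."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        return day.replace(year=day.year + years, day=28)


def license_on(collected, today):
    """Return the label that applies, on `today`, to data collected on `collected`."""
    if today < add_years(collected, EMBARGO_YEARS):
        return "proprietary"
    return "CC0-1.0"


print(license_on(date(2020, 6, 1), date(2021, 6, 1)))
print(license_on(date(2020, 6, 1), date(2022, 6, 1)))
```

The answer depends on both the sample time and the date of the request, so a single static attribute on a network or channel cannot express it; a URL pointing at the policy leaves that evaluation to the license holder.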
Would two elements like WaveformDataLicense and StationXMLLicense help separate the license to use the waveforms, ie miniseed, from a potential license of the stationxml? I am leery of something like MetadataLicense as one person's metadata is another's data. Although documentation perhaps could help with that.
Although not a stationxml issue, the marking of the license really needs to also be on the actual data itself. Not sure if there is a way to standardize how to do this in miniseed2?
It appears that DataCite is not very complete for complex license management, see pages 27 and 28 of https://schema.datacite.org/meta/kernel-4.4/doc/DataCite-MetadataKernel_v4.4.pdf
However, RDA came out with a Machine Actionable DMP standard that allows fixed embargoes to be defined, but not rolling embargoes. https://github.com/RDA-DMP-Common/RDA-DMP-Common-Standard/blob/master/docs/FAQ.md#how-to-express-embargoes
With a simple URL to the license you would miss the machine readable part. A rolling embargo could be modeled with special tags, although this is a bit cumbersome.
But in the end, I guess, what a machine must know is if the waveform data is open or restricted now. So maybe the current license is enough, as suggested by Chad.
WaveformDataLicense and StationXMLLicense are more explicit, I like it.
The DataCite information is interesting; I propose that we reuse as much of what they have created as possible. They use <rights> elements contained in a <rightsList> instead of license, perhaps making it more flexible. So perhaps:
<WaveformRights rightsURI="https://creativecommons.org/share-your-work/public-domain/cc0/"
rightsIdentifier="CC0" >
Creative Commons - No Rights Reserved
</WaveformRights>
An alternative would be to use <Rights> but then add an attribute or subelement to specify the type of data it applies to, like waveform, stationxml, etc. This might give more flexibility in complex cases. Perhaps:
<Rights rightsURI="https://creativecommons.org/share-your-work/public-domain/cc0/"
rightsIdentifier="CC0" appliesTo="WAVEFORM">
Creative Commons - No Rights Reserved
</Rights>
Possible to add date ranges, or perhaps things like olderThan="P2Y", but I'm not sure how far down this rabbit hole we should go. Note also DataCite uses lower case rights but stationxml uses capitalized element names.
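As a rough check that the single-element form stays easy to consume, here is a minimal Python sketch that reads it with the standard library. The `<Rights>` element and its `appliesTo` attribute are only the proposal from this thread, the `STATIONXML` value and the `rights_for` helper are hypothetical, and the namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# The <Rights> element and its appliesTo attribute are only a proposal from
# this thread, not part of any released StationXML schema.
FRAGMENT = """
<Network code="IU">
  <Rights rightsURI="https://creativecommons.org/share-your-work/public-domain/cc0/"
          rightsIdentifier="CC0" appliesTo="WAVEFORM">
    Creative Commons - No Rights Reserved
  </Rights>
</Network>
"""


def rights_for(node, applies_to):
    """Collect (identifier, URI, description) for each direct Rights child
    whose appliesTo attribute equals `applies_to`."""
    return [
        (r.get("rightsIdentifier"), r.get("rightsURI"), (r.text or "").strip())
        for r in node.findall("Rights")
        if r.get("appliesTo") == applies_to
    ]


network = ET.fromstring(FRAGMENT)
print(rights_for(network, "WAVEFORM"))
print(rights_for(network, "STATIONXML"))  # hypothetical value; nothing declared
```

A reader filters on one attribute instead of knowing a separate element name per data type, which is the flexibility argument for this form; the cost is that the allowed `appliesTo` values would need to be enumerated in the schema.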
Currently there is no clear place to include a data license declaration in StationXML and doing so is becoming increasingly important.
One option is to add this by allowing a DataLicense element in the BaseNode definition, which would allow declaration at the Network, Station, and Channel levels.
The element would be optional and could occur any number of times. An abbreviation attribute allows declaration of the common label often used, e.g. CC0, CC-BY, etc. A URL attribute allows identification of license text.
This is analogous to the Identifier element added in 1.1 revision.
For example:
In the schema:
and