chianti-atomic / ChiantiPy

ChiantiPy is a Python package to calculate the radiative properties of astrophysical plasmas based on the CHIANTI atomic database.

The Chianti database is not licensed #76

Closed: Cadair closed this issue 5 years ago.

Cadair commented 7 years ago

I was thinking about making a conda package containing the actual database (as well as one for ChiantiPy) to make installation and distribution easier (also hopefully to fix #67).

However, the http://www.chiantidatabase.org/chianti.html website does not state a license, only that:

The CHIANTI package is freely available.
If you use the package, we only ask you to appropriately acknowledge CHIANTI.

which is not a licence, and would be considered by many to be a non-free license because of the apparent requirement to cite CHIANTI. The database should be under some more formal license so that we know our position with respect to re-distributing the data.

ping @dpshelio who might have a better idea about what licenses would be appropriate for data?

kdere commented 7 years ago

well, ChiantiPy is under a plain ISC license.

the CHIANTI group have never considered a license.

there is not a requirement to cite CHIANTI, merely a request

Ken

(Email reply to Stuart Mumford's post of 12/05/2016 12:48 PM.)

-- Kenneth P. Dere, Research Professor of Solar Physics, Department of Physics and Astronomy, George Mason University, kdere@gmu.edu

Cadair commented 7 years ago

I agree on the citation requirement thing, although other open source packages have had issues with the specific wording of that "request" being interpreted as a "requirement" so one should tread carefully around such things.

What would be a good way to get the CHIANTI group to consider a license? To my mind (and probably to that of any lawyer), "freely available" does not grant any rights for redistribution or modification at all, and probably nothing beyond downloading it and reading it (with your eyes).

dpshelio commented 7 years ago

I'll have to look into this in more detail before I can give a better suggestion, and check a couple of sources, because databases are in a special legal situation (I don't remember whether that was in the US or in the EU). It probably also matters where the work was done. The most permissive option would be Public Domain: anyone could do whatever they pleased with it. However, I'm a bit confused about what "package" means. Is it the code? The database? Both together? I imagine some people use one, some the other, and some both. For simplicity, I would separate the two parts of CHIANTI.

Cadair commented 7 years ago

The CHIANTI tarball includes both code and data, I think, but they don't have to be licensed the same way even if they are distributed together.

kdere commented 7 years ago

there are two separate tarballs, one for the database and one for the IDL code

wtbarnes commented 7 years ago

But in the excerpt from the CHIANTI webpage that @Cadair cited above, does "package" mean the database, the code, or both? I think this is what @dpshelio is getting at as well.

To me it seems like the two should be licensed separately as they'll be used in very different ways.

wtbarnes commented 7 years ago

This licensing issue seems intertwined with the issue of distributing the data. This is partly what is holding up sunpy/sunpy#1897, as ChiantiPy (and thus the CHIANTI database) cannot become a dependency of SunPy unless it can be installed with conda (e.g. from conda-forge).

@Cadair do you know of any way of distributing a database/dataset as an installable (pip or conda) package? Is there any precedent for this? I've done a fair bit of googling and have not found anything.

One possible alternative to distributing a tarball (a bunch of files and directories) would be to repackage the CHIANTI database as a single HDF5 file. However, such a file still comes out to ~600 MB. Though it could be compressed somewhat, this still seems too large to just throw into conda-forge. Maybe it could be hosted somewhere else (or just grabbed from its current location) and then downloaded automatically?
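The "host it elsewhere and download it automatically" idea is usually implemented as a fetch-on-first-use cache: the installable package ships without the data and downloads the archive into a local directory the first time it is needed. The sketch below shows that pattern with only the standard library. The URL, cache directory, and function name are hypothetical (not an actual CHIANTI or ChiantiPy location or API), and the optional checksum is an assumption about how integrity might be checked.

```python
import hashlib
import pathlib
import urllib.request

# Hypothetical location and cache directory; NOT real CHIANTI URLs/paths.
DEFAULT_URL = "https://example.org/chianti/chianti_db.h5"
DEFAULT_CACHE = pathlib.Path.home() / ".cache" / "chianti-sketch"


def fetch_database(url=DEFAULT_URL, cache_dir=DEFAULT_CACHE, sha256=None):
    """Return a local path to the database file, downloading it only once.

    If ``sha256`` is given, the cached file is checked against it so that a
    corrupted or truncated download is reported instead of silently used.
    """
    cache_dir = pathlib.Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        # Download under a temporary name first, so an interrupted transfer
        # never leaves a partial file at the final path.
        partial = target.with_name(target.name + ".part")
        urllib.request.urlretrieve(url, partial)
        partial.replace(target)
    if sha256 is not None:
        # Hash in 1 MiB chunks so a ~600 MB file is not read into memory at once.
        digest = hashlib.sha256()
        with open(target, "rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != sha256:
            raise ValueError(f"checksum mismatch for {target}")
    return target
```

Because the data are fetched at run time, the conda or pip package itself stays small. Redistribution rights still matter, though, because the archive has to be hosted somewhere.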

Either way, the database must be licensed before it can be placed on conda-forge. Re: licensing, Creative Commons may be a good option though there is still the issue of getting the CHIANTI team to include a license when distributing the data.

Another possible issue: the database contains data from other sources, e.g. NIST, National Bureau of Standards, etc. (see footer in h_1.wgfa file), so could the CHIANTI team license this data even if they wanted? Would it need to be distributed under multiple licenses? Unsurprisingly, the NIST atomic database also appears to be unlicensed.

dpshelio commented 7 years ago

I think the NIST ASD is copyrighted while its software is in the public domain. I've looked around and found that @HEPData from CERN uses CC0 for their database. I would suggest using the same for CHIANTI, but that has to be decided by them. I've also written to CDS to ask what they do in their case (I didn't find any mention of licensing on their website, only guidance on how to acknowledge it).

Regarding format, I would suggest sharing the text tables via Zenodo (which now supports versioning!) so the data could be mined easily (you could probably even search it from Google). Software to convert them into something more usable (SQLite, HDF5, IDL .sav files, ...) could be developed separately, e.g. as a package under @chianti-atomic.
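As an illustration of such an external converter, here is a minimal sketch that loads one plain-text table into SQLite. It assumes a CHIANTI-style layout in which the data rows end at a line containing only `-1` and are followed by a free-text footer with references (closed by another `-1`). Column layouts differ between file types, so the sketch stores raw data lines rather than parsing columns. The function and table names are hypothetical.

```python
import sqlite3


def split_table(text):
    """Split a CHIANTI-style text file into (data_rows, footer_lines).

    Assumes the data rows end at the first line that is just "-1" and that
    the lines after it, up to an optional closing "-1", are a free-text
    footer (e.g. references to the original data sources).
    """
    rows, footer = [], []
    current = rows
    for line in text.splitlines():
        if line.strip() == "-1":
            if current is rows:
                current = footer  # first sentinel: switch to the footer
                continue
            break  # second sentinel: end of the footer
        current.append(line)
    return rows, footer


def load_into_sqlite(conn, name, text):
    """Store the raw data rows and the reference footer of one file."""
    rows, footer = split_table(text)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_rows (file TEXT, lineno INTEGER, line TEXT)"
    )
    conn.execute("CREATE TABLE IF NOT EXISTS sources (file TEXT, footer TEXT)")
    conn.executemany(
        "INSERT INTO raw_rows VALUES (?, ?, ?)",
        [(name, i, line) for i, line in enumerate(rows)],
    )
    # Keep the reference footer next to the data so provenance survives
    # the format conversion.
    conn.execute("INSERT INTO sources VALUES (?, ?)", (name, "\n".join(footer)))
    conn.commit()
```

Keeping the footer in its own table preserves the source references in any derived format. This is the same provenance that an attribution-style license would protect.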

kdere commented 7 years ago

I doubt that the CHIANTI team wants to put its resources into this. We don't really have that much funded time to work on CHIANTI, and the database itself is in need of data upgrades. When we started, there was no such thing as Zenodo, and our formats are pretty well fixed aside from changing one or another as needed.

dpshelio commented 7 years ago

At the moment, all that's needed from the CHIANTI team is to choose a license (Public Domain and CC0 are the most permissive; CC-BY would require people to acknowledge CHIANTI). Putting it on something like Zenodo is secondary, but it would be pretty useful, as you would get a DOI for easy reference to a particular version. I don't think there's any need to change the format of the data files: they are plain text, so that's good.

kdere commented 7 years ago

Public domain or CC0 sound like something that would be appropriate, and I will see if I can convince the CHIANTI crowd. We have a Skype call in the middle of June.

wtbarnes commented 7 years ago

@dpshelio Thanks for looking into this and for contacting CDS. I agree that CC0 and CC-By seem to be the most sensible choices. The statement on the CHIANTI webpage seems most consistent with CC-By.

@kdere I didn't mean to imply the CHIANTI team should change the format of the database. This is certainly nontrivial and I recognize time and funding are limited. Rather, once it is licensed appropriately, any one of us could upload it to Zenodo so that it could be assigned a DOI and be more easily accessed (though I'm still not sure if this really addresses the "database-as-a-dependency" problem). This data could be reshaped into any format we like (e.g. HDF5 as opposed to a tarball).

namurphy commented 6 years ago

@kdere - Is there any news about adding a license to the Chianti database? I recently ran across a website that describes the potentially serious disadvantages of having no license, and it would be really helpful for Chianti to have one!

I am wondering if the Creative Commons Attribution 4.0 International (CC BY 4.0) license would be the best choice for Chianti, in keeping with the fact that the database itself contains references to the sources of its data. The attribution requirement would ensure that future researchers can figure out the provenance of the data. This in turn will help reproducibility.

Thanks! -Nick

kdere commented 6 years ago

It has been difficult getting agreement on this within the CHIANTI team. I seem to remember a comment that one of the licenses would have us announce that we were giving up all claims(?) to the data forever, and this did not sound right.

namurphy commented 6 years ago

They might be thinking of the CC0 license, which approximates putting the database into the public domain and gives anyone broad freedom to use, redistribute, and modify the work. The developers would effectively give up the copyright under this license, but would still be able to update and maintain the database. It sounds like the objection was related to this license.

The CC BY 4.0 license (or other Creative Commons licenses) would allow the Chianti developers to maintain the copyright but would allow others to use, redistribute, and modify the work as long as attribution is given. A license that allows derivative works would be necessary to allow users to put the database in HDF5 form or to put ionization and recombination rates into an eigenvalue matrix, for example. The attribution is important, in my opinion, because it requires users to provide metadata on where the data they use or adapted came from, which in turn leads to better science.

The Creative Commons site for choosing a license is a straightforward resource to navigate the different options.

As discussed earlier in this issue, the source code and database can and should be licensed separately, which would involve specifying what each license would cover. I've been thinking about this too much over the last two days for some PlasmaPy repositories that will contain both source code and written/graphical content. It would be possible to have language that says something like:

Source code files and code snippets provided in this distribution are licensed under the MIT license. All other content is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. Copyright 199X-2017, CHIANTI developers. All rights reserved.

Thank you for discussing this with the CHIANTI team!

Cadair commented 6 years ago

For the avoidance of all doubt, the relevant part of the licence is:


Section 3 – License Conditions.

Your exercise of the Licensed Rights is expressly made subject to the following conditions.

a. Attribution.

  1. If You Share the Licensed Material (including in modified form), You must:

    A. retain the following if it is supplied by the Licensor with the Licensed Material:

    i. identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated);

    ii. a copyright notice;

    iii. a notice that refers to this Public License;

    iv. a notice that refers to the disclaimer of warranties;

    v. a URI or hyperlink to the Licensed Material to the extent reasonably practicable;

    B. indicate if You modified the Licensed Material and retain an indication of any previous modifications; and

    C. indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License.

...

(emphasis mine)


The core part of CC-BY here is that if someone shares the database (including in modified form), they must acknowledge it in the form specified by the CHIANTI team (e.g. a list of all authors, or just "The CHIANTI team").

In return for obeying these terms of the licence, the CHIANTI team grants the user the right to modify and redistribute the database (unlike at the moment, where no rights are granted to the user, so the user cannot actually use the database).

kdere commented 6 years ago

when posting on github-ChiantiPy, you are really only talking to me. It would be better to start a discussion on the CHIANTI Google groups, which will be seen by the whole CHIANTI team. I would start off with why licensing the CHIANTI database is important, what you see as the best way to do it, and how other databases have been licensed. One of our problems is that we do some calculations ourselves, but the main source of data is the journals.