cognoma / cancer-data

TCGA data acquisition and processing for Project Cognoma
Other
20 stars 28 forks source link

Repository licensing options that are appropriate for non-code content #5

Closed dhimmel closed 7 years ago

dhimmel commented 8 years ago

Currently, this repository is licensed under a BSD 3-Clause License, which is the default license for Project Cognoma repositories. This license was chosen for its permissive and open nature as well as it's compatibility with other Greene Lab software products. However, the BSD 3-Clause License is intended for code. Rather than referencing all content, the license specifically mentions "source code".

However, as a data repository, cognoma/cancer-data will hold much more than source code including data, visualizations, writing, documentation, and notebooks which combine all of the aforementioned content types into a single file.

As far as openly releasing content that is not only code, there's currently a short list of recommended licenses. My favorite is CC0 because it waives all copyrights and effectively places work in the public domain. This means anyone can use CC0 content without agonizing over license compatibility.

Despite being one of the most liberating licensing options available, CC0 is not an OSI-approved open source license for complicated reasons. We're still investigating the license of TCGA and Xena Browser data (see #3), but it's possible our upstream data providers may impose restrictions on our use that may require attribution, in which case CC BY is an option. However, Creative Commons recommends against using its licenses (besides CC0) for software.

Hence, switching the entire repository to a Creative Commons license doesn't make sense, since we want to remain OSI conformant. However, we also need a licensing arrangement that is appropriate for content that is not "source code". One option I can think of is multi-licensing where we apply both a CC0/CC BY License and a BSD 3-Clause License to the repository. Users would then be able to choose either license based on their use case. I want to make sure that this is a legally valid approach. What do people think (also asked on Twitter)?

dhimmel commented 8 years ago

It appears that at least some Wikipedia articles are dual-licensed under CC BY-SA and GFDL. See the following prompt to save your changes:

wiki-dual-license

So the dual-licensing of content under multiple open licenses does have some precedent.

dhimmel commented 7 years ago

I was talking with @jakevdp at a Data Science Summit, and he mentioned he was also dealing with how to license repositories with notebooks, which contain code, prose, figures, and data.

@jakevdp — any advice you have here (now or later) would be appreciated. Also let us know how you end up licensing your notebooks with diverse content types.

jakevdp commented 7 years ago

I'm not sure what is best in this case. I asked about this on twitter a while ago, and the responses were pretty informative: thread.

dhimmel commented 7 years ago

Thanks @jakevdp, here are some of the tweets grouped by category:

Dual license the entire repo

Tweet by @minrk:

@jakevdp perhaps simpler is to dual-license the whole thing, so people can use appropriate license for the downstream work.

Tweet by @minrk:

@jakevdp the reason I like dual license for everything (if available) is that it's the downstream work that this matters to, not the source.

Tweet by @SethMichaelLarson:

@jakevdp I've seen LICENSE-MIT and LICENSE-APACHE2 in the same repo with an explanation in the README. That's probably how I would do it.

License by type

Tweet by @minrk:

@jakevdp in answer to the actual question, I might use a single LICENSE (or LICENSES) file with both and the explanation of when to use each

Tweet by @earthorguk:

I am told that outputs are special @minrk @jakevdp https://git.io/vXUFI

Tweet by @benjaminrose:

@jakevdp @arfon @github notebooks have different cell types. Can you use that to indicate what is text and what is code?

Tweet by @labarba:

@jakevdp @rasbt In #numericalmooc, we unite CC-BY and MIT in the license file, and add a message in all notebooks https://github.com/numerical-mooc/numerical-mooc

Tweet by @jakevdp:

@_benjaminrose @arfon @github My best idea now is to include LICENCSE_CODE and LICENSE_TEXT and describe the intent in the README.

In between

Tweet by @rasbt:

@jakevdp @github sth. like -> & having 2 seperate license files + texts. Not sure if it's ideal though

cqq-gj_umaal57h

minrk commented 7 years ago

The reason I like to do dual licenses for the whole repo is letting the inheritor decide what's appropriate. Cases where BSD for code, CC-BY for non-code cause unnecessary headaches:

I'm not sure any benefit is achieved by separating which parts are under which license, but I could be wrong about that.

dhimmel commented 7 years ago

@minrk your assessment makes a lot of sense to me. I think dual licensing is the way to go.

I'm not sure any benefit is achieved by separating which parts are under which license, but I could be wrong about that.

It seems like you could always derive the separate licensing scheme from the dual licensing scheme. Therefore, I don't think there should be any major disadvantages to dual rather than separately licensing the content.

dhimmel commented 7 years ago

Permission from contributors to license this repository as CC0

@clairemcleod & @stephenshank (the two other people with contributions to this repository): are you okay with releasing your past work under CC0 1.0 (the public domain dedication) as well as the current BSD 3-Clause license?

clairemcleod commented 7 years ago

@dhimmel fine by me!

stephenshank commented 7 years ago

@dhimmel Sounds good to me!