CC-BY Rationale - Githubissues

strasser commented 10 years ago

How are you deciding on CC-BY when copyright doesn’t apply to most datasets? That doesn’t seem very legally defensible.

strasser commented 10 years ago

We arrived at this based on conversations with our copyright expert here at CDL, Katie Fortney. We were considering the Open Data Commons licenses; however, Tim Volmer from CC had this perspective:

The Open Data Commons licenses were created several years ago, sponsored by the Open Knowledge Foundation. They are for databases. They arose in part because the CC licenses at that time did not handle database rights well. You can go down the rabbit hole if you wish (http://wiki.creativecommons.org/4.0/Sui_generis_database_rights). Those problems will be fixed in the soon-to-be-released CC 4.0 license suite, which will license database rights alongside copyright (where those database rights actually exist--primarily in Europe). One significant difference between the ODC licenses and the CC licenses is that the ODC licenses also operate as a contract, even where copyright or database rights do not attach to the content. We think this is problematic because it essentially attaches conditions where permission is not otherwise required. ODC has also created a tool called the Public Domain Dedication and License (PDDL). It is for all intents and purposes identical to the CC0 Public Domain Dedication, in that it waives all copyright and related rights and puts the content/data into the public domain. At CC, we try to help educate users about the various options we provide, and guide them to adopt a tool that is suitable to their needs. And for some areas, like science and public sector information, we urge data publishers to use CC0 or a very liberal license.

Katie's response was this:

Very interesting, thanks. Between this and PLOS's semi-adoption of CC, I'm becoming more and more convinced that CC is a better way to go than ODC now that the new licenses cover database rights.

We also figured Dryad and figshare were going this route, so it would be appropriate.

strasser commented 10 years ago

Response from UC Librarian:

I get how you got to this decision, but I'd like to advocate for revisiting this before it's a done deal... I just checked with tvol to make sure I wasn't missing something and he confirmed that CC's 4.0 licenses still don't apply to data since the U.S. doesn't consider (most) data copyrightable and lacks data protection laws. The 4.0 changes are meant to make CC easier to use in countries where data rights exist (and where ODC licenses have historically been used). PLoS probably went with CC-BY to unify their materials under one license, most of which are copyrighted research articles. As counterexamples, both Figshare and Dryad use CC0 for data. Could you consider CC0 plus customized terms for sharing? I worry that by choosing CC-BY you're implying legal protection that doesn't exist in the U.S...

MeganLaurance commented 10 years ago

Chiming in from UCSF here. Right now the Data Use Agreement we have in place is pretty reassuring to data sharers, and up until the proposed switch to CC-BY we had been talking with our prospective users about customizing data use agreements to ensure proper use of their data, ensure that they contact the data contributor before using etc. With a couple of exceptions, most of the people we've talked to about making their data publicly available are still a little skittish about it, and are looking to the Data Use Agreement as a way to alleviate their fears. Maybe what might be useful at this point - since I'm not an expert in the CC licenses - would be a discussion of the differences between what we are proposing to implement - CC-BY - and the Data Use Agreement we have now? That way I can better understand the impact on data sharers and data consumers.

strasser commented 10 years ago

I went back and checked out an old blog post of mine, which suggests that individual, unique data use agreements are not ideal for data.

...While waiting for a consensus on how to properly govern digital data and other digital content, many data providers are dealing with governance by constructing data usage agreements. These are contracts created by lawyers for a specific data provider (e.g., an online database). The problem with data usage agreements is that they are all different. This means that if you want to use data from a source that requires you agree to their terms, you have three options:

1. Carefully read the terms before agreeing (and who does that?)

2. Click that you agree without reading and hope you don’t accidentally break any rules

3. Find the data that you need from another source that doesn’t have terms and conditions for data usage.

Item three points to one of the serious downsides to data usage agreements: researchers may avoid using data if they don’t understand the terms of use. Furthermore, the terms only apply to the party that agreed to the contract (i.e. checked the box). If they (potentially illegally) share those data with someone else, that someone else is not bound by the terms.

strasser commented 10 years ago

This is worth a read for all: "Why does Dryad use CC0?" - Blog post from 2011. Some significant excerpts below:

In most cases, CC0 does not actually affect the legal status of the data, since facts in and of themselves are not eligible for copyright in most countries (e.g. see this commentary from Bitlaw regarding U.S. copyright law). But where they are, CC0 waives copyright and related rights to the extent permitted by law.

And this section on why attribution isn't a great idea:

Dryad’s policy ultimately follows the recommendations of Science Commons, which discourage researchers from presuming copyright and using licenses that include “attribution” and “share-alike” conditions for scientific data.

Both of these conditions can put legitimate users in awkward positions. First, specifying how “attribution” must be carried out may put a user at odds with accepted citation practice: From Science Commons Database Protocol FAQ: “when you federate a query from 50,000 databases (not now, perhaps, but definitely within the 70-year duration of copyright!) will you be liable to a lawsuit if you don’t formally attribute all 50,000 owners?”

And finally:

“… given the potential for significantly negative unintended consequences of using copyright, the size of the public domain, and the power of norms inside science, we believe that copyright licenses and contractual restrictions are simply the wrong tool [for data], even if those licenses and contracts are used with the best of intentions.” (Science Commons Database Protocol FAQ)

strasser commented 10 years ago

We have plans to meet with people from the Research Policy Analysis and Coordination group at UCOP. We also plan to discuss this with Dash campus stakeholders who attend our governance and policies meeting later this summer.

In the interim, we've decided to implement CC-0 for datasets in Dash. This is the least complicated option and doesn't potentially (and wrongly) indicate to users that the data are copyrightable.

strasser commented 10 years ago

Report from the meeting with UCOP: Current policy is that all data produced in the UC system is owned by the Regents (although the data guidelines/policies are in flux right now). As such, researchers are not able to waive all rights via a CC-0 waiver. CC-BY is also potentially problematic for various reasons, but the OP legal counsel is comfortable with us using CC-BY until the UC-wide data guidelines are released, and/or until they have a chance to better understand the issues. The lawyers did note, however, that CC-BY-4.0 has sufficient language to make it useful for data. This is promising for Dash offering a machine-readable agreement to data users.

Given that the Regents lay claim to the data, there was a question of who should receive the attribution. If it's owned by the Regents, does that mean they get the by-line? The lawyers plan to discuss this further internally and report back.

cachatj commented 10 years ago

@strasser - could you please provide a link to the UC policy they are referencing in saying that "UC system owns all data produced"?

From what I have read (and could be wrong), it appears that in the US "data" is not copyrightable in the first place - which is a question underlying this whole discussion.

At the end of the day, all of this complicated mess, concern and holdup by UCOP seems to be extremely antithetical to the very mission and purpose of a university in the first place. How can it be conceived that a researcher sharing their data with the public, at their own discretion, is improper or even illegal in the eyes of UC? It seems to me that they would stand to benefit much more by allowing their researchers to lead the way in open science, open data, etc.

It would also be very, very weird in this time of AltMetrics if all data deposited by UC researchers was actually attributed to the UC system, vs the researcher.

pcruse commented 10 years ago

The policy comes from the Academic Personnel Manual -- http://www.ucop.edu/academic-personnel/academic-personnel-policy/index.html. The specific section of the manual is APM 020 and states "Notebooks and other original records of the research are the property of the University." This dates from 1958, which is why it is being revisited. Times have changed, and what made sense in 1958 might not make sense in today's data-driven scholarship. Not to get too far into the weeds on whether data are copyrightable, but UMich has some nice guidance on this: http://www.lib.umich.edu/copyright/facts-and-data -- it is a slippery slope.

dhimmel commented 7 years ago

Just stumbled upon this Issue from Google. I wanted to provide a few helpful links:

Recent blog post by the aforementioned Katie Fortney: Who “owns” your data?.
Another blog post by Katie: CC BY and data: Not always a good fit.
https://github.com/OBOFoundry/OBOFoundry.github.io/issues/285 for the practical differences of CC0 versus other CC licenses with respect to biomedical ontologies.
https://github.com/cognoma/cancer-data/issues/5 for an example of how to dual license a repository with code and other content such as prose, figures, data.

CDLUC3 / dash

CC-BY Rationale #19