IQSS / dataverse

Open source research data repository software
http://dataverse.org

License: Multiple Options for Licensing #1753

Closed eaquigley closed 2 years ago

eaquigley commented 9 years ago

Explore other licenses to add to the License + Terms tab in addition to CC0.

mercecrosas commented 9 years ago

When we do this, we should also make more clear the difference between choosing a license and entering custom terms of use (with the terms of use, users need to click through before downloading).

pdurbin commented 7 years ago

I believe @Venki18 is interested in having "CC BY" as the default license rather than "CC0".

I originally left this comment at https://github.com/IQSS/dataverse/pull/3082#issuecomment-244438589 but thank you to @mheppler for mentioning that there's a dedicated issue for this!

tdilauro commented 7 years ago

JHU Data Archive would also be interested in the ability to set CC BY as the default.

pdurbin commented 7 years ago

As I mentioned at https://github.com/IQSS/dataverse/issues/3618#issuecomment-277093371 this is where CC0 is hard coded: https://github.com/IQSS/dataverse/blob/v4.6/src/main/java/edu/harvard/iq/dataverse/TermsOfUseAndAccess.java#L282

Thalia-Uranga commented 6 years ago

I'm interested in this subject: attaching Creative Commons licenses to Dataverse.

pdurbin commented 6 years ago

I interpret https://twitter.com/ruebot/status/938861697766477827 to mean that defaulting to CC0 is a nasty surprise to some users:

"While I'm very sympathetic to it, it really bugs me that OCUL's Dataverse still defaults a submission to Public Domain, and it's not obvious that it does it unless you're really looking."

jggautier commented 6 years ago

During the 2018 Dataverse Community Meeting, Obiajulu from DataverseNO said DataverseNO is interested in more licenses.

Zenodo shows a dropdown box with licenses it fetches from http://opendefinition.org as you type in their deposit form's text box:

[Screenshot of Zenodo's license dropdown]

pdurbin commented 6 years ago

Related: #1990

philippconzett commented 6 years ago

The EUDAT License Selector might be of interest here: https://www.eudat.eu/services/userdoc/license-selector

[Screenshot]

pdurbin commented 6 years ago

@philippconzett interesting. Thanks. Since the first question is about if data or software is being deposited, it seems relevant to #2739 (code deposit) as well.

shlake commented 5 years ago

I can't find an issue about this, but has anyone thought about having different "terms" or licenses for different files in a dataset?

Setting license on a dataset assumes ALL files have the same license, and if not they should be in a different dataset?

mheppler commented 5 years ago

@shlake Not sure if these issues cover the exact same feature request, but they are similar.

I believe those issues are specific to displaying what terms are applied to a file on the file page, as opposed to offering different terms. Both feature requests would require the addition of a "Terms" tab, or the equivalent, to the file page.

pdurbin commented 5 years ago

I don't have a strong opinion (this morning anyway) about how the GitHub issues are organized but I'd like to point out that @adam3smith @janetm @mercecrosas and @jonc1438 talked about the need for terms/licenses per file over at https://groups.google.com/d/msg/dataverse-community/FJaVCuVzkKM/Y7w1oqzKAAAJ

Maybe @shlake could update the title of #4391 to be more specific? Or maybe a dedicated issue (small chunks!) is warranted? I'm fine with whatever.

pkiraly commented 5 years ago

Our users would like to see the option of selecting from a list of well known licences, such as what Zenodo does (mentioned in https://github.com/IQSS/dataverse/issues/1753#issuecomment-397393706).

shlake commented 5 years ago

I do think my issue #4391 and this issue about multiple licenses are similar, if not the same. Maybe this issue is about having individual licenses for files and #4391 is about the display (if each file has its own license, then each file would need a "Terms" tab to choose).

Adding the following from the twitter discussion https://twitter.com/GreeneScientist/status/1136706935535886336 on why each file should have its own license:

Here’s the work proposing licensing through the Reproducible Research Standard https://ieeexplore.ieee.org/document/4720221 and https://web.stanford.edu/~vcs/papers/ijclp-STODDEN-2009.pdf They explain how it’s essential to license different components (eg data/code/text) with appropriate licenses for legal reasons

pdurbin commented 4 years ago

Related: Using other waiver that CC0 #6436

BPeuch commented 4 years ago

We at the State Archives of Belgium strongly second this. We think that CC-BY really is the ideal license for many researchers and a whole lot of datasets, statistically speaking.

pdurbin commented 4 years ago

On Friday there was a meeting in Brussels of six installations of Dataverse, and having CC-BY and other Creative Commons licenses as options in Dataverse was a hot topic.

As I mentioned in the notes, I told everyone that I suggested recently in sprint planning that perhaps this issue could be worked on separately ("small chunk" style) from a complete redesign of the dataset landing page, especially given how much demand there is. Here are the notes: https://docs.google.com/document/d/1I44zKruk_vBBfvFEIZ2XBYLhZrmb4Xxbgg1I1XF3NL8/edit?usp=sharing

@philippconzett had an interesting comment that I think is worth sharing here:

"Just had a look at the notes. I don't understand the reluctance against CC0. See e.g. OpenAIRE on this: https://openaire.eu/how-to-openly-license-research-data. For most types of open data, CC0 is recommended because it enables maximum reuse of your data, and you avoid attribution stacking; see the DCC guide." https://twitter.com/PhilippConzett/status/1223323273020178434

This concept of attribution stacking is new to me. So perhaps if Dataverse offers a CC-BY option, we should warn about attribution stacking.

BPeuch commented 4 years ago

Thanks a lot for the update and for mentioning our event, @pdurbin!

That is true: attribution stacking is a concern. And I think it's very wise to consider such issues especially now that the open data movement has gained such momentum, now that the idea of giving IDs to things like conferences and presentations is being considered… and so, we have to consider the long-term implications of things we take for granted, like identifying digital objects (i.e. assigning PUIDs to them).

All that being said, to quote you Philip, citation is now the "currency" for researchers. They need it, if they are to hope to make a career. Hence the reluctance against CC0, quite simply. We know researchers are anxious not to be "robbed" of their data. When you think about it, requiring citation is not just an incentive for them to deposit their dataset: it's really just acknowledging their work.

poikilotherm commented 4 years ago

Disclaimer: I am not an attorney at law nor an expert. I just read things written by experts.

From my point of view, offering CC-BY makes scientists more comfortable that people are obliged to take action according to good scientific practice. At least in German IP law it is crystal clear that pure facts or values are not covered by IP law in any way - they are already public domain and actually cannot be licensed with any kind of license (which is why CC-0 is appropriate).

While I see that there might be a problem arising from attribution stacking, on the other hand I see the difficulties of changing mindsets towards sharing. This is much easier with CC-BY than CC-0 IMHO. As @BPeuch already outlined: I am totally with every scientist that is about to share their data, but wants credit.

What I don't know and what no guide I read so far could tell me: when wrapping up raw data, results and metadata into a dataset, at what point does this turn into something covered by database / collection IP rights? Granting a usage license with CC-BY would be perfectly fine then, although stacking still might be a problem.

And for a plot twist: if you license your data under CC-BY, and someone uses it without proper attribution, you will need to sue them. Then a court can decide whether your CC-BY license applies or not in the particular case... What happens when IP law changes, as might happen in the future in the EU?

And just another thing popping up: this attribution hell is something similar to software licenses. That's why things such as SPDX (https://spdx.org/) exist. If you cite all data you re-use, stacked attributions are like a tree you can follow with an algorithm, collecting everyone to credit along the way.
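
The tree idea in the last paragraph can be sketched roughly as follows. This is only an illustration, not anything taken from Dataverse or SPDX: the `Dataset` class, its fields, and the sample names are all hypothetical. It walks the derivation graph depth-first and collects each contributor once.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical record: a dataset, its creators, and the datasets it reuses."""
    name: str
    creators: list[str]
    derived_from: list[Dataset] = field(default_factory=list)


def collect_attributions(dataset: Dataset) -> list[str]:
    """Walk the derivation graph depth-first and return every creator once,
    in the order first encountered."""
    credits: list[str] = []
    seen: set[int] = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        if id(current) in seen:  # the same source can be reached via two paths
            continue
        seen.add(id(current))
        for creator in current.creators:
            if creator not in credits:
                credits.append(creator)
        # Reverse so the first-listed source is visited first.
        stack.extend(reversed(current.derived_from))
    return credits


# Made-up example data: "merged" reuses "cleaned", which reuses "raw".
raw = Dataset("raw survey", ["Alice"])
cleaned = Dataset("cleaned survey", ["Bob"], [raw])
merged = Dataset("merged panel", ["Carol", "Alice"], [cleaned, raw])
print(collect_attributions(merged))  # ['Carol', 'Alice', 'Bob']
```

Even in this toy case the credit list grows with every derivation step, which is the stacking concern in a nutshell; automation helps only if every source records its creators in a machine-readable way.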

mercecrosas commented 4 years ago

I think a user-friendly License Selector (Eudat style) and perhaps also a configurable default per Dataverse are the way to go.

There are good arguments for choosing CC-BY for datasets and good arguments for choosing CC-0 or other waivers/licenses. So, it makes sense to make it flexible with a selector and configurable default.

Merce

-- Mercè Crosas, Ph.D. University Research Data Officer, HUIT | Chief Data Science and Technology Officer, IQSS Harvard University mcrosas@g.harvard.edu | @mercecrosas https://twitter.com/mercecrosas | scholar.harvard.edu/mercecrosas
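
One way to picture a "configurable default per Dataverse" is sketched below. It is a hedged illustration only: the `Collection` class, the idea that a collection inherits its parent's default, and the `INSTALLATION_DEFAULT` fallback are assumptions made for the example, not how Dataverse is or was implemented.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Assumed installation-wide fallback, mirroring the CC0 default discussed above.
INSTALLATION_DEFAULT = "CC0-1.0"


@dataclass
class Collection:
    """Hypothetical Dataverse collection with an optional default license."""
    name: str
    parent: Optional[Collection] = None
    default_license: Optional[str] = None


def resolve_default_license(collection: Collection) -> str:
    """Return the nearest default set on the collection or one of its
    ancestors, falling back to the installation-wide default."""
    node: Optional[Collection] = collection
    while node is not None:
        if node.default_license is not None:
            return node.default_license
        node = node.parent
    return INSTALLATION_DEFAULT


root = Collection("root")
lab = Collection("lab", parent=root, default_license="CC-BY-4.0")
project = Collection("project", parent=lab)

print(resolve_default_license(project))  # CC-BY-4.0 (inherited from "lab")
print(resolve_default_license(root))     # CC0-1.0 (installation fallback)
```

A selector would then preselect the resolved value while still letting the depositor pick another license from the list.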


poikilotherm commented 4 years ago

Let me show you the EUDAT style license selector in B2SHARE mentioned by @mercecrosas :

[Screenshot: eudat-license-selector]

pdurbin commented 4 years ago

@BPeuch @poikilotherm here's the video mentioned in chat today, starting at 26:00 (ending around 30:55): https://videocast.nih.gov/watch=35859

28:40 is where the part about CC0 vs CC-BY starts. "The ability to share the data falls apart" (with CC-BY 4.0). Then she references another video at https://www.youtube.com/watch?v=5Ng5FDW1LXk which I didn't watch and which comes from https://sched.co/Xri6

To be honest, I don't know what to make of all this. I'm just passing it along since you and others in Brussels were so passionate about licenses. Enjoy! 😄 If you watch the longer video (90 minutes!), please let me know what you think.

On a positive note, the speaker talks about the culture of attribution in science. This is a good thing!

poikilotherm commented 4 years ago

OK, the question Shelley Stall mentions in the NIH talk (starting at 27:50) is her own and starts at 1:13:00 (https://youtu.be/5Ng5FDW1LXk?t=4381). The real answer to it starts at 1:23:29 (https://youtu.be/5Ng5FDW1LXk?t=5009).

The gist of it: CC-BY 4.0 does not allow sublicensing, so an archive cannot distribute the data under another license. This doesn't necessarily mean we're in bad trouble, but it's a notable point.

At the least, we should inform users properly. I will try to check with our legal counsel about this issue and try to provide more info.

mercecrosas commented 4 years ago

Thanks, Oliver. Let us know what you find out. I'm also trying to learn more about this with some legal experts at Harvard.

Merce

-- Mercè Crosas, Ph.D. University Research Data Management Officer Chief Data Science and Technology Officer, Institute for Quantitative Social Science Harvard University mcrosas@g.harvard.edu | @mercecrosas https://twitter.com/mercecrosas | scholar.harvard.edu/mercecrosas


BPeuch commented 4 years ago

Thank you @pdurbin and thanks a lot @poikilotherm! We will also look into this with care.

poikilotherm commented 4 years ago

Please note that I created a very minimal patch to Dataverse to enable CC-BY 4.0 as a license option. We need this urgently and cannot wait until upstream is ready. I also added links to the license text, which is a common best practice for Creative Commons.

[Screenshots of the patch]

I am happy to share the patch with anyone interested; most likely it will not be accepted upstream by @TaniaSchlatter et al. (Beware: the patch is against 4.19 for now, but it should not conflict with 4.20.)

djbrooke commented 4 years ago

Thanks @poikilotherm for linking to the patch. Other institutions may have the same need and may decide to go that route.

poikilotherm commented 4 years ago

With pleasure @djbrooke :smile:

poikilotherm commented 4 years ago

Today I learned about the existence of https://github.com/spdx/license-list-data, which looks like a vault full of curated and validated license data, ready to be reused for a license chooser.
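
As a rough sketch of how such data might feed a license chooser: the snippet below filters a small inline sample shaped like the repository's `json/licenses.json` file. The sample entries and the `license_choices` helper are made up for illustration, and the field names (`licenseId`, `name`, `isDeprecatedLicenseId`, `seeAlso`) are assumed to match the published file, so check them against the real data before relying on this.

```python
import json

# Inline sample in the assumed shape of SPDX's json/licenses.json.
SAMPLE = """
{
  "licenseListVersion": "sample",
  "licenses": [
    {"licenseId": "CC0-1.0",
     "name": "Creative Commons Zero v1.0 Universal",
     "isDeprecatedLicenseId": false,
     "seeAlso": ["https://creativecommons.org/publicdomain/zero/1.0/legalcode"]},
    {"licenseId": "CC-BY-4.0",
     "name": "Creative Commons Attribution 4.0 International",
     "isDeprecatedLicenseId": false,
     "seeAlso": ["https://creativecommons.org/licenses/by/4.0/legalcode"]},
    {"licenseId": "GPL-3.0",
     "name": "GNU General Public License v3.0 only",
     "isDeprecatedLicenseId": true,
     "seeAlso": ["https://www.gnu.org/licenses/gpl-3.0-standalone.html"]}
  ]
}
"""


def license_choices(raw_json, prefix="CC"):
    """Return (id, name, url) tuples for non-deprecated licenses whose
    identifier starts with the given prefix, sorted by identifier."""
    data = json.loads(raw_json)
    choices = []
    for entry in data["licenses"]:
        if entry.get("isDeprecatedLicenseId", False):
            continue  # skip identifiers SPDX has retired
        if not entry["licenseId"].startswith(prefix):
            continue
        links = entry.get("seeAlso") or []
        url = links[0] if links else None  # first link to the license text
        choices.append((entry["licenseId"], entry["name"], url))
    return sorted(choices)


for license_id, name, url in license_choices(SAMPLE):
    print(f"{license_id}: {name} <{url}>")
```

Carrying the `seeAlso` link along fits the practice of linking to the license text alongside the license name.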

samuel-rosa commented 4 years ago

(Quoting @pdurbin's comment above about the Brussels meeting, CC0, and attribution stacking.)

I suggest the help text be as follows:

"Datasets will default to a CC0 public domain dedication. CC0 facilitates the reuse and extensibility of research data and helps to avoid citation stacking. Our Community Norms as well as good scientific practices expect that proper credit is given via citation. If you are unable to give datasets a CC0 waiver you may enter custom Terms of Use for datasets."

For more info on citation stacking, a link to https://retractionwatch.com/?s=citation+stacking could be added.

philippconzett commented 4 years ago

I think such an addition could provide some more background information about why CC0 is a good license to use whenever possible. As for the more info link you provided, I think the attribution stacking discussed in this blog post is of another kind than the kind of attribution stacking that is problematic for research data. The blog post discusses the problem of journal editors asking authors to cite particular papers. This is not what (usually) is referred to when we talk about the problem of attribution stacking in licensing research data. For explaining this kind of attribution stacking, I'd rather refer to the guide I linked to in my reply on Twitter, i.e.

Alex Ball (DCC): How to License Research Data, page 4 (available at https://www.dcc.ac.uk/sites/default/files/documents/publications/reports/guides/How_To_License_Research_Data.pdf):

Datasets are particularly prone to attribution stacking, where a derivative work must acknowledge all contributors to each work from which it is derived, no matter how distantly. If a dataset is at the end of a long chain of derivations, or if large teams of contributors were involved, the list of credits might well be considered too unwieldy. The problem is magnified if different sets of contributors have to be credited in a different way, especially if automated methods are used to assemble the dataset – some of the benefits of automation are lost if attribution conditions have to be inspected manually.

TaniaSchlatter commented 3 years ago

The image posted in April of the patch looks like what we have discussed, @poikilotherm.

poikilotherm commented 2 years ago

Looks like #7440 just went through :partying_face: Maybe this can be closed then @pdurbin?

pdurbin commented 2 years ago

Yes, the following pull request implements this. So I'll close this issue.

Check out the multiple license demo video (password "community2022!"). It's also on https://dataverse.org/dataversetv

See also #8347 for some possible future work.