IQSS / dataverse

Open source research data repository software
http://dataverse.org

Access Rights metadata in OpenAIRE metadata export is being misapplied #5920

Open jggautier opened 5 years ago

jggautier commented 5 years ago

As part of v4.14 (released in May 2019), Dataverse makes available through the UI, API and over OAI-PMH DataCite metadata that complies with OpenAIRE requirements (https://github.com/IQSS/dataverse/issues/4257). Repositories need to follow these requirements in order for their dataset metadata to be made discoverable in OpenAIRE EXPLORE.

The required metadata export, called OpenAIRE (in the Dataverse UI) or oai_datacite (over API and OAI-PMH), includes one of four Access Rights terms from the info:eu-repo-Access-Terms vocabulary: closedAccess, embargoedAccess, restrictedAccess and openAccess.

Dataverse chooses these terms based on whether any dataset files are set to restricted and whether people can request access to those restricted files using Dataverse's request access feature.

There are datasets in Dataverse repositories whose files are set to restricted and for which people cannot request access through Dataverse's request access feature. The OpenAIRE metadata export for these datasets uses closedAccess, even when the dataset metadata indicates that people can request access by some process that happens outside of Dataverse's request access feature, e.g. submitting a DUA or contacting the author.

[Screenshot: Untitled-1] This dataset has restricted files and people aren't able to request access through Dataverse's request access feature, so its OpenAIRE metadata indicates that the dataset is closed access. But people are able to request access by filling out a form (Application For The Use of Data), so the dataset isn't really closed access.

When OpenAIRE harvests these datasets, their metadata says they're closedAccess, so they'll appear and be searchable as closedAccess, grouped with datasets that are more appropriately labelled closedAccess, even though access to their files is only restricted. This may make these datasets harder to find and use, making OpenAIRE EXPLORE less effective for finding datasets published by Dataverse repositories.
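The behaviour described above can be sketched roughly as follows. This is only an illustration, not Dataverse's actual export code: the function and parameter names are hypothetical, and only the closedAccess branch is stated explicitly in this issue. The openAccess and restrictedAccess branches are assumptions inferred from the description.

```python
from enum import Enum


class AccessRights(Enum):
    """The four terms of the info:eu-repo access vocabulary."""
    OPEN = "info:eu-repo/semantics/openAccess"
    EMBARGOED = "info:eu-repo/semantics/embargoedAccess"
    RESTRICTED = "info:eu-repo/semantics/restrictedAccess"
    CLOSED = "info:eu-repo/semantics/closedAccess"


def current_access_rights(has_restricted_files: bool,
                          request_access_enabled: bool) -> AccessRights:
    """Hypothetical sketch of how the export picks a term today."""
    if not has_restricted_files:
        # Assumption: datasets without restricted files are exported as open.
        return AccessRights.OPEN
    if request_access_enabled:
        # Assumption: restricted files plus Dataverse's request access
        # feature are exported as restricted.
        return AccessRights.RESTRICTED
    # Stated in this issue: restricted files without the request access
    # feature are exported as closedAccess, even if access can be
    # requested outside Dataverse (e.g. via a DUA or form).
    return AccessRights.CLOSED


# The dataset in the screenshot: restricted files, request access disabled.
print(current_access_rights(True, False).value)
```

Note that nothing in the inputs captures a request process that happens outside Dataverse, which is why such datasets end up grouped with genuinely closed ones.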

We can think of better ways for Dataverse to assign Access Rights terms, ways that the Dataverse community thinks are more appropriate (e.g. Zenodo depositors choose from a drop-down menu). But other data publishers are using these Access Rights terms (or those terms are being applied to the harvested datasets) in a variety of ways that can make the Access Rights filters unhelpful for searching through OpenAIRE EXPLORE. "Open data" already means many different things to different groups. Since these Access Rights terms are used for the benefit of finding data in OpenAIRE EXPLORE, the scope of this issue might involve learning how OpenAIRE might want to improve the definitions and how repositories can use them in more standardized ways.

jggautier commented 5 years ago

I wonder if it might be safe to never use "Closed Access", use "Restricted Access" for datasets that have restricted files, and use "Open Access" for all other datasets. Does anyone ever publish datasets whose files can't be accessed at all?

If so, it might help if Dataverse allowed depositors to indicate, in a standardized and machine-readable way, that access to restricted files can be requested (even if people need to request access outside of Dataverse's request access feature) or cannot be requested through any means.
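One possible reading of this suggestion, as a minimal sketch: the names below are hypothetical, and the optional access_can_be_requested flag stands in for the depositor-supplied, machine-readable indication mentioned above, which does not exist in Dataverse as described in this issue.

```python
from typing import Optional

OPEN_ACCESS = "info:eu-repo/semantics/openAccess"
RESTRICTED_ACCESS = "info:eu-repo/semantics/restrictedAccess"
CLOSED_ACCESS = "info:eu-repo/semantics/closedAccess"


def proposed_access_rights(has_restricted_files: bool,
                           access_can_be_requested: Optional[bool] = None) -> str:
    """Hypothetical sketch of the alternative rule.

    access_can_be_requested is an assumed depositor-supplied flag:
    True  = access can be requested by some means, inside or outside Dataverse
    False = access cannot be requested through any means
    None  = the depositor has not said either way
    """
    if not has_restricted_files:
        return OPEN_ACCESS
    if access_can_be_requested is False:
        # Only an explicit statement that nobody can request access
        # yields closedAccess.
        return CLOSED_ACCESS
    # Default for datasets with restricted files: never assume "closed".
    return RESTRICTED_ACCESS


# Restricted files, no statement from the depositor.
print(proposed_access_rights(True))
```

Without the flag, this reduces to the simpler rule of never using "Closed Access"; with it, closedAccess is reserved for datasets whose depositor has explicitly ruled out any request process.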

cmbz commented 1 month ago

To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'.

If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.

philippconzett commented 1 month ago

I only recently became aware of this issue. I think resolving it ultimately depends on #4391 being resolved first. To me, it seems the Access Rights terms used by OpenAIRE and others (e.g., BASE Bielefeld) depend on Terms of Use being defined at file level.

With support for file-level Terms of Use being implemented, I think things would work like this: The metadata record, i.e. the registered metadata at dataset or file level, should always be licensed with CC0 and thus have its Access Rights term defined as "Open access". At file level, all of the values can be used, based on the Terms of Use of the individual file in question.

jggautier commented 2 weeks ago

@pdurbin and I talked about this issue in relation to https://github.com/IQSS/dataverse/pull/10737 and https://github.com/IQSS/dataverse/issues/8129. And I agreed that I'd open a new GitHub issue about dc:rights specifically, to help manage these different goals and scopes.

But @philippconzett, what do you think of using this GitHub issue instead, since we're already talking about the use of these "Access Rights terms used by OpenAIRE and others (e.g., BASE Bielefeld)"?

I could re-word this GitHub issue's title so it's clear that the issue is about all uses of these "Access Rights" terms, and edit the first comment for the same reason.

pdurbin commented 2 weeks ago

I wanted to link to something so I went ahead with the idea that this issue represents the unfinished dc:rights work that was originally part of the scope of #8129, which (if all goes well) will be closed by PR #10737.

The next challenge will be to size it, of course, and figure out what the plan is and when. 😅

philippconzett commented 2 weeks ago

@jggautier @pdurbin Thanks for moving this forward. I think both approaches could work, i.e. continuing to use this issue or creating a new one.

jggautier commented 2 weeks ago

Thanks. https://github.com/IQSS/dataverse/issues/4176 is also about changes to what's included in dc:rights and we'll need to consider the points raised there, too.

Next week I'll try to find time to help think about either using this GitHub issue or creating a new one, but with other projects and work travel next week, I'm not sure. I definitely don't have time this week.