NCEAS / metacatui

MetacatUI: A client-side web interface for DataONE data repositories
https://nceas.github.io/metacatui
Apache License 2.0

prevent dataset disappearing after DOI is assigned #1614

Open mbjones opened 3 years ago

mbjones commented 3 years ago

Describe the feature you'd like

When we assign a DOI to a dataset, we are agreeing to continue to make that dataset available indefinitely. For repos like the Arctic Data Center and the KNB, that can be compromised if the rightsHolder, after having a DOI assigned, edits the dataset and makes components private, or deletes them altogether. I would like to see us protect against this by treating a DOI as a special identifier: once it is assigned, the contributor no longer has the right to delete components of the dataset or make them private.

Implementation of this might be a straightforward change in access control rules, such that, when a DOI is used for a new version of a dataset, the rightsHolder for the whole package and its files is changed to an admin group (e.g., arctic-data-admins), and the user is granted write (but not changePermission) on the package and its files. That allows them to create new versions of the package, replace data files with new versions, and keep things updated, but prevents them from making the data files and metadata private.
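The access-control change described above could be sketched roughly as follows. This is only an illustration in Python, not Metacat's actual implementation: the dictionary layout standing in for system metadata, the function name `lock_down_for_doi`, and the example subjects are assumptions made for the sketch. Only the permission names (`read`, `write`, `changePermission`) and the example group `arctic-data-admins` come from the proposal itself.

```python
from copy import deepcopy

# Example admin group from the proposal; each deployment would configure its own.
ADMIN_GROUP = "arctic-data-admins"


def lock_down_for_doi(sysmeta: dict, admin_group: str = ADMIN_GROUP) -> dict:
    """Return a copy of `sysmeta` with ownership handed to `admin_group`.

    `sysmeta` is an assumed, simplified stand-in for system metadata:
    {"identifier": str,
     "rightsHolder": str,
     "accessPolicy": {subject: set of permission names}}
    """
    updated = deepcopy(sysmeta)
    contributor = sysmeta["rightsHolder"]
    if contributor == admin_group:
        return updated  # already locked down, nothing to change

    policy = updated.setdefault("accessPolicy", {})
    # The contributor keeps read and write, but not changePermission,
    # so they can publish new versions but cannot make content private.
    granted = set(policy.get(contributor, set()))
    granted.discard("changePermission")
    granted.update({"read", "write"})
    policy[contributor] = granted

    # Ownership (and with it the right to change permissions) moves to admins.
    updated["rightsHolder"] = admin_group
    return updated
```

The same function would be applied to the metadata object and to every file in the package, so that no single component can be hidden by the contributor afterwards.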

Is your feature request related to a problem? Please describe.

Data associated with DOIs can disappear, contrary to the spirit of a DOI.

Additional context

An example of this happening is: https://arcticdata.io/catalog/view/doi:10.18739/A29Z90C2X

Presumably, this would be a configurable feature: each deployment of Metacat/MetacatUI could decide whether to use it and, if so, which groups get assigned the permission.

Much of the implementation of this feature might be needed on the Metacat side rather than the MetacatUI side to prevent client-side bypassing of the feature. In particular, a user should not be able to bypass restrictions put in place by using a client like R to manipulate the package. In addition, repositories like the Arctic Data Center that assign DOIs as part of the curation process need this workflow to work as well.

Of course, at times an administrator must be able to remove problematic content, even if it has a DOI. So, admin users should still have the ability to delete objects and make them private. When they do, the admin must ensure that there is still a tombstone landing page for the DOI that explains why the content is missing.
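To make the server-side rule concrete, here is a minimal sketch of the kind of check that would have to run in the repository itself rather than in the client, so that R or any other client is subject to it. Everything in it is hypothetical: the function name `check_request`, the action names, and the idea of requiring a tombstone explanation as part of the request are assumptions for illustration, not an existing Metacat API.

```python
from typing import Tuple

# Hypothetical action names for the two operations the proposal restricts.
RESTRICTED_ACTIONS = {"delete", "make_private"}


def check_request(
    action: str,
    is_admin: bool,
    in_doi_package: bool,
    tombstone_reason: str = "",
) -> Tuple[bool, str]:
    """Decide whether a request passes the DOI-protection rule.

    Returns (allowed, explanation). A True result only means this rule
    does not object; the repository's existing checks still apply.
    """
    if action not in RESTRICTED_ACTIONS or not in_doi_package:
        return True, "not restricted by the DOI rule"
    if not is_admin:
        return False, "only administrators may delete or hide DOI content"
    if not tombstone_reason.strip():
        # Assumption: the explanation for the tombstone page is collected up front.
        return False, "a tombstone explanation is required"
    return True, "allowed; tombstone page must carry the given explanation"
```

For example, `check_request("make_private", is_admin=False, in_doi_package=True)` is refused, while the same request from an admin passes only when a non-empty `tombstone_reason` is supplied.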

mbjones commented 3 years ago

@laijasmine @jeanetteclark please add your thoughts on this enhancement request...

jeanetteclark commented 3 years ago

@mbjones in your proposed solution, wouldn't a user still be able to just delete a file from the package, thus creating a new resource map but not a new metadata file? This would be possible with the R tools, though I think MetacatUI would create a new metadata identifier. I think your solution would work for MetacatUI.

laijasmine commented 3 years ago

This sounds fair. Jeanette brings up a good point about the workaround in the R tools, but I think the number of people who use them is pretty low. For the Arctic Data Center, we should get notifications if the dataset is changed and will follow up, but it will be more of an issue in the KNB.

jeanetteclark commented 3 years ago

Seems to me that this difficulty arises because we assign the DOI to the metadata file, even though the contents of the dataset are determined by the resource map, meaning you can change the contents of the dataset without changing the DOI. Would it make sense to assign DOIs to resource maps instead? I realize that would be a huge change, just wondering if we have ever considered it.

mbjones commented 3 years ago

We have indeed considered using DOIs on the package rather than the metadata. That would be a huge change. LTER and EDI assign their DOIs that way. I've discussed with Mark Servilla the need for an approach like that (https://redmine.dataone.org/issues/8077), but I don't think any action has been taken.

In terms of deleting files, only admins can actually delete files, so I think that is a non-issue. Users can archive files, but archived files remain accessible as long as they stay publicly visible.

I agree with your point that removing a file from the package ORE does seem to be an issue, in that anyone with write access could then remove files from the package. So, this requires further discussion. Assigning DOIs to packages would be one solution, but it would be quite a large change for us throughout our infrastructure. I think we'd need to discuss it with @laurenwalker @csjx and others on the team. Let's discuss at our Thursday dev meeting.
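As input for that discussion, one way to at least detect the case would be to compare the members of the old and new resource maps whenever a DOI-identified package is updated. The sketch below is a rough illustration in Python: the function name `dropped_members` and the plain-dictionary version chain (old identifier mapped to the identifier that replaced it) are assumptions, and parsing the ORE documents themselves is left out.

```python
from typing import Dict, Iterable, Optional, Set


def dropped_members(
    old_members: Iterable[str],
    new_members: Iterable[str],
    obsoleted_by: Optional[Dict[str, str]] = None,
) -> Set[str]:
    """Return identifiers in the old package with no counterpart in the new one.

    A member counts as retained if it, or any later version reachable
    through `obsoleted_by`, is listed in the new resource map.
    """
    successors = obsoleted_by or {}
    kept = set(new_members)
    dropped = set()
    for pid in old_members:
        current, seen = pid, set()
        # Follow the version chain until a retained identifier is found;
        # `seen` guards against a malformed, circular chain.
        while current not in kept and current in successors and current not in seen:
            seen.add(current)
            current = successors[current]
        if current not in kept:
            dropped.add(pid)
    return dropped
```

A non-empty result for a package whose metadata carries a DOI could trigger a notification to curators, or be rejected outright, depending on what the team decides.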

jeanetteclark commented 3 years ago

Right, so I guess in theory even if someone changes the ORE to remove a file from a package, since they can't remove files or change permissions on them, anyone who navigated to that DOI could still download the files from the distribution section in the physical section of the metadata. It just wouldn't show up in metacatUI with the file download button at the top. This actually seems acceptable to me and still within the spirit of the DOI, but only because we are so careful about that distribution section.