IQSS / dataverse

Open source research data repository software
http://dataverse.org

Deaccession Dataset - Delete Files but Keep Tombstone Metadata Record #8037

Open. AliciaVC99 opened this issue 3 years ago.

AliciaVC99 commented 3 years ago

Currently, a user or administrator can deaccession a dataset, which makes its files publicly inaccessible and creates a tombstone metadata record. However, the files from that dataset are not deleted; they remain available to administrators and dataset creators/owners. In other words, the storage associated with those files is still in use.
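For anyone who scripts this step, recent Dataverse releases document a native API endpoint for deaccessioning a dataset version. The sketch below only builds the request with Python's standard library and does not send it. `server_url`, `api_token`, `pid`, and the version string are placeholders, and the endpoint path and JSON field names (`deaccessionReason`, `deaccessionForwardURL`) are assumptions to check against the Native API guide for your installed version.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def build_deaccession_request(server_url: str, api_token: str, pid: str,
                              version: str, reason: str,
                              forward_url: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a request to deaccession one dataset version.

    Assumed endpoint shape, based on the "Deaccession Dataset" entry in the
    Dataverse Native API guide; verify it for your Dataverse version.
    """
    # The persistent ID (e.g. a DOI) goes in the query string, URL-encoded.
    query = urllib.parse.urlencode({"persistentId": pid})
    url = (f"{server_url.rstrip('/')}/api/datasets/:persistentId/versions/"
           f"{urllib.parse.quote(version)}/deaccession?{query}")
    # A reason is required; a forwarding URL is optional.
    body = {"deaccessionReason": reason}
    if forward_url:
        body["deaccessionForwardURL"] = forward_url
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"X-Dataverse-key": api_token, "Content-Type": "application/json"},
    )
```

Sending the request (for example with `urllib.request.urlopen(req)`) would deaccession the version, but, as described above, the files would still occupy storage.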

A Super User can use an API call to delete a published dataset, but this also deletes the tombstone metadata record and essentially removes all evidence of the dataset ever existing.
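The superuser-only call referred to here is the "destroy" endpoint in the Dataverse Native API. The following minimal sketch only builds the request and does not send it. As with the previous placeholders, `server_url`, `api_token`, and `pid` are hypothetical values, and the path should be checked against the API guide for your release.

```python
import urllib.parse
import urllib.request


def build_destroy_request(server_url: str, api_token: str, pid: str) -> urllib.request.Request:
    """Build (but do not send) a superuser request to destroy a published dataset.

    Caution: per this issue, destroying a dataset also removes its metadata
    record, so no tombstone remains afterwards.
    """
    # Identify the dataset by persistent ID, URL-encoded in the query string.
    query = urllib.parse.urlencode({"persistentId": pid})
    url = f"{server_url.rstrip('/')}/api/datasets/:persistentId/destroy/?{query}"
    # The API token header authenticates the (super)user making the call.
    return urllib.request.Request(url, method="DELETE",
                                  headers={"X-Dataverse-key": api_token})
```

This is the all-or-nothing option the feature request wants to avoid: nothing in this call keeps the tombstone while deleting only the files.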

In planning for long-term preservation, repositories may wish to select some datasets or Dataverse collections for deletion after a certain period of time (e.g., 10 years), as set out in their collection development policies. Others may wish to move the data to long-term storage via digital preservation processes (e.g., Archivematica). In either case, the repository may need to delete the files stored in its Dataverse installation while keeping either the active or the tombstone metadata record.

Deleting files (but not metadata records) from Dataverse may become more pressing if repositories run into storage space limitations or unmanageable costs. Beyond storage, collection development and long-term preservation selection imply that not all datasets are worth keeping over the long term, so some may need to be deleted. In these situations, a metadata record that not only proves the dataset once existed but also allows the PID to "persist" is vital.

pdurbin commented 2 years ago

@AliciaVC99 thanks for opening this issue (over a year ago!). You might be interested in the following issue, which is related:

You seem to be coming at this from the perspective of someone who is concerned about storage costs and adhering to preservation policy time periods (10 years, etc.). You're someone who runs a Dataverse installation, it seems.

The other issue is a little more about users/researchers (or maybe curators) who want to delete files. Their reasons may vary. Maybe the data is too sensitive. Who knows.

For both issues, it seems like you want to be able to delete files from deaccessioned versions. Optionally, I suppose. 😄

Thanks for the feedback!

meghangoodchild commented 11 months ago

Just adding a +1 to this feature request. Another use case is sensitive data that has been published by accident. It would be helpful to be able to delete the files from storage and keep the tombstone record. Thanks!