NCEAS / metacat

Data repository software that helps researchers preserve, share, and discover data
https://knb.ecoinformatics.org/software/metacat
GNU General Public License v2.0

refactor getPackage to conform to RDA recommendation #1262

Open mbjones opened 6 years ago

mbjones commented 6 years ago

RDA has recently finalized a recommendation for how to interoperably ship data packages between repositories. The specification is here:

http://dx.doi.org/10.15497/RDA00025

In short, it uses BagIt to capture all of the data files and metadata, and is extremely similar to the DataONE BagIt approach. The spec even gives the inclusion of ProvONE metadata as an example.

The main changes upon a quick read would be:

1) Move the ORE file from data/ to a new metadata/ directory
2) Move the science metadata file to the metadata/ directory
3) Add system metadata for every object to metadata/ directory (see #1261)
4) Add datacite.xml metadata to metadata/ directory

We'll need to read the spec in more detail, but I think that's the main set of changes.
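
To make the four changes above concrete, here is a rough sketch of where each kind of file would land in the bag. It is not Metacat code: the role labels, the `bag_path` function, and the `sysmeta` subdirectory name are all hypothetical and chosen only for illustration, and the exact layout should be checked against the spec.

```python
from pathlib import PurePosixPath

# Hypothetical role labels for illustration; these are not Metacat identifiers.
ORE = "ore"
SCIENCE_METADATA = "science_metadata"
SYSTEM_METADATA = "system_metadata"
DATACITE = "datacite"
DATA = "data"


def bag_path(role: str, filename: str) -> str:
    """Return the bag-relative path for a file under the proposed layout."""
    if role == DATA:
        # Data files stay in the BagIt payload directory.
        return str(PurePosixPath("data") / filename)
    if role in (ORE, SCIENCE_METADATA):
        # Changes 1 and 2: ORE and science metadata move to metadata/.
        return str(PurePosixPath("metadata") / filename)
    if role == SYSTEM_METADATA:
        # Change 3: one system metadata file per object.
        # The "sysmeta" subdirectory name is an assumption.
        return str(PurePosixPath("metadata") / "sysmeta" / filename)
    if role == DATACITE:
        # Change 4: a single datacite.xml in metadata/.
        return "metadata/datacite.xml"
    raise ValueError(f"unknown role: {role!r}")
```

For example, `bag_path(ORE, "oai-ore.xml")` gives `metadata/oai-ore.xml`, while a data file such as `temps.csv` stays at `data/temps.csv`.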

amoeba commented 5 years ago

Just dropping a note in here since I heard from a user today about buggy behavior in getPackage where (1) the zip was called zip.zip and (2) the files inside had their PID as their filename even though each Object in the package had a fileName set in its System Metadata. I figure bugs like that could get handled in this ticket.

The package exhibiting the behavior is https://search.dataone.org/view/780dfe8b-4179-4acf-b296-720020ac16c2
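
For the filename half of that report, the expected behavior could be sketched roughly as follows: prefer the fileName from System Metadata and fall back to a sanitized PID only when no fileName is set. The `entry_name` function and the sanitizing rule are assumptions for illustration, not what getPackage actually does.

```python
import re
from typing import Optional


def entry_name(pid: str, file_name: Optional[str]) -> str:
    """Pick the name a package member should have inside the zip."""
    if file_name and file_name.strip():
        # Keep only the last path component so the entry cannot escape its directory.
        name = file_name.strip().replace("\\", "/").rsplit("/", 1)[-1]
        if name:
            return name
    # Fallback: the PID, with characters that are awkward in filenames replaced.
    return re.sub(r"[^A-Za-z0-9._-]", "_", pid)
```

With this rule, an object with PID `urn:uuid:abc` and fileName `temps.csv` is written as `temps.csv`, and only objects without a fileName end up named after their PID.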

ThomasThelen commented 2 years ago

The hierarchical package work satisfies most of these requirements:

  1. The ORE file has been moved from data/ to the metadata/ directory
  2. The science metadata file moved to the metadata/ directory
  3. System metadata is included for every file, in a subfolder in metadata

What's missing is datacite.xml, which MUST be present for a valid RDA-conforming bag.
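
A quick way to check a downloaded package for the missing file might look like the sketch below. It assumes the bag is a zip archive with datacite.xml at `metadata/datacite.xml`, optionally under a single top-level bag directory; `has_datacite` is a hypothetical helper, not part of Metacat.

```python
import zipfile


def has_datacite(bag_zip) -> bool:
    """Return True if the zipped bag contains metadata/datacite.xml.

    bag_zip may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(bag_zip) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            if parts[-2:] != ["metadata", "datacite.xml"]:
                continue
            # Accept metadata/datacite.xml at the bag root, or under one
            # wrapping directory, but not inside the data/ payload.
            if len(parts) == 2 or (len(parts) == 3 and parts[0] != "data"):
                return True
    return False
```

Running this against a package from the current getPackage output would be expected to return False until datacite.xml is added.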