eprintsug / EPrintsArchivematica

Digital Preservation through EPrints-Archivematica Integration - An EPrints export plugin to Archivematica

New proposed spec: use Archivematica transfer structure in place of BagIt #3

Closed: tw4l closed this issue 5 years ago

tw4l commented 5 years ago

The spec as of commit https://github.com/photomedia/EPrintsArchivematica/commit/66484e1c8b95fb7c756836fab0fb06b96a4ea2ff proposes using the default Archivematica transfer structure for exports, with a checksum manifest as described in the Archivematica docs, in place of BagIt.

This change should make it easier to generate the checksum manifest passed to Archivematica from the values already stored in the EPrints database, rather than generating new checksums from the files on disk after they have been copied from their original location (which introduces the possibility of bit corruption before the checksums are generated).
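To make the idea concrete, here is a minimal sketch of building the manifest from stored values, with no file reads involved. It assumes the manifest uses md5sum-style lines (`<digest>  <path>`) and that paths are written relative to the metadata folder with a `../objects/` prefix; both should be checked against the Archivematica docs. The function name `build_checksum_manifest` and the `(path, md5)` input shape are hypothetical and are not part of the plugin.

```python
from typing import Iterable, Tuple


def build_checksum_manifest(
    stored: Iterable[Tuple[str, str]],
    prefix: str = "../objects/",  # assumption: verify against the Archivematica docs
) -> str:
    """Return md5sum-style manifest text from (relative_path, md5_hex) pairs.

    The digests are taken as given (e.g. as stored in the repository
    database); nothing is recomputed from the files on disk.
    """
    lines = []
    for rel_path, md5_hex in stored:
        digest = md5_hex.strip().lower()
        # Reject anything that is not a 32-character hex MD5 digest.
        if len(digest) != 32 or any(c not in "0123456789abcdef" for c in digest):
            raise ValueError(f"not an MD5 hex digest for {rel_path!r}: {md5_hex!r}")
        # md5sum format: digest, two spaces, path.
        lines.append(f"{digest}  {prefix}{rel_path}\n")
    return "".join(lines)
```

The returned text would then be written to the manifest file in the transfer's metadata folder (assumed here to be `metadata/checksum.md5`). Because the digests predate the copy, any corruption introduced while copying shows up as a mismatch when Archivematica verifies the transfer.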

Thoughts?

photomedia commented 5 years ago

I think this updated version is the correct answer! This means that the MD5 checksum file would export the checksum values already stored in the EPrints database, which is what we need for verification.

Q1: The revisions folder contains metadata about the history of revisions, stored in a series of consecutive XML files. Should this be moved to the "metadata" folder? Can a metadata folder contain a folder with a number of XML files?

Q2.1: What would we do for any files that may not have had a checksum generated and stored in the EPrints database?
Q2.2: Which of the files in the export would be in that situation (not having a checksum stored in EPrints)?

photomedia commented 5 years ago

Q1: The revisions folder contains metadata about the history of revisions, stored in a series of consecutive XML files. Should this be moved to the "metadata" folder? Can a metadata folder contain a folder with a number of XML files?
A: A metadata folder can contain a "revisions" folder with metadata files. With this commit (https://github.com/photomedia/EPrintsArchivematica/commit/57c1ac9b21c4a492d104e4a4ccb835d369527c2a), I have moved the revisions folder under metadata.

I am also closing this issue to approve the change to the Archivematica transfer structure, as it has a clear advantage over having EPrints generate a bag with new checksums. I have moved the discussion of checksums missing from the EPrints database to a new issue: https://github.com/photomedia/EPrintsArchivematica/issues/5
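For illustration, the "revisions under metadata" layout could be produced roughly as follows. This is only a sketch: the helper name `write_revisions` and the `1.xml`, `2.xml`, ... file names are hypothetical, and the plugin's actual naming may differ.

```python
from pathlib import Path
from typing import List


def write_revisions(transfer_root: Path, revisions: List[str]) -> List[Path]:
    """Write each revision's XML text to <transfer_root>/metadata/revisions/."""
    target = transfer_root / "metadata" / "revisions"
    target.mkdir(parents=True, exist_ok=True)
    written = []
    # Hypothetical naming: consecutive numbers starting at 1.
    for number, xml_text in enumerate(revisions, start=1):
        path = target / f"{number}.xml"
        path.write_text(xml_text, encoding="utf-8")
        written.append(path)
    return written
```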