duke-libraries / dul-hydra

Duke Digital Repository Administrative Hydra Head
BSD 3-Clause "New" or "Revised" License

checksum algorithms on object creation #966

Closed: laissezfarrell closed this issue 10 years ago

laissezfarrell commented 10 years ago

Craig Breaden and I both calculate checksums for objects that move through our respective workflows. The feature that lets us submit our own checksum along with an object at ingest fits nicely into our workflow (in fact, we talked today about how we manage checksum files for objects before and after they enter the repository).

I default to SHA-256, although when I'm calculating fixity on a hard drive disk image I switch to MD5 to make the acquisitions process more efficient. Craig often deals with large digital video files and usually calculates MD5 checksums. We've found that calculating SHA-256 on files over a certain size is nearly impossible, either because of how long it takes or because the checksumming application fails.

It would be nice if the optional checksum field on the object creation form allowed a choice of algorithms (via dropdown, radio buttons, etc.).
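A minimal Ruby sketch of how such a field might be backed, assuming Ruby's standard `digest` library: each dropdown value maps to a digest class, and a submitted checksum is checked for hex characters and the right length before ingest. The option labels, `CHECKSUM_ALGORITHMS`, and both helper names are hypothetical, not part of dul-hydra.

```ruby
require "digest"

# Hypothetical mapping from dropdown labels to Ruby's stdlib digest classes;
# these labels are illustrative, not dul-hydra's actual form values.
CHECKSUM_ALGORITHMS = {
  "MD5"     => Digest::MD5,
  "SHA-1"   => Digest::SHA1,
  "SHA-256" => Digest::SHA256,
  "SHA-512" => Digest::SHA512
}.freeze

# Length of a hex digest for the chosen algorithm (two hex characters per byte).
def expected_hex_length(algorithm)
  klass = CHECKSUM_ALGORITHMS.fetch(algorithm) do
    raise ArgumentError, "unsupported algorithm: #{algorithm}"
  end
  klass.new.digest_length * 2
end

# The checksum field is optional, so a blank value passes; otherwise the value
# must be hexadecimal and exactly as long as the algorithm's digest.
def valid_checksum_input?(algorithm, checksum)
  value = checksum.to_s.strip
  return true if value.empty?
  value.match?(/\A\h+\z/) && value.length == expected_hex_length(algorithm)
end
```

Deriving the expected length from the digest class, rather than hard-coding it, keeps the check correct if more algorithms are added to the dropdown later.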

dchandekstark commented 10 years ago

@laissezfarrell Would this option apply only to the individual object ingest process, or also to batch ingest? The batch ingest process currently uses a specially formatted checksum file, so supporting different algorithms there would require modifications. The single ingest option should be a snap, though. Also, would you want/need this before the next major release (2.1)? I just need to know whether to do a hotfix or just add it to our development branch.

laissezfarrell commented 10 years ago

Ideally (pie-in-the-sky)? I think we'd like to be able to submit checksums along with a batch, so the ingest process could tell us whether what we submitted matches what we thought we submitted. This could be done by submitting a file of checksums (I maintain such a file for each logical grouping of objects) instead of copying and pasting individual checksums into a box. We can talk about this further for batch.
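For the batch case, here is a sketch of reading such a checksum file and reporting which files don't match. It assumes the file follows the GNU coreutils `md5sum`/`sha256sum` output format (hex digest, a space, then a space or `*`, then the filename); that format is an assumption, since the batch checksum format dul-hydra actually uses isn't shown in this thread. The method names are hypothetical.

```ruby
require "digest"

# Parses "<hex digest>  <filename>" lines (a "*" before the filename marks
# binary mode in coreutils output) into a Hash of filename => lowercase digest.
# Assumed format only; not dul-hydra's real batch checksum file.
def parse_checksum_manifest(text)
  text.each_line.with_object({}) do |line, manifest|
    line = line.chomp
    next if line.strip.empty?
    match = line.match(/\A(\h+) [ *](.+)\z/)
    raise ArgumentError, "malformed manifest line: #{line.inspect}" unless match
    manifest[match[2]] = match[1].downcase
  end
end

# `computed` is a Hash of filename => hex digest calculated at ingest time.
# Returns the filenames whose digest is missing or differs from the manifest.
def mismatched_files(manifest, computed)
  manifest.reject { |name, digest| computed[name] == digest }.keys
end
```

Returning the mismatched names, rather than a single true/false, lets the ingest report say exactly which objects differ from what was submitted.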

I don't think single-object ingest support is needed before 2.1, so adding it to the roadmap sounds fine to me.

(Replied by email to David Chandek-Stark's comment of Tue, Jul 1, 2014, 10:54 AM.)

dchandekstark commented 10 years ago

@laissezfarrell @coblej @jt112 The decision is to permit alternate algorithms for ingest checksum verification, while Fedora will always use SHA-256 for storage. If an alternate algorithm is used, Ruby code will generate a digest of the Fedora content with that algorithm and use it for validation.
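A sketch of that decision in Ruby, under stated assumptions: `fedora_sha256` stands in for the SHA-256 digest Fedora keeps for the stored content, the content is available as an IO, and all method and constant names are hypothetical rather than dul-hydra's actual code. A SHA-256 submission is compared with Fedora's digest directly; any other algorithm triggers a Ruby-side digest of the content.

```ruby
require "digest"
require "stringio"

# Hypothetical names; a sketch of the described behavior, not dul-hydra's code.
FEDORA_ALGORITHM = "SHA-256"

DIGESTS = {
  "MD5"     => Digest::MD5,
  "SHA-1"   => Digest::SHA1,
  "SHA-256" => Digest::SHA256,
  "SHA-512" => Digest::SHA512
}.freeze

# Digests the content in fixed-size chunks so a large file is never held
# in memory all at once.
def streamed_hexdigest(io, algorithm, chunk_size: 1024 * 1024)
  digest = DIGESTS.fetch(algorithm).new
  while (chunk = io.read(chunk_size))
    digest.update(chunk)
  end
  digest.hexdigest
end

# SHA-256 submissions reuse the digest Fedora already stores; any other
# algorithm re-reads the content and digests it in Ruby.
def checksum_valid?(content_io, submitted:, algorithm:, fedora_sha256:)
  expected = submitted.strip.downcase
  actual =
    if algorithm == FEDORA_ALGORITHM
      fedora_sha256.downcase
    else
      streamed_hexdigest(content_io, algorithm)
    end
  actual == expected
end
```

Reading in chunks keeps memory use flat regardless of file size, which matters for the large video files mentioned earlier in the thread; the time cost of hashing a large file remains.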