fair-research / bdbag

Big Data Bag Utilities
https://fair-research.org
Apache License 2.0

Bagit specification version conformance should be configurable. #27

Closed. mikedarcy closed this issue 5 years ago

mikedarcy commented 5 years ago

BagIt spec 1.0 introduces some significant changes from 0.97, most notably the restriction that if multiple checksum types are used, every payload file must be listed in every checksum manifest. If bdbag were to support only BagIt spec 1.0 going forward, it would be impossible to create bags that contain references to remote files with mixed checksum types.

For example, when creating bags using a remote-file-manifest, for legacy reasons only an MD5 might be available for a subset of files, whereas others may have newer SHA256 or SHA512 hashes. Creating bags with this type of mixed checksum content is actually a pretty common use case, and one that was supported prior to bdbag release 1.3.0. This issue is described in further detail in #26.
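
To make the mixed-checksum case concrete, here is a minimal Python sketch, not bdbag's actual code. It assumes remote-file-manifest entries are JSON-like objects with `url`, `length`, `filename` and one or more checksum keys; treat that layout, the `group_by_algorithm` helper, and all URLs and digests as illustrative placeholders. It shows how such entries naturally produce per-algorithm payload manifests that do not list the same files.

```python
from collections import defaultdict

# Hypothetical remote-file-manifest entries; every value is a placeholder.
remote_entries = [
    {"url": "https://example.org/legacy.csv", "length": 1024,
     "filename": "legacy.csv", "md5": "placeholder-md5-a"},
    {"url": "https://example.org/new.csv", "length": 2048,
     "filename": "new.csv", "sha256": "placeholder-sha256-b"},
]

KNOWN_ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def group_by_algorithm(entries):
    """Build {algorithm: {payload_path: digest}} from the entries."""
    manifests = defaultdict(dict)
    for entry in entries:
        for algorithm in KNOWN_ALGORITHMS:
            if algorithm in entry:
                path = "data/" + entry["filename"]
                manifests[algorithm][path] = entry[algorithm]
    return dict(manifests)


manifests = group_by_algorithm(remote_entries)
# The md5 manifest lists only legacy.csv and the sha256 manifest only new.csv,
# which is the situation the 1.0 rule described above does not allow.
```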

The change proposed here is to create a bdbag.json configuration (and API object) parameter that lets the user specify the BagIt specification conformance level. Specifying 0.97 will allow the less restrictive payload manifest declarations, and specifying 1.0 will enforce strict payload manifest homogeneity. The system will default to 0.97 for backward compatibility. This mechanism will also provide a way to address additional compatibility issues in the future, should they arise.
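
A rough sketch of how such a switch could behave, written in Python for illustration only. The key name `bagit_spec_version` and the helpers `load_spec_version` and `missing_manifest_entries` are hypothetical and are not taken from bdbag; only the two version values and the 0.97 default come from the proposal above.

```python
import json

DEFAULT_SPEC_VERSION = "0.97"  # default named in the proposal
SUPPORTED_SPEC_VERSIONS = ("0.97", "1.0")


def load_spec_version(config_text):
    """Read the hypothetical 'bagit_spec_version' key from bdbag.json-style text."""
    config = json.loads(config_text) if config_text else {}
    version = str(config.get("bagit_spec_version", DEFAULT_SPEC_VERSION))
    if version not in SUPPORTED_SPEC_VERSIONS:
        raise ValueError("Unsupported BagIt spec version: %s" % version)
    return version


def missing_manifest_entries(manifests, spec_version):
    """Return {algorithm: sorted payload paths absent from that manifest}.

    `manifests` maps an algorithm name to a {payload_path: digest} dict.
    With 0.97 no homogeneity check is applied, so the result is empty.
    With 1.0 every payload file must appear in every manifest.
    """
    if spec_version == "0.97":
        return {}
    all_files = set()
    for entries in manifests.values():
        all_files.update(entries)
    missing = {}
    for algorithm, entries in manifests.items():
        absent = sorted(all_files - set(entries))
        if absent:
            missing[algorithm] = absent
    return missing


# Placeholder digests; only the file coverage matters here.
mixed = {
    "md5": {"data/legacy.csv": "placeholder-a", "data/new.csv": "placeholder-b"},
    "sha256": {"data/new.csv": "placeholder-c"},
}
lenient = missing_manifest_entries(mixed, load_spec_version(""))
strict = missing_manifest_entries(
    mixed, load_spec_version('{"bagit_spec_version": "1.0"}'))
```

With the default level the mixed bag passes unchanged, while the strict level reports `data/legacy.csv` as missing from the sha256 manifest.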

carlkesselman commented 5 years ago

We could also comment on the draft, or perhaps talk to John.

Carl

(In reply to mikedarcy's message of Jul 12, 2018, 3:08 AM.)

mikedarcy commented 5 years ago

Fixed in 1.5.0 release.