LibraryOfCongress / bagit-spec


RFC6920 registry for hash algorithm names #23

Closed by stain 6 years ago

stain commented 6 years ago

Reference the RFC6920 Named Information Hash Algorithm Registry for future algorithm names.

I explicitly added the legacy algorithms MD5 and SHA1 as permitted because, unlike sha-256 and sha-512, they are not in the registry.

Left as an exercise to the reader is how to normalise sha3-512.

justinlittman commented 6 years ago

Hmmm ... what about "sha-256-128"? Does this become "sha256128"?

I'm a little concerned about tying this to an IANA registry. Any sense of how viable it is?

stain commented 6 years ago

I'm afraid so: under the current normalization rule it would become sha256128.

I think sha3 is the bigger concern, as I don't see much point in making tag files with shortened checksums. With sha3 the size is always explicit (e.g. sha3-256), so you can't just say sha3.
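
For reference, a minimal sketch of the normalization rule under discussion, assuming it amounts to lowercasing the name and dropping every character that is not an ASCII letter or digit. That reading is consistent with sha-256-128 becoming sha256128; the function name is illustrative and not taken from the spec text.

```python
import re


def normalize_algorithm_name(name: str) -> str:
    """Lowercase the name and drop every character outside a-z and 0-9.

    Assumption: "alphanumeric" is read as ASCII-only here.
    """
    return re.sub(r"[^a-z0-9]", "", name.lower())


for name in ("sha-256", "sha-256-128", "sha3-512"):
    print(f"{name} -> manifest-{normalize_algorithm_name(name)}.txt")
```

Under this reading sha3-512 collapses to sha3512, so the hyphen that separates the family from the digest size is lost.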

stain commented 6 years ago

An alternative, since we're aiming for an RFC, is to create our own IANA registry; that is effectively what we have now (currently with four entries: md5, sha1, sha256 and sha512).

But search for "hash" under https://www.ietf.org/assignments/ and ask: why do we need yet another one? The RFC6920 registry is quite straightforward to augment.

I also see great benefits in linking BagIt to RFC6920. First, you instantly get global identifiers for (the bytes of) every item in a bag. Second, it has a resolution mechanism that a BagIt client could use to repair bags, say when fetch.txt URIs return 404.
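
To make the identifier idea concrete, here is a rough sketch of how a payload file's bytes could be turned into an RFC6920 ni URI and the matching .well-known HTTP URL used for resolution. The helper names and the example.org authority are placeholders, and only sha-256 is handled; treat it as an illustration of the URI shape rather than anything defined for BagIt.

```python
import base64
import hashlib


def ni_digest(data: bytes) -> str:
    """base64url-encode the SHA-256 digest, without '=' padding."""
    digest = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def ni_uri(data: bytes, authority: str = "") -> str:
    """Build ni://<authority>/sha-256;<digest>; the authority may be empty."""
    return f"ni://{authority}/sha-256;{ni_digest(data)}"


def well_known_url(data: bytes, authority: str) -> str:
    """Map to http://<authority>/.well-known/ni/sha-256/<digest>."""
    return f"http://{authority}/.well-known/ni/sha-256/{ni_digest(data)}"


payload = b"Hello World!"  # stand-in for the bytes of a payload file
print(ni_uri(payload))
print(well_known_url(payload, "example.org"))
```

A client that gets a 404 for a fetch.txt URI could, in principle, try the .well-known URL on a host it trusts and then verify the returned bytes against the manifest checksum.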

justinlittman commented 6 years ago

I'm split on this one. On the one hand, registries are generally a good thing. On the other hand, the existing approach has worked fine to date.

stain commented 6 years ago

Once this is an RFC, we can't make ad hoc updates to add checksum algorithms, only errata or new RFCs, so it would make sense to set it up as a new registry that the community can easily add to.

If "algorithm name" is free-form anyway, then there is not much point in normalizing it.

I could have a bag with manifest-Grøstl-512.txt, and a human would still be needed in the loop to decide whether it is compatible with code that can handle manifest-grostl384.txt in another bag.

However, if we say it SHOULD be in the registry, that should hopefully prompt some emails, or at least an emerging community consensus, about what name to use for a new algorithm. The registration can then happen, either in the RFC6920 registry (as I suggest) or in our own.

If we create our own registry, it could have a registration procedure as simple as RFC6920's: basically "Expert Review" to check that an entry is not rot13, a patent troll or a duplicate. It might be worth checking with the IETF folks at arts@ to see which they think is preferable; it must have come up for the other hash registries?
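
A hypothetical sketch of what a "SHOULD be in the registry" check might look like in a tool. The KNOWN_ALGORITHMS set is a hand-written stand-in for a registry (sha-256 and sha-512, plus the legacy md5 and sha1 names mentioned in this thread), and the normalization is assumed to be lowercasing plus removal of non-alphanumeric ASCII characters.

```python
import re
from typing import Optional

# Stand-in for a registry lookup; a real tool would load the published list.
KNOWN_ALGORITHMS = {"md5", "sha1", "sha-256", "sha-512"}


def normalize(name: str) -> str:
    """Assumed rule: lowercase, then keep only a-z and 0-9."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


# Map each normalized form back to the name as registered.
BY_NORMALIZED = {normalize(name): name for name in KNOWN_ALGORITHMS}


def registered_algorithm(manifest_filename: str) -> Optional[str]:
    """Return the registered name for a manifest file, or None if unknown."""
    match = re.fullmatch(r"(?:tag)?manifest-(.+)\.txt", manifest_filename)
    if match is None:
        return None
    return BY_NORMALIZED.get(normalize(match.group(1)))


for filename in ("manifest-sha256.txt", "manifest-Grøstl-512.txt"):
    print(filename, "->", registered_algorithm(filename))
```

A name that does not resolve, such as the Grøstl example, is exactly the case where a human, or a new registration, is still needed.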

stain commented 6 years ago

Thanks, @acdha, fixed all of those.

stain commented 6 years ago

Do we need @acdha to formally re-review before merging? GitHub still insists on "requested changes" even though I believe they are now addressed.

johnscancella commented 6 years ago

I believe you have addressed @acdha's comments. Since @justinlittman disagrees, we should probably have kunze weigh in.

justinlittman commented 6 years ago

I'm OK with being the dissenting opinion, but agree that @jkunze should review.

johnscancella commented 6 years ago

@stain do you think you could resolve the conflicts on this PR? Once that is done I think we are good to go with merging it. Thanks!

johnscancella commented 6 years ago

Fixed the merge conflict and merged manually.