Closed by stain 6 years ago
Hmmm ... what about "sha-256-128"? Does this become "sha256128"?
I'm a little concerned about tying this to the IANA registry. Any sense of how viable it is?
I'm afraid it would become sha256128 by the current normalization rule, yes.
We should be more concerned about sha3, I think, as I don't see much point in making tagfiles with shortened checksums. Here the size is always explicit (e.g. sha3-256), so you can't just say sha3.
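To make the concern concrete, here is a minimal sketch of the normalization rule as it is described in this thread, assuming it means "lowercase the name, then drop everything that is not an ASCII letter or digit". The function names are hypothetical and not taken from any bagit implementation.

```python
import re


def normalize_algorithm(name: str) -> str:
    """Lowercase the name and drop every character outside a-z and 0-9.

    Assumption: "non-alphanumeric" is read as "not an ASCII letter or digit".
    """
    return re.sub(r"[^a-z0-9]", "", name.lower())


def manifest_filename(name: str) -> str:
    """Build the payload manifest filename for an algorithm name."""
    return f"manifest-{normalize_algorithm(name)}.txt"


for name in ("SHA-256", "sha-256-128", "sha3-256"):
    print(name, "->", manifest_filename(name))
```

Under this reading the hyphen that separates the family from the digest size is lost, so a truncated sha-256-128 and a sized sha3-256 both end up as one run of letters and digits that a reader has to pick apart again.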
An alternative, as we're aiming for an RFC, is to make our own IANA registry; that's effectively what we've done now (currently with 4 entries for md5/sha1/sha256/sha512).
But look for "hash" under https://www.ietf.org/assignments/ - why do we need yet another one? RFC6920 registry is quite straight-forward to augment.
However, I see great benefits in linking to RFC6920 for bagit: for one, you instantly have global identifiers for (the bytes of) every item in a bag; secondly, it also has a resolution mechanism that a bagit client could use to fix bags, say if fetch.txt URIs give 404.
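As a rough illustration of the "global identifier" point, this sketch builds an RFC6920 ni URI for a blob of bytes, assuming the sha-256 registry entry and the authority-less ni:///alg;value form with unpadded base64url. The helper name ni_uri is hypothetical, and nothing here is part of the bagit spec.

```python
import base64
import hashlib


def ni_uri(data: bytes) -> str:
    """Return an RFC6920-style ni URI (sha-256, no authority) for data."""
    digest = hashlib.sha256(data).digest()
    # RFC6920 uses the URL-safe base64 alphabet with the "=" padding removed.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"ni:///sha-256;{encoded}"


print(ni_uri(b"Hello World!"))
```

A client could compute this for each payload file from the sha-256 value it already has in the manifest, since the hex digest and the base64url value encode the same bytes.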
I'm split on this one. On the one hand, registries are generally a good thing. On the other hand, the existing approach has worked fine to date.
Once we go RFC we can't do sporadic updates for adding checksum algorithms, just errata or new RFCs, so then it would make sense to do it as a new registry so the community can easily make additions to it.
If "algorithm name" is free-form anyway, then there is not much point in normalizing it.
I can have a bag with manifest-Grøstl-512.txt and a human is anyway needed in the loop to say it is compatible with the code that can handle manifest-grostl384.txt in another bag.
However if we say it SHOULD be in the registry, then that should hopefully prompt some emails or at least emerging community consensus about what to use for a new algorithm name, and then the registration can happen, either in the RFC6920 registry (as I suggest) or in our own.
If we do our own registry it could have a simple registration procedure similar to RFC6920's, basically "Expert Review" to check it's not rot13, a patent troll or a duplicate. Might be worth checking with the IETF folks at arts@ to see what they think is preferable. It must have come up for the other hash registries?
Thanks, @acdha, fixed all of those.
Do we need @acdha to formally re-review before merge? GitHub insists on "requested changes" even though I believe they are now addressed.
I believe you have addressed @acdha's comments. Since @justinlittman is in disagreement, we should probably have @jkunze weigh in.
I'm OK with being the dissenting opinion, but agree that @jkunze should review.
@stain do you think you could resolve the conflicts on this PR? Once that is done I think we are good to go with merging it. Thanks!
fixed merge conflict and merged manually
Reference Named Information Hash Algorithm Registry for future algorithm names.
I explicitly added the MD5/SHA1 legacy algorithms as permitted, as unlike sha-256 and sha-512 these are not in the registry.
Left as an exercise to the reader is how to normalise sha3-512.