Lekensteyn opened this issue 10 years ago
Which tools are you referring to?
sha256sum, Python's hashlib (the hexdigest() method of a hash object), and openssl sha256.
Base64-encoded digests are slightly unusual.
Are you using those tools with peep? Is using base64 representation making something you want to do difficult?
The hex digests are posted on PyPI and in GPG-signed release announcements (in Django's case), so yes, it adds an extra step of complexity.
I was thinking of this workflow:
If I have to use base64-encoded strings, I need to convert the hex digests to bytes and then convert those bytes to base64, substituting the two non-alphanumeric characters (which I first need to figure out) and possibly appending padding.
So it would be easier if I could use the existing sources rather than converting their digests to an unusual format.
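For reference, the hex-to-base64 conversion described above can be sketched in a few lines. This assumes peep's format is the one PEP 427 uses for wheel RECORD files: URL-safe base64 ('-' and '_' in place of '+' and '/') with the trailing '=' padding removed. The function name hex_to_peep_digest is hypothetical and not part of peep.

```python
import base64
import binascii


def hex_to_peep_digest(hex_digest: str) -> str:
    """Convert a hex (base16) SHA-256 digest to unpadded URL-safe base64.

    Hypothetical helper; assumes peep uses the PEP 427 RECORD encoding
    (urlsafe base64, '=' padding stripped).
    """
    raw = binascii.unhexlify(hex_digest.strip())  # 64 hex chars -> 32 raw bytes
    encoded = base64.urlsafe_b64encode(raw).decode("ascii")  # '-' and '_' instead of '+' and '/'
    return encoded.rstrip("=")  # drop the '=' padding character
```

For example, the SHA-256 of empty input, e3b0c442...b855 in hex, becomes 47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU.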
Got it. That's a much more complete report of the issue than the original description.
More complexity yields more bugs, so I don't think we should support two hash formats. I think the way forward with this would be either:
I'm inclined to go with number 2, though I suspect that using base16 encodings to make it easier to reuse other people's hashes leads to a false sense of security, which makes me wonder whether it's worth doing at all.
@erikrose Your thoughts?
The biggest threat I am worried about is that the file gets changed in the future (or now, via a MitM), not so much fake checksums.
The sources for checksums would be:
+1 to allowing base16 checksums. I don't think I've ever seen base64 hashes anywhere else, which makes it hard to use other tools. I tried to use sha256sum to verify hashes for a while, and it really wasn't clear to me until later why they weren't matching up.
base64 hashes are slightly shorter (43 vs 64 characters), but I don't think that matters much.
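The 43 vs 64 figure follows from the digest size: SHA-256 produces 32 bytes, hex uses 2 characters per byte (32 × 2 = 64), and base64 carries 6 bits per character (256 / 6 ≈ 42.7, rounded up to 43, or 44 with the '=' padding). A quick check, assuming the unpadded URL-safe encoding:

```python
import base64
import hashlib

digest = hashlib.sha256(b"example").digest()  # SHA-256 is always 32 raw bytes
hex_form = digest.hex()  # 2 hex characters per byte
b64_form = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")  # padding stripped
print(len(digest), len(hex_form), len(b64_form))  # 32 64 43
```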
Personally, I think peep should be able to handle more types of checksum than just base64-encoded SHA-256 checksums. I would be OK with auto-detecting base64 or base16.
Where are the hex sha256 digests on PyPI? All I see are sha1 hashes, and I have to dig into the DOAP records to find them. If you're talking about md5 hashes, those and sha1 are fairly thoroughly broken. While I can see them being useful and easier for guarding solely against accidents, I am reluctant to make them the path of least resistance.
Supporting the hex versions of sha256 hashes seems like a no-brainer to me; we can easily distinguish them based on length.
Depending on your threat model, md5 can be sufficient "for now". Collision vulnerabilities affect how much you can trust the integrity of files whose checksums come from untrusted sources. If you already have an md5 hash that was not specially crafted, an attacker needs a second-preimage attack to produce a different file with that hash, which is much more difficult than finding a collision.
SHA-1 does not have known preimage vulnerabilities at all.
So, when do you expect to implement the hex versions?
I can't believe I haven't mentioned the peep hash command. If you have the tarballs or wheels downloaded, that's what you run to get their wheel-formatted hashes; you don't need to muck around in the REPL yourself or anything like that. FWIW, the base64'd hashes come from http://legacy.python.org/dev/peps/pep-0427/#signed-wheel-files, the format wheels use internally. That format hasn't become popular [yet], so the lack of tooling around it is annoying.
Hex versions honestly aren't a big priority for me, since they don't scratch any of my itches, but I'll gladly accept a patch.
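For anyone curious what a wheel-formatted hash of a downloaded file involves, here is a rough sketch. It assumes the output matches PEP 427's unpadded URL-safe base64 of the SHA-256 digest; it is not peep's actual code, and wheel_style_hash is a hypothetical name.

```python
import base64
import hashlib


def wheel_style_hash(path: str, chunk_size: int = 65536) -> str:
    """Return a file's SHA-256 as URL-safe base64 with '=' padding removed.

    Hypothetical helper approximating the PEP 427 RECORD hash format.
    """
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large tarballs or wheels need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return base64.urlsafe_b64encode(sha.digest()).decode("ascii").rstrip("=")
```

In practice, running peep hash on the file remains the simpler route; the sketch only shows what the encoding step amounts to.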
Given that many tools output base16 representations of the hashes, what about adding support for this? The length of the hash could be used to detect the format (base64 vs hex):
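A minimal sketch of length-based detection for SHA-256 only, assuming the two accepted forms are 64-character hex and 43-character unpadded URL-safe base64. Both helper names are hypothetical, not peep functions.

```python
import base64
import hmac


def normalize_sha256(expected: str) -> bytes:
    """Decode an expected SHA-256 digest given as hex or unpadded URL-safe base64.

    Hypothetical helper: the format is chosen purely by length
    (64 chars -> hex, 43 chars -> base64).
    """
    expected = expected.strip()
    if len(expected) == 64:
        return bytes.fromhex(expected)  # raises ValueError on non-hex input
    if len(expected) == 43:
        # Restore the single '=' that unpadded digests omit before decoding.
        return base64.urlsafe_b64decode(expected + "=")
    raise ValueError(f"unrecognized SHA-256 digest length: {len(expected)}")


def digest_matches(actual: bytes, expected: str) -> bool:
    """Compare a raw 32-byte digest against an expected hex or base64 string."""
    return hmac.compare_digest(actual, normalize_sha256(expected))
```

Because the two lengths never overlap, no extra flag is needed; anything else is rejected rather than guessed at.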