erikrose / peep

A "pip install" that is cryptographically guaranteed repeatable
MIT License

Support for "standard" hexadecimal hashes #49

Open Lekensteyn opened 10 years ago

Lekensteyn commented 10 years ago

Given that many tools output base-16 representations of hashes, what about adding support for this? The length of the hash could be used to detect the format (base64 vs hex):

# sha256: 1YiBrxeoKyxVnuPvaVEF96AfayYZap-zXLHxUPxPjzw
# sha256: 3f7a8ec45765cc33daac2448c609ab08e76ffb5a
peep==1.3

willkg commented 10 years ago

Which tools are you referring to?

Lekensteyn commented 10 years ago

sha256sum, Python's hashlib.hexdigest(), openssl sha256.

Base64-encoded digests are slightly unusual.

willkg commented 10 years ago

Are you using those tools with peep? Is using base64 representation making something you want to do difficult?

Lekensteyn commented 10 years ago

The hex digests are posted on PyPI and in GPG-signed release announcements (in the case of Django), so yes, it introduces an extra step.

I was thinking of this workflow:

  1. Pick the published sums from a known-good source.
  2. Paste them into the requirements file
  3. Install everything using peep.

If I have to use base64-encoded strings, I need to convert the hex digests to bytes and then encode those bytes as base64, substituting the two non-alphanumeric characters (which I would need to figure out first) and possibly dealing with padding.

So it would be easier if I could use the existing sources directly rather than converting them to an unusual format.
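For illustration, the conversion described above can be sketched in a few lines of Python. This is only a sketch: it assumes the target format is unpadded, URL-safe base64, which is what the example hash at the top of this thread looks like, and `hex_to_urlsafe_b64` is a hypothetical helper name, not part of peep.

```python
import base64
import binascii


def hex_to_urlsafe_b64(hex_digest):
    """Convert a hex digest to unpadded, URL-safe base64.

    Hypothetical helper for illustration; assumes the target format
    uses '-' and '_' instead of '+' and '/', with '=' padding removed.
    """
    raw = binascii.unhexlify(hex_digest.strip())
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


if __name__ == "__main__":
    import sys

    # Usage: python convert.py <hex digest> [<hex digest> ...]
    for arg in sys.argv[1:]:
        print(hex_to_urlsafe_b64(arg))
```

A 64-character hex SHA-256 digest comes out as a 43-character string this way.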

willkg commented 10 years ago

Got it. That's a much more complete report of the issue than the original description.

More complexity yields more bugs, so I don't think we should support two hash formats. I think the way forward with this would be either:

  1. decide it's not worth doing
  2. switch from base64 to base16

I'm inclined to go with number 2. However, I suspect that using base16 encodings to make it easier to reuse other people's hashes leads to a false sense of security, which makes me wonder whether we should decide it's not worth doing.

@erikrose Your thoughts?

Lekensteyn commented 10 years ago

The biggest threat I am worried about is that the file gets changed in the future (or now, via a MitM), not necessarily fake checksums.

The sources for checksums would be:

mythmon commented 9 years ago

+1 to allowing base16 checksums. I don't think I've ever seen base64 hashes anywhere else, which makes it hard to use other tools. I tried to use sha256sum to verify hashes for a while, and it really wasn't clear to me until later why they weren't matching up.

base64 hashes are slightly shorter (43 vs 64 characters), but I don't think that matters much.

Personally, I think that peep should be able to handle more types of checksum than just base64-encoded sha256 checksums. I would be ok with auto-detecting base64 or base16.

erikrose commented 9 years ago

Where are the hex sha256 digests on PyPI? All I see is sha1 hashes, and I have to dig into the DOAP records to see them. If you're talking about md5 hashes, those and sha1 are fairly thoroughly broken. While I could see them being useful, and easier, for guarding solely against accidents, I am reluctant to make them the path of least resistance.

erikrose commented 9 years ago

Supporting the hex versions of sha256 hashes seems like a no-brainer to me; we can easily distinguish them based on length.
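A minimal sketch of what length-based detection might look like, assuming only SHA-256 digests are accepted (32 bytes, so 64 hex characters or 43 unpadded base64 characters). The function name `digest_bytes` and the constants are hypothetical, not taken from peep's code.

```python
import base64
import binascii

HEX_LEN = 64  # 32 bytes written as base16
B64_LEN = 43  # 32 bytes written as base64 with the '=' padding stripped


def digest_bytes(text):
    """Return the raw 32-byte digest for a hex or URL-safe base64 string.

    Hypothetical helper: the encoding is chosen purely by length.
    """
    text = text.strip()
    if len(text) == HEX_LEN:
        return binascii.unhexlify(text)
    if len(text) == B64_LEN:
        # Restore the single padding character before decoding.
        return base64.urlsafe_b64decode(text + "=")
    raise ValueError("unexpected digest length: %d" % len(text))
```

Comparing decoded bytes rather than strings means both spellings of the same hash match. Note that the 40-character hex example in the original post has the length of a SHA-1 digest, so a check like this would reject it.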

Lekensteyn commented 9 years ago

Depending on your threat model, md5 can be sufficient "for now". Collision vulnerabilities affect the trust you can have in the integrity of files with a checksum from untrusted sources. If you already have an md5 hash which is not specially crafted, then a second preimage attack is more difficult.

SHA-1 does not have known preimage vulnerabilities at all.

So, when do you expect to implement the hex versions?

erikrose commented 9 years ago

I can't believe I haven't mentioned the peep hash command. If you have the tarballs or wheels downloaded, that's what you run to get the wheel-formatted hashes of them; you don't need to muck around in the REPL yourself or anything like that. FWIW, the base64'd hashes come from http://legacy.python.org/dev/peps/pep-0427/#signed-wheel-files, the format wheels use internally. Those haven't become popular yet, so the lack of tooling around them is annoying.

Hex versions honestly aren't a big priority for me, since they don't scratch any of my itches, but I'll gladly accept a patch.
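For anyone who wants to cross-check a downloaded file against both a published hex digest and a base64 one, here is a rough sketch that computes the two spellings of the same SHA-256 hash. It is not the implementation of the peep hash command; `sha256_both` is a hypothetical name, and the base64 form is assumed to be URL-safe and unpadded.

```python
import base64
import hashlib


def sha256_both(path, chunk_size=65536):
    """Return (hex, unpadded URL-safe base64) SHA-256 digests of a file.

    Hypothetical helper for illustration; reads the file in chunks so
    large archives do not have to fit in memory.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    b64 = base64.urlsafe_b64encode(h.digest()).decode("ascii").rstrip("=")
    return h.hexdigest(), b64
```

The hex value should match what sha256sum prints for the same file, and the second value has the 43-character shape used in the requirements-file comments above.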