BLAKE3-team / BLAKE3

the official Rust and C implementations of the BLAKE3 cryptographic hash function
Apache License 2.0
4.93k stars 333 forks source link

Hash struct lacks an Ord implementation #202

Open sofia-snow opened 2 years ago

sofia-snow commented 2 years ago

Without an Ord implementation (or even a Hash implementation) blake3::Hash values cannot be used as keys in BTreeMaps or HashMaps, among many other valid uses for (vartime) orderings.

To be safe; these implementations ought to be constant-time. Either way, the subtle crate provides traits which are explicitly intended for constant time equality and ordering.

As a workaround, I'm using the newtype pattern to implement these traits on my own OrdHash type.

If a PR would be welcome for this addition, should the implementation be constant time or variable?

oconnor663 commented 2 years ago

Interesting question. My instinct is that for most applications where constant-time ordering isn't an issue, the easiest thing is to convert to [u8; 32] with .into(). Given that option, I think if we were going to add an Ord/PartialOrd impl in this crate, it would make sense for it to be constant time. That way we maintain the general rule that "you can convert to [u8; 32] when you don't care about constant-time stuff."

oconnor663 commented 2 years ago

For what it's worth, I've often thought about splitting out the Hash type into its own crate. Especially now that const generics are stabilized, it might make sense to revisit that. I wonder if the RustCrypto folks would be interested in sharing a single type here.

sofia-snow commented 2 years ago

My understanding is not all of const-generics have landed, and RustCrypto need more than currently offered even by nightly.

https://github.com/RustCrypto/traits/issues/238#issuecomment-806931108

ppamorim commented 2 years ago

@oconnor663 Hi, is there any work done on that? I am trying to use Hash on my BtreeMap but I need to convert it to hex every time. :(

oconnor663 commented 2 years ago

I haven't done any work on this myself, no. It looks like the right way to do it will be to replace our dependency on constant_time_eq with a dependency on subtle, which can do both Eq and Ord. If anyone's interested, this is probably a good beginner PR.

In the meantime, you might want to convert Hash value to bytes rather than hex, either with .as_bytes() or with .into(). That'll be the same size as the original Hash, which might be important if you're storing lots of them.

Eitu33 commented 2 years ago

Shouldn't https://github.com/BLAKE3-team/BLAKE3/pull/203 be merged before eventually finding a better alternative?

Also why is a constant-time Ord implementation required?

After thinking a bit about how it could be done when comparing 2 Hash I realised that having a constant-time implementation for this would require to always go through the whole byte array. This would be the equivalent of the worst case scenario of the default implementation which stops the comparison whenever the bytes aren't equal, so I don't see why that would make sense.

imuli commented 1 year ago

Same reason Hash has a constant time Eq implementation. Though I can't think of an attack off-hand, I would bet there's some use case out there that leaks information though the time it takes to compare (probably many) hashes - especially if you could control what hashes it's being compared to...

Anyway, I'm glad there's already an issue open here with something of a plan, I'll take a look at subtle.

elichai commented 1 year ago

Basically whenever Blake3 is used as an HMAC you really really want CT comparison https://threatpost.com/timing-attack-google-keyczar-library-060209/72738/

I'd also say that I doubt this could be a bottleneck anywhere, it's 4 words xoring them instead of branching over them might even be faster

oconnor663 commented 1 year ago

One issue to keep in mind here is that, even if Hash provided a constant-time comparison, you could still leak information by using it in a data structure that doesn't make higher-level constant-time guarantees. For example, suppose we have a search tree of MACs, and our application takes a MAC from the user and asks "is this in the tree?" Maybe the tree looks like this:

    mac2
   /    \
mac1    mac4
       /    \
     mac3  mac5

Even if comparing individual MACs is constant time, the tree structure itself still has a timing leak. If I query it with a MAC that's less than mac2, it's going to do 2 comparisons. But if I query it with a MAC that's greater than mac2, it's going to do 3. If I can measure that timing difference, I can work from the high bits down to the low bits and recover mac2 in ~256 queries! (In general, an oracle that answers "is your MAC equal to the valid one?" is normal and no problem, but an oracle that answers "is your MAC greater than the valid one?" is broken and quickly leaks the valid MAC.)

We could probably come up with some way to fix the tree. We could add a requirement that all the leaves are at the same depth, and we could design some sort of "empty node" concept to make that work. I'm sure someone somewhere has done this before, but it's a very exotic problem to be solving, and no ordinary tree-map implementation is going to do anything like this.

So one worry I have about a constant-time Ord implementation, is that it might inspire false confidence that putting MACs inside a BTreeSet is safe. But that's not safe, and there's nothing Hash can do to make it safe. I'm curious what other use cases folks here have in mind.

imuli commented 1 year ago

So one worry I have about a constant-time Ord implementation, is that it might inspire false confidence that putting MACs inside a BTreeSet is safe. But that's not safe, and there's nothing Hash can do to make it safe.

I agree with that concern. Even in a non-MAC scenario it could be a problem - if you can fetch data based on it's hash, the time until cache miss could reveal blocks that are present in the cache.

I'm curious what other use cases folks here have in mind.

I don't think I have an actual use case for a constant-time Ord implementation so much as any Ord implementation. Mainly for things like sorting keys during serialization to remove hash map leaks.

imuli commented 1 year ago

This https://github.com/BLAKE3-team/BLAKE3/pull/267#issuecomment-1321273326 seems relevant over here.

Actually I should've said that it's not clear (to me) that we want any Ord implementation at all. In my mind, forcing the caller to convert hashes to bytes to sort them has some value, because it makes it clear that the comparison semantics are the regular [u8] comparison semantics. Given that we do encourage users to rely on our constant-time Eq impl, I worry it would be confusing to expose an Ord impl without making a similar guarantee.

So the recommendation for types that want an Ord implementation and to contain a Hash would be to either make a wrapper, write it manually, or embed it as a [u8; 32] - which I guess is what everyone is doing now anyway :)

oconnor663 commented 1 year ago

Yeah if you need to sort hashes or something like that, my instinct is to convert them to [u8; 32]. In --release mode that conversion is almost certainly a no-op, and it makes it maximally clear what's going on.

joshtriplett commented 1 month ago

@oconnor663 One mild annoyance with using [u8; 32], though, is that it doesn't provide convenient operations to format it as hex, the way blake3::Hash does.

imuli commented 1 month ago

Yeah, in practice I make my own wrapper around [u8;32] (with all the implementations I want) and convert back and forth as needed.

In particular I'm finding this issue not a big deal because I also am adding implementations for serialization and other nonsense that doesn't belong in the blake3 crate, and there's no OrphanInstances in rust so I'd need to make it anyway.