There should be a standard way to recursively hash structs

pedrocr commented 7 years ago

std::hash::Hash can be derived for structs and, together with std::hash::Hasher, is the standard hashing interface in Rust. The standard interface only allows 64-bit outputs, but there's nothing stopping extra outputs tailored to specific hashes. So for ergonomic purposes, wouldn't it make sense to have an adapter to allow using the Hasher API?
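A minimal sketch of what such an adapter could look like, using only the standard library. `ByteSink`, `Recorder`, `HasherAdapter`, `Point`, and `hashed_bytes` are hypothetical names introduced here for illustration; `Recorder` just stores the bytes it receives, where a real adapter would forward them to a cryptographic digest.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a cryptographic digest: anything that absorbs bytes.
pub trait ByteSink {
    fn absorb(&mut self, bytes: &[u8]);
}

/// Toy sink that records its input instead of hashing it (assumption for the demo).
#[derive(Default)]
pub struct Recorder(pub Vec<u8>);

impl ByteSink for Recorder {
    fn absorb(&mut self, bytes: &[u8]) {
        self.0.extend_from_slice(bytes);
    }
}

/// Adapter exposing a byte sink through the standard `Hasher` API.
pub struct HasherAdapter<S: ByteSink>(pub S);

impl<S: ByteSink> Hasher for HasherAdapter<S> {
    fn write(&mut self, bytes: &[u8]) {
        self.0.absorb(bytes);
    }

    /// The 64-bit result is not meaningful here; the "extra output"
    /// is obtained from the inner sink instead.
    fn finish(&self) -> u64 {
        0
    }
}

#[derive(Hash)]
pub struct Point {
    pub x: u8,
    pub y: u8,
}

/// Returns the exact bytes that `T`'s `Hash` impl feeds to the hasher.
pub fn hashed_bytes<T: Hash>(value: &T) -> Vec<u8> {
    let mut adapter = HasherAdapter(Recorder::default());
    value.hash(&mut adapter);
    (adapter.0).0
}
```

Calling `hashed_bytes(&Point { x: 1, y: 2 })` shows which bytes a derived `Hash` impl would hand to the digest. Whether that byte stream is suitable input for a cryptographic hash is a separate question.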

tarcieri commented 7 years ago

But if it's possible to generate a collision by just manipulating structure, Hash and CryptoHash are equally broken.

Incorrect. Again, hashDoS is an algorithmic attack that relies on the attacker being able to find large numbers of collisions. Finding a single collision is not catastrophic, because it doesn't help the attacker that much.

Furthermore, the interesting cases for hashDoS are HashMap<T, _> for a single type T. For typical Hash cases we don't need to worry about collisions across types.

With a CryptoHash, which calculates a content hash, any single collision is catastrophic, and we aren't constrained by T and have to worry about collisions across types.

These are totally different threat models you are trying to conflate. Hash is not designed to solve the problem you're proposing.

My proposal is very simple. Do exactly what Hash does but feed it to a crypto hash. I've yet to see an attack that breaks that but it may exist.

The onus is on you to show why your scheme is secure. Otherwise you're shifting the burden of proof. Your (ill-defined) scheme is not secure simply because I haven't found an attack. You need to be able to answer questions like: how is the scheme domain separated everywhere type-by-type? How does it distinguish a Vec<u8> containing the data that would be hashed for some struct from the struct itself?
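To make the two questions concrete, here is a minimal sketch of one possible framing in which every value carries a type tag and a length prefix. The tag values and the functions `frame`, `encode_bytes`, and `encode_struct` are hypothetical and are not taken from any scheme discussed in this thread. The sketch only illustrates the idea of domain separation and does not prove that the encoding is unambiguous.

```rust
/// Hypothetical type tags (assumed values, for illustration only).
pub const TAG_BYTES: u8 = 0x01;
pub const TAG_STRUCT: u8 = 0x02;

/// Writes `tag`, then the payload length as a fixed-width u64 (little-endian),
/// then the payload itself.
fn frame(tag: u8, payload: &[u8], out: &mut Vec<u8>) {
    out.push(tag);
    out.extend_from_slice(&(payload.len() as u64).to_le_bytes());
    out.extend_from_slice(payload);
}

/// Encodes a plain byte string (e.g. the contents of a `Vec<u8>`).
pub fn encode_bytes(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    frame(TAG_BYTES, data, &mut out);
    out
}

/// Encodes a struct as its type name followed by its already-encoded fields.
pub fn encode_struct(type_name: &str, encoded_fields: &[Vec<u8>]) -> Vec<u8> {
    let mut body = Vec::new();
    frame(TAG_BYTES, type_name.as_bytes(), &mut body);
    for field in encoded_fields {
        body.extend_from_slice(field);
    }
    let mut out = Vec::new();
    frame(TAG_STRUCT, &body, &mut out);
    out
}
```

Under this framing, a byte vector that happens to contain a struct's encoding starts with `TAG_BYTES` while the struct itself starts with `TAG_STRUCT`, so the two inputs to the hash differ. Two structs with identical fields but different type names also differ.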

A secure scheme will be demonstrably unambiguous and free of collisions for arbitrarily structured messages. Anything less is insecure.

This is a much higher bar than Hash.

pedrocr commented 7 years ago

With a CryptoHash, which calculates a content hash, any single collision is catastrophic, and we aren't constrained by T and have to worry about collisions across types.

If you care about collisions across types then yes, it's easy to generate collisions that are benign in Hash but break CryptoHash. Again, not my use case, but I can see how you'd want that in general, and if there's a simple way to do that then great. objecthash doesn't currently do that completely though, as it hashes the same structure of fields to the same value independently of type. So objecthash(Person{id:0}) == objecthash(Dog{id:0}). Generating extra collisions with Hash isn't trivial actually, but can be done with things like hash((0u16,0u16)) == hash((0u8,0u8,0u8,0u8)). That's just a trivial bug (i.e., since Hash doesn't care about different types, it doesn't do the same as with Vec and also hash the length). But if there's a standard way to fix that, all the better. I've had a second look at objecthash and opened a bug report on it.
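The two collisions mentioned here can be checked without picking a particular hash function by recording the bytes that the standard `Hash` impls feed to a `Hasher`. This is a minimal sketch; `Recorder` and `fed_bytes` are hypothetical helpers, and `Person` and `Dog` are assumed to hold a single `u32` field. It relies on the default `Hasher::write_u8`/`write_u16`/`write_u32` methods forwarding to `write`.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical hasher that records every byte written to it.
#[derive(Default)]
pub struct Recorder(pub Vec<u8>);

impl Hasher for Recorder {
    fn write(&mut self, bytes: &[u8]) {
        self.0.extend_from_slice(bytes);
    }

    // The 64-bit output is irrelevant for this demonstration.
    fn finish(&self) -> u64 {
        0
    }
}

/// Returns the byte stream that `value`'s `Hash` impl produces.
pub fn fed_bytes<T: Hash>(value: &T) -> Vec<u8> {
    let mut recorder = Recorder::default();
    value.hash(&mut recorder);
    recorder.0
}

#[derive(Hash)]
pub struct Person {
    pub id: u32,
}

#[derive(Hash)]
pub struct Dog {
    pub id: u32,
}
```

With these helpers, `fed_bytes(&Person { id: 0 })` equals `fed_bytes(&Dog { id: 0 })`, and `fed_bytes(&(0u16, 0u16))` equals `fed_bytes(&(0u8, 0u8, 0u8, 0u8))`. Any hash function fed these streams, cryptographic or not, would therefore collide on each pair.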

pedrocr commented 7 years ago

After looking at objecthash what it does is create a hash that's interoperable between languages and thus has some situations that generate collisions on purpose:

First{id: 0} and Second{id: 0} will hash to the same value even if one uses u32 and the other u64.
Different Unicode strings that normalize to the same value have the same hash.
Structs that have the same fields but in different order will hash to the same value.

This makes it easier to have values for which objecthash(a) == objecthash(b) and yet myfunc(a) != myfunc(b). For some applications this is not ideal. Given this, I'd say it would make sense to have a simpler scheme that avoids the issues of Hash (e.g., by also hashing tuple lengths) and simply hashes all fields in order with all their contents.

Trojan295 commented 7 years ago

hash((0u16,0u16)) == hash((0u8,0u8,0u8,0u8))

I would question whether it's a bug, because the byte representation of the data is the same. Hash functions work on bytes, not data structures. Let's say you have a vector in Rust and one in Java, which hold exactly the same data. A hash of those should be the same in both languages, even though the underlying implementation of vectors could be different. If we couple a hash function with the implementation of the data structure in Rust, the hash could end up usable only in Rust. What's even worse, changes to the compiler could affect the resulting hashes.

pedrocr commented 7 years ago

@Trojan295 I'm not sure if you're arguing that it's a bug or not. (0u16, 0u16) and (0u8, 0u8, 0u8, 0u8) are not holding the same data: one has 2 zeroes, the other 4. The fact that both end up generating 4 zero bytes in memory is an implementation detail. And Rust already does hash([0u16, 0u16]) != hash([0u8, 0u8, 0u8, 0u8]) because the Vec length is already hashed. Only the tuple length isn't.
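The difference between slices and tuples can be observed by recording what the standard `Hash` impls write. This is a minimal sketch with hypothetical helpers `Recorder` and `fed_bytes`. It assumes the current standard library behavior, in which arrays and `Vec` hash as slices and slices write their length before their elements; the exact calls made by these impls are an implementation detail of `std`.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical hasher that records every byte written to it.
#[derive(Default)]
pub struct Recorder(pub Vec<u8>);

impl Hasher for Recorder {
    fn write(&mut self, bytes: &[u8]) {
        self.0.extend_from_slice(bytes);
    }

    // The 64-bit output is irrelevant for this demonstration.
    fn finish(&self) -> u64 {
        0
    }
}

/// Returns the byte stream that `value`'s `Hash` impl produces.
pub fn fed_bytes<T: Hash>(value: &T) -> Vec<u8> {
    let mut recorder = Recorder::default();
    value.hash(&mut recorder);
    recorder.0
}
```

Comparing `fed_bytes(&[0u16, 0u16])` with `fed_bytes(&[0u8, 0u8, 0u8, 0u8])` gives different streams because the element counts 2 and 4 are written first, whereas the corresponding tuples produce identical streams of four zero bytes.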

Trojan295 commented 7 years ago

Are you trying to use those structured hashing functions only for internal Rust purposes like HashMap, or do you want to make them usable in a cryptographic manner? For such internal use that's OK, but hashes used in cryptography are mostly sent to other entities. Now if you define a custom data structure in Rust and apply some hash to it, then the other entity needs to know how you calculated the hash of this structure.

Let's take this Vec. I don't know how it's done in Rust, but let's assume that the length is appended to the data and hashed. Then the guy on the other side of the wire needs to know how you built the byte stream that was hashed (that you appended the length and did not, for example, prepend it). It's not simple to unify this across multiple parties.
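A small sketch of why both sides have to agree on the framing: the same data yields different byte streams, and therefore different hashes, depending on where the length goes. `length_prefixed` and `length_suffixed` are hypothetical helpers. The length is written as a fixed-width little-endian `u64` so the encoding does not depend on the platform's pointer width or endianness.

```rust
/// Encodes `data` with its length written before the contents.
pub fn length_prefixed(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + data.len());
    out.extend_from_slice(&(data.len() as u64).to_le_bytes());
    out.extend_from_slice(data);
    out
}

/// Encodes `data` with its length written after the contents.
pub fn length_suffixed(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + data.len());
    out.extend_from_slice(data);
    out.extend_from_slice(&(data.len() as u64).to_le_bytes());
    out
}
```

A party that hashes `length_prefixed(b"ab")` and a party that hashes `length_suffixed(b"ab")` will never agree on the digest, even though both describe the same two bytes.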

Generally, I don't think such a feature is required in the case of cryptographic hashes, as in crypto you mostly operate on numbers/bytes and not custom data structures. The way of hashing needs to be well known and unified.

tarcieri commented 7 years ago

@Trojan295 short of Hash changing to a const-dependent pi-typed interface, this wouldn't work for things like HashMap, which needs a different type signature.

But as I've previously stated, if there's one thing I have a bug up my butt about, it's conflating security domains and concerns, and that's really what I want to avoid.

I am a huge fan of something like a CryptoHash scheme and have already implemented one in Rust and plan on designing and implementing another. I just want to make sure it covers all of the concerns I have. Those concerns are orthogonal to what Hash presently provides.

newpavlov commented 7 years ago

I will close this issue and create a separate issue in the RustCrypto/utils repository. A crate for cryptographically hashing structs would be a great addition to the project, but as I stated earlier, it's better to do it as a separate crate. I think the concerns raised by @tarcieri are good ones and should be addressed in this crate.

burdges commented 7 years ago

I think a utility crate could provide a wrapper type struct HashWriter<'a, I: 'a + ?Sized + Input>(&'a mut I) that impls io::Write along with input_le, input_be, etc. We cannot make it totally generic over serde serializers anyway because most lack an equivalent of bincode::serialize_into, but folks can work around that on a case-by-case basis.
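A minimal sketch of such a wrapper, under stated assumptions. The `Input` trait below is a local stand-in with a single hypothetical `process` method and is not the digest crate's actual trait. The `u32` signatures of `input_le` and `input_be` are also assumptions, and `Recorder` is a toy sink used in place of a real hash.

```rust
use std::io;

/// Hypothetical stand-in for the `Input` trait named in the comment.
pub trait Input {
    fn process(&mut self, input: &[u8]);
}

/// Wrapper that lets anything implementing `Input` be used as an `io::Write`.
pub struct HashWriter<'a, I: 'a + ?Sized + Input>(pub &'a mut I);

impl<'a, I: 'a + ?Sized + Input> io::Write for HashWriter<'a, I> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Feeding a hash cannot fail, so the whole buffer is always consumed.
        self.0.process(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

impl<'a, I: 'a + ?Sized + Input> HashWriter<'a, I> {
    /// Feeds `value` in little-endian byte order (assumed signature).
    pub fn input_le(&mut self, value: u32) {
        self.0.process(&value.to_le_bytes());
    }

    /// Feeds `value` in big-endian byte order (assumed signature).
    pub fn input_be(&mut self, value: u32) {
        self.0.process(&value.to_be_bytes());
    }
}

/// Toy sink that records its input instead of hashing it.
#[derive(Default)]
pub struct Recorder(pub Vec<u8>);

impl Input for Recorder {
    fn process(&mut self, input: &[u8]) {
        self.0.extend_from_slice(input);
    }
}
```

Because `HashWriter` implements `io::Write`, any serializer that can write into a `Write` could stream its output into the hash without building an intermediate buffer.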

newpavlov commented 7 years ago

@burdges Could you please specify for which use cases an io::Write impl is needed and the current digest_reader is not enough?

RustCrypto / hashes

There should be a standard way to recursively hash structs #29