filecoin-project / specs-actors

DEPRECATED Specification of builtin actors, in the form of executable code.
Other
86 stars 102 forks source link

Murmur3 based HAMTs are not secure with attacker influenced keys #517

Closed Stebalien closed 4 years ago

Stebalien commented 4 years ago

If an attacker can influence the keys used in a HAMT, the HAMT must use a cryptographically secure hash function. As-is, if an attacker can brute-force several (3) 64bit murmur3 hashes, they can prevent a key from being inserted into the HAMT.

Even if the key is a sha256 hash, brute-forcing 2**64 SHA256 hashes (enough to pick a specific murmur3 key) is well within the power of a bitcoin miner.

TL;DR: The hash needs to have at least 128 bits of security (but doesn't need to be safe against the birthday attack).

ZenGround0 commented 4 years ago

if an attacker can brute-force several (3) 64bit murmur3 hashes, they can prevent a key from being inserted into the HAMT.

Could you explain the mechanism behind this or link to somewhere it has already been discussed?

Stebalien commented 4 years ago

If I know you're going to insert key Kt with hash Ht = murmur3(Kt).

  1. I can create keys Kx, Ky, and Kz such that the murmur3(K{x,y,z}) == Kt.
  2. Insert these into the HAMT.
  3. You'll then try to insert Kt.
  4. Try to insert a 4th key in this shard, hitting https://github.com/ipfs/go-hamt-ipld/blob/0310ad2b0b1f880f8412150aa7dd9b1e6b9fb79d/hamt.go#L337-L358 because the shard is full.
  5. Repeatedly deepen the hamt until you run out of bits in the hash.
  6. Return an error.
Stebalien commented 4 years ago

https://github.com/ipfs/go-hamt-ipld/pull/49

rvagg commented 4 years ago

Is this going to get locked into Lotus at some point soon and therefore need to go into the specification? I'm working on an appendix to the IPLD HashMap spec which can be linked from go-hamt-ipld and the Filecoin specs that outlines the precise ways that the Filecoin HAMT varies and would like to be able to say what Filecoin uses. I imagine this might also need a SHA2-256 implementation alongside the current default @ https://github.com/ipfs/go-hamt-ipld/blob/master/hash.go ?

austinabell commented 4 years ago

TL;DR: The hash needs to have at least 128 bits of security (but doesn't need to be safe against the birthday attack).

Is there a reason against using the 128 bit murmur3? 64 bit version is just a chopped 128 and there doesn't seem to be a clear reason to me why 64 bit hashes are being used as they are not being persisted in any way (in rust impl we are using 128 bit murmur3 because it wasn't specified to specifically use 64 bit but this should be resolved soon so we don't hit this edge case).

Edit: Seems to me, if I'm not misremembering things, that you would have to brute force all these hashes: 3x first 60 bits matching, first 55 bits, first 50 bits, ... until the height 0. But yeah this doesn't seem much better. (with bit width 5 you don't even have to match the last 4 bits for the brute forced 3 keys)

Stebalien commented 4 years ago

Murmur3 is not a cryptographic hash function. Technically murmur3-128 should be fine as long as the input keys are already cryptographic hashes of something, but that still makes me nervous (and I'm not even sure that that's the case here).

3x first 60 bits matching, first 55 bits, first 50 bits, ... until the height 0.

Not quite. Buckets only apply to leaves (i.e., three values to a bucket). Once you fill a bucket, you break it out into a new shard.

So you'd need to crack 3 hashes, not 3*64.

Is this going to get locked into Lotus at some point soon and therefore need to go into the specification?

@rvagg Which part? The fact that filecoin will use sha256?

austinabell commented 4 years ago

Murmur3 is not a cryptographic hash function. Technically murmur3-128 should be fine as long as the input keys are already cryptographic hashes of something, but that still makes me nervous (and I'm not even sure that that's the case here).

I agree, just mentioning if you don't want actual breaking changes

3x first 60 bits matching, first 55 bits, first 50 bits, ... until the height 0.

Not quite. Buckets only apply to leaves (i.e., three values to a bucket). Once you fill a bucket, you break it out into a new shard.

Yes, I'm aware of this, I may have explained poorly. the 3 hashes you need are the ones in the leaf, and all of the other hashes to brute force (55, 50, 45, .., 5) are to push the height to the max. Because each item inserted to push the height needs to have 5*height bits matching to ensure the collision.

So you'd need to crack 3 hashes, not 3*64.

Sorry, bad wording, I meant just brute forcing 3 hashes with the first 60 bits matching, didn't mean to say you need 3*60 hashes

In any case, this is semantics, I agree with the outcome and let me know where I can find the final decision so we can make the change on our end

Stebalien commented 4 years ago

Sorry, bad wording, I meant just brute forcing 3 hashes with the first 60 bits matching, didn't mean to say you need 3*60 hashes

Ah, yes, you're right. My point was that you don't need to brute-force at each level. If you brute-force 3 hashes at one level, then try to insert a 4th item with the same hash, that will immediately try to recursively expand the HAMT till it runs out of hash bits.

In any case, this is semantics, I agree with the outcome and let me know where I can find the final decision so we can make the change on our end

:+1:

I believe the final decision is "use sha256" but I'm not sure what remains to make that happen.

austinabell commented 4 years ago

Ah, yes, you're right. My point was that you don't need to brute-force at each level. If you brute-force 3 hashes at one level, then try to insert a 4th item with the same hash, that will immediately try to recursively expand the HAMT till it runs out of hash bits.

No? Because if the hash is the same, the value will be overwritten. The hash has to be distinct, but match the previous bytes. I verified on the go impl and my impl but maybe I'm just overlooking something here.

Stebalien commented 4 years ago

Because if the hash is the same, the value will be overwritten

That shouldn't be the case. We should get to:

https://github.com/ipfs/go-hamt-ipld/blob/21886a1edb600df1cc6cead007fcba408f47af91/hamt.go#L355

Then call:

https://github.com/ipfs/go-hamt-ipld/blob/21886a1edb600df1cc6cead007fcba408f47af91/hamt.go#L294

Then fail with ErrMaxDepth.

austinabell commented 4 years ago

Because if the hash is the same, the value will be overwritten

That shouldn't be the case. We should get to:

https://github.com/ipfs/go-hamt-ipld/blob/21886a1edb600df1cc6cead007fcba408f47af91/hamt.go#L355

Then call:

https://github.com/ipfs/go-hamt-ipld/blob/21886a1edb600df1cc6cead007fcba408f47af91/hamt.go#L294

Then fail with ErrMaxDepth.

Ahhh you're definitely right, my fault. So basically you would need 4 hashes where the first 60 bits are the same and the last 4 different.

I would still argue even without this case being hit that 64 bit hashes would still cause issues because a collision would just overwrite the data and that seems equally as bad as not allowing certain values in.

Stebalien commented 4 years ago

I agree, both cases are equally bad.

rvagg commented 4 years ago

Is this going to get locked into Lotus at some point soon and therefore need to go into the specification?

@rvagg Which part? The fact that filecoin will use sha256?

Yeah, I've pinned down the specifics of the Filecoin HAMT in https://github.com/ipld/specs/pull/282 that we're using for spec documentation but that will need to reflect a change to SHA2-256 and that'll need to get communicated to other implementers too. I'm mostly interested in knowing how likely this change is.

There's also https://github.com/ipfs/go-hamt-ipld/issues/53 which is a potential breaking change that should get done (and documented) sooner than later or it'll become a big problem later.

I hope this isn't derailing this issue too much, but as a side note related to all of this: as I've pointed out before, chain height is the main mechanism available at the moment for versioning a lot of these things, but it's not a very nice way of versioning when you're trying to inspect independent portions of the chain, so IMO these things shouldn't be considered all that flexible over time, the fewer changes the better. Unless you want to add more explicit versioning fields to the data structures that consume the HAMT and other complex things that are likely to get tweaked over time. Another example of potential tweakage is the bitWidth of the HAMT which is currently set to 5 in Filecoin, which gives a branching factor of 32. We never did research into the costs of that decision and it's mostly a guess that it's a good tradeoff between storage size and mutation cost. I'm guessing that there may be a point when we can back up and do some research and want to change that number to either fatten or thin the tree to account for how it's being used with this specific data set.

anorth commented 4 years ago

The setting of bitwidth=5 is not a guess, but an empirically determined number from experiments performed by @hannahhoward and @acruikshank in mid-2019. However, the dominant workload for that HAMT was different to the workload now, so it's quite possible that the optimum is an adjacent value now. I do consider it pretty much locked-in for good for the HAMTs that will exist in code at genesis, though.

We should repeat the benchmarking.