JustinDrake closed this issue 5 years ago.
As a side note, I'd point out that the "ewasm on eth1.x" proposal would introduce blake2 support on Eth1, possibly before Eth2 is released. This might be worth considering.
I don't have a strong preference but if I had to vote it would be against.
Source: http://valerieaurora.org/hash.html
- Speed: SHA256 is ~50% faster than Keccak256.
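The ~50% figure depends heavily on the CPU and the library build (for example, on whether SHA256 gets hardware acceleration). A quick way to check it locally is a throughput sketch like the one below, using Python's hashlib. Keccak256 itself is not in hashlib, so this assumes hashlib.sha3_256 is a fair speed proxy: NIST SHA3-256 uses the same Keccak-f[1600] permutation and rate as Eth1's Keccak256 and differs only in padding. The throughput_mb_s helper is a hypothetical name for illustration.

```python
import hashlib
import timeit


def throughput_mb_s(name: str, size: int = 1 << 20, repeat: int = 5) -> float:
    """Best-of-`repeat` throughput of hashlib.<name> over `size` zero bytes, in MB/s."""
    data = bytes(size)
    new_hash = getattr(hashlib, name)
    # number=1: each run hashes the whole buffer once; min() reduces scheduling noise.
    best = min(timeit.repeat(lambda: new_hash(data).digest(), number=1, repeat=repeat))
    return size / best / 1e6


if __name__ == "__main__":
    # sha3_256 stands in for Keccak256: same permutation and rate, different padding.
    for name in ("sha256", "sha3_256", "blake2b"):
        print(f"{name:>9}: {throughput_mb_s(name):8.1f} MB/s")
```

Numbers from this will vary between machines and OpenSSL builds, so treat any single run as indicative only.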
From https://github.com/ethereum/eth2.0-specs/issues/218:
The performance benefits of Blake cannot be relied upon because STARK/SNARK-friendly hashes will likely be no faster than SHA3.
Has that assumption changed at all?
Speed of hash functions can be evaluated in different contexts. In the "plain-text model" (i.e. naive execution) SHA256 is faster than Keccak256. In the context of MPCs/SNARKs/STARKs all binary hash functions (SHA2, SHA3, Blake) are pretty terrible.
When looking into those timelines, I feel like SHA2-256 will have critical issues
I'd say that SHA2 only needs to survive another 5-10 years. The reason is that we intend to migrate to a STARK-friendly hash function when we make the cryptographic primitives quantum-secure with STARKs. Are there cryptanalysts who believe SHA2-256 will be broken within 10 years?
I'd say most cryptographers consider a break of SHA2 less likely than significant improvements in classical algorithms for the discrete log problem or for factoring composite integers.
Here was our reasoning for choosing SHA256 in the Cosmos Merkle Tree. https://github.com/tendermint/iavl/issues/38
I'd say the right mix of hash functions in any blockchain protocol is SHA256 plus a generic sponge function from the Keccak family for Merlin. https://docs.rs/merlin/1.0.2/merlin/
a generic sponge function from the Keccak family for Merlin
Why not use SHA256 for Merlin?
So, Merkle-Damgård-style hash functions accumulate input and then produce one output.
Sponge constructions allow you to put in some input and take out some output and then put in some more input and then get some more output etc.
What's interesting about the Keccak family is not using it the same way as a Merkle-Damgård hash, but using the unique properties of its sponge construction.
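To illustrate the absorb/squeeze difference, here is a minimal, unoptimised pure-Python sponge over Keccak-f[1600], written only for this comment. With domain byte 0x01 it should reproduce Eth1's Keccak256, and with 0x06 NIST SHA3-256 (the two differ only in padding). Calling absorb again after squeeze simply starts a new absorb phase on the current state; that restart rule and the KeccakSponge name are illustrative assumptions, not Merlin's API (Merlin is built on STROBE, which frames its Keccak-f[1600] duplex differently).

```python
MASK = (1 << 64) - 1


def _rol(v: int, n: int) -> int:
    # Rotate a 64-bit lane left by n bits.
    return ((v << n) | (v >> (64 - n))) & MASK


def _rc_bit(t: int) -> int:
    # Output bit t of the round-constant LFSR (x^8 + x^6 + x^5 + x^4 + 1).
    r = 1
    for _ in range(t % 255):
        r <<= 1
        if r & 0x100:
            r ^= 0x171
    return r & 1


RC = [sum(_rc_bit(j + 7 * i) << ((1 << j) - 1) for j in range(7)) for i in range(24)]
ROT = [[0] * 5 for _ in range(5)]
_x, _y = 1, 0
for _t in range(24):  # rho offsets, walked along the (x, y) -> (y, 2x + 3y) cycle
    ROT[_x][_y] = ((_t + 1) * (_t + 2) // 2) % 64
    _x, _y = _y, (2 * _x + 3 * _y) % 5


def keccak_f1600(a):
    # a[x][y] are 64-bit lanes; returns a new state after all 24 rounds.
    for rc in RC:
        c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        a = [[a[x][y] ^ d[x] for y in range(5)] for x in range(5)]  # theta
        b = [[0] * 5 for _ in range(5)]
        for x in range(5):  # rho + pi
            for y in range(5):
                b[y][(2 * x + 3 * y) % 5] = _rol(a[x][y], ROT[x][y])
        a = [[b[x][y] ^ (~b[(x + 1) % 5][y] & b[(x + 2) % 5][y]) for y in range(5)]
             for x in range(5)]  # chi
        a[0][0] ^= rc  # iota
    return a


class KeccakSponge:
    """Byte-oriented sponge; domain 0x01 = Keccak256, 0x06 = SHA3, 0x1F = SHAKE."""

    def __init__(self, rate: int = 136, domain: int = 0x01):
        self.rate, self.domain = rate, domain  # capacity = 200 - rate bytes
        self.state = bytearray(200)
        self.buf = bytearray()  # input not yet XORed into the state
        self.squeezing, self.pos = False, 0

    def _permute(self) -> None:
        idx = [[slice(8 * (x + 5 * y), 8 * (x + 5 * y) + 8) for y in range(5)] for x in range(5)]
        lanes = [[int.from_bytes(self.state[idx[x][y]], "little") for y in range(5)] for x in range(5)]
        lanes = keccak_f1600(lanes)
        for x in range(5):
            for y in range(5):
                self.state[idx[x][y]] = lanes[x][y].to_bytes(8, "little")

    def _absorb_block(self, block) -> None:
        for i, byte in enumerate(block):
            self.state[i] ^= byte
        self._permute()

    def absorb(self, data: bytes) -> None:
        self.squeezing = False  # input after a squeeze starts a fresh absorb phase
        self.buf += data
        while len(self.buf) >= self.rate:
            self._absorb_block(self.buf[:self.rate])
            del self.buf[:self.rate]

    def squeeze(self, n: int) -> bytes:
        if not self.squeezing:
            block = self.buf + bytes(self.rate - len(self.buf))  # pad10*1 with domain bits
            block[len(self.buf)] ^= self.domain
            block[-1] ^= 0x80
            self._absorb_block(block)
            self.buf, self.squeezing, self.pos = bytearray(), True, 0
        out = bytearray()
        while len(out) < n:
            if self.pos == self.rate:
                self._permute()
                self.pos = 0
            take = min(n - len(out), self.rate - self.pos)
            out += self.state[self.pos:self.pos + take]
            self.pos += take
        return bytes(out)


def keccak256(data: bytes) -> bytes:
    s = KeccakSponge(rate=136, domain=0x01)
    s.absorb(data)
    return s.squeeze(32)
```

A transcript-style use would be t = KeccakSponge(); t.absorb(commitment); c1 = t.squeeze(32); t.absorb(response); c2 = t.squeeze(32), where c2 depends on everything absorbed so far. A Merkle-Damgård hash such as SHA256 only offers the accumulate-then-finalise pattern, so the same effect is usually emulated by hashing the transcript so far for each challenge.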
This is mentioned in the IAVL thread, but SHA256 is getting hardware support in future Intel processors, which makes it as fast as BLAKE2.
Interesting suggestion from my colleague, Nicolas Gailly: https://multiformats.io/multihash/
This wouldn't be supportable natively in Eth1: the deposit contract would have to prepend the metadata to the hashes it outputs for the Merkle path, but that's relatively easy. This would give us good agility around hash functions for the foreseeable future.
It doesn't necessarily give us interop with other chains out-of-the-box, but could make that realistic with a simple shim layer to insert the appropriate hash metadata. Then we could interoperate with any chain using any of the hashes we choose to implement in the client.
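A minimal sketch of the multihash format: a varint function code, a varint digest length, then the digest itself. The codes below (0x12 sha2-256, 0x16 sha3-256, 0xb220 blake2b-256) come from the public multicodec table; Keccak-256 (0x1b) is left out only because Python's hashlib does not provide it. The function names (encode, decode, multihash, verify) are hypothetical and are not the API of any multihash library.

```python
import hashlib

# Multicodec function codes for a few hashes (from the public multicodec table).
CODES = {"sha2-256": 0x12, "sha3-256": 0x16, "blake2b-256": 0xB220}
HASHERS = {
    "sha2-256": lambda d: hashlib.sha256(d).digest(),
    "sha3-256": lambda d: hashlib.sha3_256(d).digest(),
    "blake2b-256": lambda d: hashlib.blake2b(d, digest_size=32).digest(),
}
NAMES = {code: name for name, code in CODES.items()}


def _uvarint(n: int) -> bytes:
    # Unsigned LEB128, the varint encoding used by multiformats.
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def _read_uvarint(buf: bytes, pos: int):
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def encode(name: str, digest: bytes) -> bytes:
    return _uvarint(CODES[name]) + _uvarint(len(digest)) + digest


def decode(mh: bytes):
    code, pos = _read_uvarint(mh, 0)
    length, pos = _read_uvarint(mh, pos)
    digest = mh[pos:]
    if len(digest) != length:
        raise ValueError("digest length does not match multihash header")
    return NAMES[code], digest  # unknown codes raise KeyError


def multihash(name: str, data: bytes) -> bytes:
    return encode(name, HASHERS[name](data))


def verify(data: bytes, mh: bytes) -> bool:
    # The prefix tells the verifier which function to recompute: the "shim" idea.
    name, digest = decode(mh)
    return HASHERS[name](data) == digest
```

In this sketch, deprecating a hash function amounts to deleting its entry from HASHERS, after which verify rejects digests carrying that code.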
Just thought I'd weigh in with some thoughts. It seems that this problem can be viewed via one of two lenses:
Compatibility: The advantages of using SHA256 for the sake of compatibility are clear, as are the implementation/speed advantages over Keccak (e.g. Intel's instruction sets). Backwards compatibility is the reason blake2b has been decided against for Eth2.0; however, there are some indicators that suggest Eth1 will eventually be able to compute blake2b efficiently, at which point it would make a lot of sense for Eth2.0 to use blake2b.
While maintaining backwards compatibility is clearly essential, I believe one of the Eth2.0 project goals is to pave the way for a better Ethereum overall, which I think means that it should exert a positive influence on Eth1. If we think that blake2b is a better function, I would hope that Eth1 can adapt in due course.
From my understanding, the main reason against using a hashing algorithm that is inefficient in Eth1, such as blake2b, is that it inhibits the ability to move data/ether from Eth2.0 back into Eth1; i.e. it prevents Eth1 from reading/verifying the Eth2.0 state. I imagine that, regardless of the hash function, Eth1 will require an update before it can perform this verification anyway, so introducing a more efficient hash function in the same update would be comparatively easy. Also, this would only need to happen once Eth2.0 is well established and Eth1 decides to support it.
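To make the Eth1-side cost concrete: verifying Eth2.0 state on Eth1 comes down to Merkle-branch checks like the sketch below, modelled loosely on the is_valid_merkle_branch helper in the Eth2 spec but with the hash function passed in as a parameter. The hash_fn parameter, build_tree and blake2b_256 are illustrative names, and build_tree assumes a power-of-two number of leaves. Each level costs one hash call, so the per-proof price on Eth1 is roughly depth times the cost of one hash in the EVM; with SHA256 that call can go through Eth1's existing precompile.

```python
import hashlib
from typing import Callable, List, Sequence, Tuple

HashFn = Callable[[bytes], bytes]


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def blake2b_256(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=32).digest()


def is_valid_merkle_branch(leaf: bytes, branch: Sequence[bytes], depth: int, index: int,
                           root: bytes, hash_fn: HashFn = sha256) -> bool:
    # Walk from the leaf to the root; bit i of `index` says whether we are the right child.
    value = leaf
    for i in range(depth):
        if (index >> i) & 1:
            value = hash_fn(branch[i] + value)
        else:
            value = hash_fn(value + branch[i])
    return value == root


def build_tree(leaves: Sequence[bytes],
               hash_fn: HashFn = sha256) -> Tuple[bytes, List[List[bytes]]]:
    # Returns the root and one sibling path (branch) per leaf.
    layers = [list(leaves)]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([hash_fn(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    branches = []
    for idx in range(len(leaves)):
        branch, pos = [], idx
        for layer in layers[:-1]:
            branch.append(layer[pos ^ 1])  # sibling at this level
            pos >>= 1
        branches.append(branch)
    return layers[-1][0], branches
```

A branch built with one hash function only verifies under that same function, which is exactly why the choice of hash ties Eth2.0 to what Eth1 can compute cheaply.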
Future-Proofing: Since Bitcoin's PoW mechanism uses SHA256, the global (potential and expected) hashes-per-second rate for SHA256 is far higher than for any other hash function. There are clear incentives to develop faster hashing for, and more sophisticated attacks on, SHA256, as demonstrated so far by Bitcoin's historical hashrate. I think this increases the likelihood of a collision being found in the long term. As such, I don't think it's a good long-term strategy to depend solely on SHA256. Keccak256, however, does not currently suffer from this issue, so I would consider it slightly more future-proof than SHA256, but much less so than blake2b.
I also don't believe the length-extension attack is a disadvantage of SHA256 in our use case, for the reasons explained by @zmanian.
Personally, I feel the goal of choosing the correct technology with good future-proofing is more important than maintaining compatibility. In this case, I think blake2b is the best choice, and I can understand why Polkadot made this choice. However, if we have collectively decided not to use blake2b, then I don't think the future-proofness of Keccak256 outweighs the compatibility advantages of SHA256.
I think @benjaminion's suggestion of using multihash is a fantastic one. The implementation overhead for supporting multiple hashes in this format is negligible, and it means we get the ultimate flexibility in choosing hash functions on the fly. I imagine that this would mean particular blocks or shards could select a hash function according to their goals. Maybe, in the beginning, only certain blocks need to be verified by Eth1, and those blocks can simply choose SHA256, while others can choose blake2b, thereby giving us selective interoperability and allowing us to adjust the slider between compatibility and future-proofing as we go. It also means that if vulnerabilities are discovered in any hashing algorithm, deprecating a function would be considerably easier. Further, multihash is maintained by Protocol Labs, so I assume it would have good support in libp2p.
Summary: I prefer, in order:
- blake2b
- SHA256
- Keccak256
But I don't think we should choose now; instead, we should support multihash and allow any secure hashing algorithm.
I asked Dan Boneh
Do you think SHA256 will plausibly remain secure until 2030? What about 2040? SHA256 seems to be the "blockchain standard" we are inclined to favour, but concerns around its security have been raised. In particular, we are aware of the length extension attack on SHA256, and this website suggests "minor weaknesses".
and he responded
I guess you are asking about the collision resistance of SHA-256. There is nothing known about the full SHA-256 that is better than the birthday bound. Assuming no algorithmic improvements, and assuming Moore's law continues (a big assumption) then one could expect a collision to be found in about 75 years, which seems fine for your applications. Quantum attacks also do not affect collision resistance.
However, the fact that NIST put out SHA-3 may suggest that there are non-public attacks against SHA-256 that are better than the birthday attack. This is just speculation. We have no information about this.
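For reference, the "birthday bound" in this reply is the generic collision cost for an ideal n-bit hash. The standard textbook approximation (not part of the email itself), with q hash evaluations, is:

```latex
% q = number of hash evaluations, n = output length in bits
\Pr[\text{collision among } q \text{ outputs}] \approx 1 - \exp\!\left(-\frac{q(q-1)}{2^{n+1}}\right)

% Setting the probability to 1/2 and solving for q:
\frac{q^2}{2^{n+1}} \approx \ln 2
\quad\Longrightarrow\quad
q_{1/2} \approx \sqrt{2\ln 2}\cdot 2^{n/2} \approx 1.18 \cdot 2^{n/2}

% For SHA-256 (n = 256):
q_{1/2} \approx 1.18 \cdot 2^{128}
```

An attack "better than the birthday bound" would therefore mean finding SHA-256 collisions with substantially fewer than about 2^128 evaluations.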
Hash function compatibility between Eth1 and Eth2 is important for several reasons:
In December 2018 we ditched Blake2b on Eth2 because of incompatibility with Eth1. In doing so we fell back to Eth1's native Keccak256. It turns out Eth1 has a SHA256 precompile. This opens up the possibility to use SHA256 on Eth2. Below is a breakdown of the pros and cons of SHA256 vs Keccak256.
Pros
Cons
The goal of this issue is to provide a heads-up and encourage discussion. My personal gut feel is that interoperability alone outweighs the cons.