ethereum / eth-hash

The Ethereum hashing function, keccak256, sometimes (erroneously) called sha256 or sha3
MIT License
104 stars 64 forks source link

Just initialize auto backend once #36

Closed Uxio0 closed 2 years ago

Uxio0 commented 2 years ago

What was wrong?

Issue #35

How was it fixed?

Autobackend was initialized every time it was called. Just init it once when object is created.

Previous results

eth_hash 10.747742728999583
web3 14.371859122999012
cryptodome 7.886241019998124
pysha3 0.6233324849999917

When setting ETH_HASH_BACKEND="pysha3":

eth_hash 6.742922992998501
web3 10.079077139998844

Results after PR

eth_hash 4.656213931999446
cryptodome 7.79646546100048
pysha3 0.6171151390008163

When setting ETH_HASH_BACKEND="pysha3"

eth_hash 0.8036113110010774
cryptodome 7.874447432999659
pysha3 0.6161410700005945

Cute Animal Picture

put a cute animal picture link inside the parentheses

Uxio0 commented 2 years ago

I'm aware that some tests need to be adjusted, but I just wanted to know this is the way to go

carver commented 2 years ago

Oh, I see. This is more of a benchmarking issue than a practical performance issue. In typical usage, where you're reusing the same hasher instance, there wouldn't be the performance penalty seen in the benchmark. Like if you ran a hash once in the benchmark setup, I expect that the PR wouldn't cause any meaningful change.

Uxio0 commented 2 years ago

In typical usage, where you're reusing the same hasher instance

The problem is that this is used broadly in web3.py (for example, in the formatter when converting all the addresses to the EIP55 version), and hasher is not reused.

carver commented 2 years ago

Like if you ran a hash once in the benchmark setup, I expect that the PR wouldn't cause any meaningful change.

Oh nope, this guess was wrong!

I ran this test:

from eth_hash.auto import keccak

def monkey_patch(backend):
    old_init = keccak._backend._initialize

    def new_init():
        print("Initializing again...")
        old_init()

    backend._initialize = new_init

monkey_patch(keccak._backend)
print("Run 1:")
print(keccak(b''))
print("Run 2:")
print(keccak(b''))

and was surprised to see!

Run 1:
Initializing again...
Initializing again...
b"\xc5\xd2F\x01\x86\xf7#<\x92~}\xb2\xdc\xc7\x03\xc0\xe5\x00\xb6S\xca\x82';{\xfa\xd8\x04]\x85\xa4p"
Run 2:
Initializing again...
b"\xc5\xd2F\x01\x86\xf7#<\x92~}\xb2\xdc\xc7\x03\xc0\xe5\x00\xb6S\xca\x82';{\xfa\xd8\x04]\x85\xa4p"

So what happens is that this "optimization" of saving the hasher directly to Keccak256.hasher saves the unoptimized version of AutoBackend, which keeps getting called over and over: https://github.com/ethereum/eth-hash/blob/fc3b9dda5f1ce5f908c7c0cbbbc63adb4e030da9/eth_hash/main.py#L25-L27

carver commented 2 years ago

In order to avoid initializing the backend at import time, we could make this change:

diff --git a/eth_hash/main.py b/eth_hash/main.py
index f814449..1503e3b 100644
--- a/eth_hash/main.py
+++ b/eth_hash/main.py
@@ -22,15 +22,18 @@ class Keccak256:
         This is a bit of a hacky way to minimize overhead on hash calls after
         this first one.
         """
+        # Execute directly before saving method, to let any side-effects settle (see AutoBackend)
+        result = self._backend.keccak256(in_data)
         new_hasher = self._backend.keccak256
         assert new_hasher(b'') == b"\xc5\xd2F\x01\x86\xf7#<\x92~}\xb2\xdc\xc7\x03\xc0\xe5\x00\xb6S\xca\x82';\x7b\xfa\xd8\x04]\x85\xa4p"  # noqa: E501
         self.hasher = new_hasher
-        return new_hasher(in_data)
+        return result

     def _preimage_first_run(self, in_data: Union[bytearray, bytes]) -> PreImageAPI:
-        new_preimage = self._backend.preimage
-        self.preimage = new_preimage
-        return new_preimage(in_data)
+        # Execute directly before saving method, to let any side-effects settle (see AutoBackend)
+        result = self._backend.preimage(in_data)
+        self.preimage = self._backend.preimage
+        return result

     def __call__(self, preimage: Union[bytearray, bytes]) -> bytes:
         if not isinstance(preimage, (bytearray, bytes)):

Which causes the earlier mokeypatch test to print out the desired:

Run 1:
Initializing again...
b"\xc5\xd2F\x01\x86\xf7#<\x92~}\xb2\xdc\xc7\x03\xc0\xe5\x00\xb6S\xca\x82';{\xfa\xd8\x04]\x85\xa4p"
Run 2:
b"\xc5\xd2F\x01\x86\xf7#<\x92~}\xb2\xdc\xc7\x03\xc0\xe5\x00\xb6S\xca\x82';{\xfa\xd8\x04]\x85\xa4p"
Uxio0 commented 2 years ago

Hi! You mean making that change besides the changes I proposed?

carver commented 2 years ago

Hi! You mean making that change besides the changes I proposed?

No, I meant to replace your proposal with my suggested changes.

The issue with your proposal is that initialization happens at import time. It would be great to minimize import time.

Uxio0 commented 2 years ago

Hi! You mean making that change besides the changes I proposed?

No, I meant to replace your proposal with my suggested changes.

The issue with your proposal is that initialization happens at import time. It would be great to minimize import time.

Done, it works as expected (time wise) @carver 😄

Uxio0 commented 2 years ago

Sure, done!

carver commented 2 years ago

Argh, I'm having trouble getting Circle working. I reset some settings, and was hoping the new commit would trigger it. But no luck. I guess we'll see on master!

Uxio0 commented 2 years ago

Do you think we could do a release so eth-utils and web3.py can use the optimization?

carver commented 2 years ago

@kclowes would you be able to handle the releases of these three?

kclowes commented 2 years ago

Yep, thanks for the tag. I wasn't watching this repo! Yep, can do!

kclowes commented 2 years ago

@Uxio0 I just released v0.3.3! The dependency ranges are such that you should be able to use it in both eth-utils and either web3.py v5 or web3.py v6 without problems, but let me know if you run into anything. Thanks again for the contribution!