dogecoin / dogecoin

very currency
MIT License
14.4k stars 2.8k forks source link

backport: Adds SipHashUint256Extra and moves CSipHasher to it's own file in crypto/ directory #3496

Open xanimo opened 1 month ago

xanimo commented 1 month ago

~This is a move-only commit with the exception of changes to includes.~

This adds SipHashUint256Extra: Cherry-picked from: 7e0032290669fae5f52c256856c53038511c7db4

And moves CSipHasher to it's own file in crypto/ directory: Cherry-picked from: 4fb789e9b2ffdf48fd50293b3982b3fce4d5fbdf

xanimo commented 1 month ago

I need this run re-triggered for failed runs please. @patricklodder @chromatic @michilumin

patricklodder commented 1 month ago

Question before concept ACK: this was done upstream to not have to include crypto/hash.h but be able to just include the Siphash header on its own. I don't see any includes being removed though. Is this because we are indirectly including crypto/hash.h everywhere?

xanimo commented 1 month ago

Hmm, I'll take a look at that I'm not sure atm. I picked this one because it's where SipHashUint256Extra comes into the code space. https://github.com/dogecoin/dogecoin/pull/3496/commits/626851550ecc334599bdd7a10ed417378af4f8d8#diff-70a1abe85d9d9fa8729884b20556bcf4f8ce94f50a051df85cccd6b9ca873078R134-R173

xanimo commented 1 month ago

Edit: it's actually added here but for the sake of brevity I chose this one.

patricklodder commented 1 month ago

I picked this one because it's where SipHashUint256Extra comes into the code space.

Not related to whether we want to split off this file, but: SipHashUint256Extra is used to hash over a uint256 and additional bytes for utxos. This is not used in any new code in #3229. What do we need that function for right now?

Switching utxo tracking is a consensus code change that should be executed with extreme caution. I don't mind doing it, but then it really needs a plan.

xanimo commented 1 month ago

I believe it's first needed for picking https://github.com/xanimo/dogecoin/commit/b8c03afdb1b063b2fdac549b72fca44065314f58 part of https://github.com/bitcoin/bitcoin/pull/10321/commits unless we want to remove that test?

Yeah wasn't my intention to touch any utxo tracking / consensus code. I still have a fair bit of that work already done though during my research for torv3.

patricklodder commented 1 month ago

We cannot backport scripted diff commits in 99% of the cases anyway, because code has diverged. The only usable thing from these could be the script, but even these don't always work, so those need TLC.

Right now, your commit (and a400df2a77bc0888b57a7bd76bfa10dc0b66355e from the branch you linked) is invalid because you backported the code into the scripted diff:

$ contrib/devtools/commit-script-check.sh 1.15.0-dev..b8c03afdb1
Running script for: a400df2a77bc0888b57a7bd76bfa10dc0b66355e
sed -i 's/insecure_rand/local_rand_ctx/g' src/test/cuckoocache_tests.cpp
diff --git a/src/test/cuckoocache_tests.cpp b/src/test/cuckoocache_tests.cpp
index 6331f7e68..916b23cee 100644
--- a/src/test/cuckoocache_tests.cpp
+++ b/src/test/cuckoocache_tests.cpp
@@ -62,8 +62,7 @@ BOOST_AUTO_TEST_CASE(test_cuckoocache_no_fakes)
 {
     local_rand_ctx = FastRandomContext(true);
     CuckooCache::cache<uint256, uint256Hasher> cc{};
-    size_t megabytes = 4;
-    cc.setup_bytes(megabytes << 20);
+    cc.setup_bytes(32 << 20);
     uint256 v;
     for (int x = 0; x < 100000; ++x) {
         insecure_GetRandHash(v);
Failed
Running script for: b8c03afdb1b063b2fdac549b72fca44065314f58
sed -i "s/\<GetRandHash(/insecure_rand256(/" src/test/*_tests.cpp
sed -i "s/\<GetRand(/insecure_randrange(/" src/test/*_tests.cpp src/test/test_bitcoin.cpp
sed -i 's/\<insecure_rand() % \([0-9]\+\)/insecure_randrange()/g' src/test/*_tests.cpp
diff --git a/src/test/hash_tests.cpp b/src/test/hash_tests.cpp
index 4e88a0ea5..d8de765db 100644
--- a/src/test/hash_tests.cpp
+++ b/src/test/hash_tests.cpp
@@ -128,23 +128,6 @@ BOOST_AUTO_TEST_CASE(siphash)
     tx.nVersion = 1;
     ss << tx;
     BOOST_CHECK_EQUAL(SipHashUint256(1, 2, ss.GetHash()), 0x79751e980c2a0a35ULL);
-
-    // Check consistency between CSipHasher and SipHashUint256[Extra].
-    FastRandomContext ctx;
-    for (int i = 0; i < 16; ++i) {
-        uint64_t k1 = ctx.rand64();
-        uint64_t k2 = ctx.rand64();
-        uint256 x = insecure_rand256();
-        uint32_t n = ctx.rand32();
-        uint8_t nb[4];
-        WriteLE32(nb, n);
-        CSipHasher sip256(k1, k2);
-        sip256.Write(x.begin(), 32);
-        CSipHasher sip288 = sip256;
-        sip288.Write(nb, 4);
-        BOOST_CHECK_EQUAL(SipHashUint256(k1, k2, x), sip256.Finalize());
-        BOOST_CHECK_EQUAL(SipHashUint256Extra(k1, k2, x, n), sip288.Finalize());
-    }
 }

 BOOST_AUTO_TEST_SUITE_END()
Failed
patricklodder commented 1 month ago

Alright, so concept ACK on moving it. But should keep it move-only as the commit says. I'm not sure how urgent the move is without the added code though, I leave that up to you.

xanimo commented 1 month ago

I'm not sure we actually need this at the moment but have consolidated both SipHashUint256Extra along with moving siphash to it's own files in src/crypto/. Moving to draft to give me time to check the crypto/hash.h includes. Also fine with closing.

patricklodder commented 1 month ago

Here's my thoughts on this: IF someone wants to change the granularity of CCoinsViewCache to be at the level of utxo instead of transaction, we'll have to quantify the benefits (maybe/hopefully faster input validation) and costs (disk space, memory) for Dogecoin in 2024 as this may very well be different than it was for Bitcoin in 2017 when this change was made there.

Messagehandler thread load is a thing right now though, because during spam, Dogecoin sees 4-5x the traffic Bitcoin does, and my best nodes haven't been able to serve 125 peers well during these times so I've had to reduce that. If there are gains in validation (cpu) costs to be made with such a change, it would be good to look at it, as long as we can get a clear picture of the consequences.

The addition of SipHashUint256Extra should be part of that solution, but I'd suggest to do that as part of back porting the entire change though, because although that's a big one, it is one of those "all or nothing" things.

In addition to this, a node with NODE_BLOOM enabled (>95% of all connectable nodes my spider sees) will also take a lot more time validating. If we're to both backport the blockfilter (BIP158 and BIP157) and the subsequent change to use clean async indexers outside of immediate consensus, then having a separate siphash per the 2nd commit makes sense (and that's what that particular commit was made for in the first place.)

Although on my own nodes I do not observe many clients that use bloom filters nowadays, if those were to become more commonplace, having that async, deterministic filter set up would help at least not degrading the Messagehandler thread further, as the filters can be constructed async on a separate thread and serve'em-when-we-get'em.

For any node that runs txindex, this can help with performance too.

So bottom line, I'd suggest to leave this in draft for now, and it can be turned into either of the above solutions.