btcsuite / btcd

An alternative full node bitcoin implementation written in Go (golang)
https://github.com/btcsuite/btcd/blob/master/README.md
ISC License
6.1k stars 2.31k forks source link

btcutil: reuse serialized tx during TxHash #2023

Closed kcalvinalvin closed 6 months ago

kcalvinalvin commented 10 months ago

btcutil.Block caches the serialized raw bytes of the block during ibd. This serialized block bytes includes the serialized tx. The current tx hash generation will re-serialized the de-serialized tx to create the raw bytes and it'll only then hash that.

This commit changes the code so that the re-serialization never happens, saving tons of cpu and memory overhead.

coveralls commented 10 months ago

Pull Request Test Coverage Report for Build 7284089562


Changes Missing Coverage Covered Lines Changed/Added Lines %
btcutil/tx.go 24 50 48.0%
<!-- Total: 45 71 63.38% -->
Files with Coverage Reduction New Missed Lines %
connmgr/connmanager.go 1 86.27%
<!-- Total: 1 -->
Totals Coverage Status
Change from base Build 7278303266: 0.02%
Covered Lines: 28660
Relevant Lines: 50613

💛 - Coveralls
kcalvinalvin commented 7 months ago

What is the speedup?

The speedup is resulted from the fact that the tx is never serialized. Since we receive it serialized in the block, we just hash those bytes, saving a lot of overhead during txhash calculation.

It's a different optimization from #1978 and this speedup is specific to ibd.

I tried to show it via cpu profiling but without the utxocache PR #1955, the speedup isn't observable because the utxo i/o is such a big bottleneck

Roasbeef commented 7 months ago

Code looks good, I did a once over to see if btcutil.Tx had any cases where it was modified after using Hash but couldn't find any.

Yeah I think that's one advtange: for block validation, things come off the wire, then go into the btcutil wrapper, after which they're more or less immutable.

Roasbeef commented 6 months ago

Running a combined branch for a sync test (utxo cache, this PR, and #1978), so far on my machine it's on schedule for a 6-ish hour sync (connected to bitcoind node over LAN, same machine).

One thing that still shows up is this hotspot later into the chain with full validation: Screenshot 2023-12-15 at 3 53 53 PM

Currently of the opinion that my attempt in https://github.com/btcsuite/btcd/pull/1376 may not be the best way to go about it, due to unintended consequences (PSBT interactions signing the long sighash, etc). Spending more time with routines like calcSignatureHash, it's also clear that it could actually lead to incorrect behavior when computing the sighash. When computing the non-segwit sighash, we blank out the sigScripts of inputs not currently being spent. But the cached version of SerializeNoWitness would emit the serialized version we read on the wire, which would include the sigScripts that were blanked our earlier.

Roasbeef commented 6 months ago

Zooming in a bit more, we can see that the channel based binary free list logic is actually what is making things so slow here when writing inputs/outputs: Screenshot 2023-12-15 at 4 07 51 PM

Roasbeef commented 6 months ago

Oops, closed the wrong version...

Also here's the full SVG from above: profile001

kcalvinalvin commented 6 months ago

Rebased and addressed the review comments.

Roasbeef commented 6 months ago

Did the final nits in https://github.com/btcsuite/btcd/pull/2081