btcsuite / btcd

An alternative full node bitcoin implementation written in Go (golang)
https://github.com/btcsuite/btcd/blob/master/README.md
ISC License

chainhash, wire, btcutil, main: Memory efficient txhash #1978

Closed · kcalvinalvin closed this 6 months ago

kcalvinalvin commented 1 year ago

When profiling IBD (initial block download), I noticed that TxHash() is allocating quite a bit of memory.

[allocation profile screenshot: IMG_0153]

I also noticed this with utreexod in the past and resolved it by creating a new double hash function. I've benchmarked it and made it more efficient in this PR.

Benchstat for BenchmarkTxHash() shows a slight increase in sec/op, which is likely due to the overhead of serializing into a hash.Hash instead of the bytes.Buffer used by the previous TxHash(). However, in real-life cases the new TxHash() will be faster, since we save the unnecessary serialization into a bytes.Buffer: sha256.Sum256() will call Write() into the hash anyway.

The memory allocation savings are ~37% compared to the old TxHash(). This matters because TxHash() gets called very frequently.

goos: linux
goarch: amd64
pkg: github.com/btcsuite/btcd/wire
cpu: AMD Ryzen 5 3600 6-Core Processor              
          │   old.txt   │              new.txt               │
          │   sec/op    │   sec/op     vs base               │
TxHash-10   1.347µ ± 1%   1.411µ ± 2%  +4.79% (p=0.000 n=10)

          │  old.txt   │              new.txt               │
          │    B/op    │    B/op     vs base                │
TxHash-10   256.0 ± 0%   160.0 ± 0%  -37.50% (p=0.000 n=10)

          │  old.txt   │            new.txt             │
          │ allocs/op  │ allocs/op   vs base            │
TxHash-10   2.000 ± 0%   2.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal
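
To make the comparison concrete, the shape of the change is roughly the following. This is a simplified sketch written against the DoubleHashRaw helper this PR introduces, not the exact diff; txHashOld and txHashNew are illustrative stand-ins for the old and new TxHash():

package txhashsketch

import (
	"bytes"

	"github.com/btcsuite/btcd/chaincfg/chainhash"
	"github.com/btcsuite/btcd/wire"
)

// txHashOld mirrors the previous approach: serialize the transaction into a
// bytes.Buffer sized to its stripped serialization, then double-SHA256 the
// buffered bytes.
func txHashOld(msg *wire.MsgTx) chainhash.Hash {
	buf := bytes.NewBuffer(make([]byte, 0, msg.SerializeSizeStripped()))
	_ = msg.SerializeNoWitness(buf)
	return chainhash.DoubleHashH(buf.Bytes())
}

// txHashNew mirrors the new approach: pass the serialization function to
// DoubleHashRaw so the transaction bytes are written directly into the
// sha256 state, skipping the intermediate bytes.Buffer.
func txHashNew(msg *wire.MsgTx) chainhash.Hash {
	return chainhash.DoubleHashRaw(msg.SerializeNoWitness)
}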

For bigger transactions, the memory savings are even greater. I performed the same test but with multiTx instead of genesisCoinbaseTx:

diff --git a/wire/bench_test.go b/wire/bench_test.go
index b9f18b83..e4cb9c88 100644
--- a/wire/bench_test.go
+++ b/wire/bench_test.go
@@ -598,7 +598,7 @@ func BenchmarkDecodeMerkleBlock(b *testing.B) {
 // transaction.
 func BenchmarkTxHash(b *testing.B) {
        for i := 0; i < b.N; i++ {
-               genesisCoinbaseTx.TxHash()
+               multiTx.TxHash()
        }
 }

Below is the benchstat for that test. There's less of a speed penalty, and ~41% savings in memory allocated.

goos: linux
goarch: amd64
pkg: github.com/btcsuite/btcd/wire
cpu: AMD Ryzen 5 3600 6-Core Processor              
          │ old-multitx.txt │          new-multitx.txt           │
          │     sec/op      │   sec/op     vs base               │
TxHash-10       1.507µ ± 0%   1.543µ ± 2%  +2.39% (p=0.000 n=10)

          │ old-multitx.txt │          new-multitx.txt           │
          │      B/op       │    B/op     vs base                │
TxHash-10        272.0 ± 0%   160.0 ± 0%  -41.18% (p=0.000 n=10)

          │ old-multitx.txt │        new-multitx.txt         │
          │    allocs/op    │ allocs/op   vs base            │
TxHash-10        2.000 ± 0%   2.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

The benchmarks for BenchmarkDoubleHash* show a significant speedup for DoubleHashRaw(). This is likely because the other variants go through sha256.Sum256(), which has to call digest.Write() internally, while DoubleHashRaw() gets to skip that.

[I] calvin@bitcoin ~/b/g/u/g/s/g/b/b/wire (memory-efficient-txhash) [1]> go test -run=XXX -bench=BenchmarkDoubleHash -benchmem
goos: linux
goarch: amd64
pkg: github.com/btcsuite/btcd/wire
cpu: AMD Ryzen 5 3600 6-Core Processor              
BenchmarkDoubleHashB-10          1581974               758.1 ns/op            32 B/op          1 allocs/op
BenchmarkDoubleHashH-10          1600311               750.5 ns/op             0 B/op          0 allocs/op
BenchmarkDoubleHashRaw-10        3185388               370.1 ns/op            32 B/op          1 allocs/op
PASS
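
For reference, the two existing helpers in that comparison both double-hash an already-serialized byte slice by calling sha256.Sum256() twice. Simplified sketches of their shape (paraphrased, not copied from chainhash; doubleHashB and doubleHashH are stand-in names) also explain the allocation numbers above:

package chainhashsketch

import (
	"crypto/sha256"

	"github.com/btcsuite/btcd/chaincfg/chainhash"
)

// doubleHashB sketches chainhash.DoubleHashB: two rounds of SHA-256 over a
// pre-serialized slice, returned as a new byte slice. Returning the slice
// forces the 32-byte result onto the heap, matching the 32 B/op and
// 1 allocs/op reported for BenchmarkDoubleHashB.
func doubleHashB(b []byte) []byte {
	first := sha256.Sum256(b)
	second := sha256.Sum256(first[:])
	return second[:]
}

// doubleHashH sketches chainhash.DoubleHashH: the same two rounds, but the
// result is returned by value as a Hash, so nothing escapes, matching the
// 0 B/op reported for BenchmarkDoubleHashH.
func doubleHashH(b []byte) chainhash.Hash {
	first := sha256.Sum256(b)
	return chainhash.Hash(sha256.Sum256(first[:]))
}

Both require the caller to hold the full serialization in memory; the raw variant benchmarked above instead streams the serialization into a single digest for the first round, which per the explanation above is where the savings come from.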

I can also post some before and after ibd profiling if requested.

coveralls commented 1 year ago

Pull Request Test Coverage Report for Build 7197319343


Files with Coverage Reduction:
File           New Missed Lines   %
peer/peer.go   1                  74.52%
Totals Coverage Status
Change from base Build 7168733027: 0.0%
Covered Lines: 27977
Relevant Lines: 49897

💛 - Coveralls
kcalvinalvin commented 1 year ago

The last 2 force pushes change the function signature of DoubleHashRaw so that it accepts a closure and returns a hash. This was done because I found that the

buf := make([]byte, 0, 32)

escaped to the heap with how it was implemented in the previous commits. That version still allocated less memory than current master, but it did 2 allocations per hash instead of 1. The new signature lets the compiler keep buf on the stack, so only 1 allocation is done instead of 2.
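
Read together with the description above, one plausible shape for the end result looks like this. It is a sketch of the described fix rather than the exact merged code; doubleHashRaw is an illustrative stand-in:

package chainhashsketch

import (
	"crypto/sha256"
	"io"

	"github.com/btcsuite/btcd/chaincfg/chainhash"
)

// doubleHashRaw illustrates the closure-based signature: the serialize
// callback writes the raw bytes directly into the sha256 state, the first
// round is summed into a small local buffer, and the second round runs over
// that 32-byte intermediate digest.
func doubleHashRaw(serialize func(w io.Writer) error) chainhash.Hash {
	h := sha256.New()

	// Writing into an in-memory sha256 digest never fails, so the error
	// is ignored in this sketch.
	_ = serialize(h)

	// Sum the first round into a small buffer. Per the comment above,
	// the point of the closure-based signature is to let the compiler
	// keep this buffer on the stack instead of heap-allocating it.
	buf := make([]byte, 0, sha256.Size)
	firstHash := h.Sum(buf)

	// Second round over the 32-byte intermediate result.
	return chainhash.Hash(sha256.Sum256(firstHash))
}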

Roasbeef commented 8 months ago

I think this just needs a rebase!

kcalvinalvin commented 8 months ago

Rebased

For IBD and block verification, I think https://github.com/btcsuite/btcd/pull/2023 is going to be better, but that one's still in the works.

kcalvinalvin commented 6 months ago

Removed commits that were changing the go.mod files

Roasbeef commented 6 months ago

Will tag the new version, then make a new PR updating relevant go.mods, before a final PR that swaps this in everywhere.