ChainSafe / ssz

Typescript implementation of Simple Serialize (SSZ)
https://simpleserialize.com/
Other
44 stars 17 forks source link

Investigating Tree Hashing in Lodestar #355

Open wemeetagain opened 3 months ago

wemeetagain commented 3 months ago

Current Hashing Approach

Lodestar's architecture relies heavily on maintaining a full merkle tree of the beacon state. We represent the tree as a linked data structure, where each node is 1. immutable, 2. lazily computing but caching the hash of its children.

This allows us to minimize the number of hashes we need to perform during state transition. Since all hashes of a prestate are maintained, only the paths thru the tree to the "diff" need to be re-hashed. This reduces the computational cost of hashing.

Also, it allows us perform structural sharing, sharing the memory for beacon states with shared subtrees. For example, between epochs in a sync period, where sync committees remain constant, if we maintain a reference to two beacon states, both states will share the same underlying subtrees for the sync committees. This reduces the memory cost of maintaining several related states.

Hashing Function

We use as-sha256 with several optimizations

Hash Cache Representation

The memory usage for each type of object in Javascript is not very efficient coming from a systems language intuition. We store hash objects NOT as 32-byte Uint8Array. A 32-byte Uint8Array takes 223 total bytes! There's a bunch of pointers and additional bookkeeping that's being stored behind the scenes. We store hash objects as objects with 8 uint32 numbers. eg: {h0: 0, h1: 0, ..., h7: 0}. This takes somewhere between 88 bytes and 216 bytes, depending on the sizes of the indiviudal component numbers. Smaller numbers are represented as Smi (small integer) as an immediate value, while larger numbers are stored on the heap. In practice, this happens TODO.

How to improve?

The hashing speed in lodestar is quite low compared to other implementations.

The memory of our hashes in a lodestar node constitute a lot of a running beacon node. And our memory usage, measured per-hash, is still very large compared to systems languages.

Results

Some experiments were made, results below

cayman/hash-object

node -r ts-node/register node_modules/.bin/benchmark packages/ssz/test/perf/hash.test.ts

    ✓ hashing - hashtreeOne                                               1643.022 ops/s    608.6346 us/op        -         45 runs   3.24 s
    ✓ hashing - hashtree                                                  1564.410 ops/s    639.2188 us/op        -         25 runs   2.13 s
    ✓ hashing - hashtree uint8array                                       46076.71 ops/s    21.70294 us/op        -        468 runs   1.52 s
    ✓ hashing - rust                                                      1789.679 ops/s    558.7596 us/op        -         27 runs   2.03 s
    ✓ hashing - rust object rs-sha256                                     1438.692 ops/s    695.0758 us/op        -         21 runs   2.01 s
    ✓ hashing - rust object as-sha256                                     550.4754 ops/s    1.816612 ms/op        -          7 runs   1.87 s
    ✓ hashing - as-sha256                                                 2276.654 ops/s    439.2412 us/op        -         10 runs  0.959 s

Unfortunately, a rust HashObject is more expensive, memory-wise, than the status quo.

node -r ts-node/register --expose-gc packages/ssz/test/memory/hash.test.ts

hash-object - rust - 1                   - 101.7 bytes / instance
hash-object - js - 1                     - 91.8 bytes / instance

cayman/hash-cache

node -r ts-node/register node_modules/.bin/benchmark packages/ssz/test/perf/eth2/hashTreeRoot.test.ts

hash cache

    ✓ hashTreeRoot Attestation - struct                                   37929.07 ops/s    26.36500 us/op        -      16753 runs  0.505 s
    ✓ hashTreeRoot Attestation - tree                                     46307.02 ops/s    21.59500 us/op        -      22688 runs  0.912 s
    ✓ hashTreeRoot SignedAggregateAndProof - struct                       25187.65 ops/s    39.70200 us/op        -      13416 runs  0.606 s
    ✓ hashTreeRoot SignedAggregateAndProof - tree                         30281.92 ops/s    33.02300 us/op        -      14954 runs   1.01 s
    ✓ hashTreeRoot SyncCommitteeMessage - struct                          96609.02 ops/s    10.35100 us/op        -      72446 runs  0.808 s
    ✓ hashTreeRoot SyncCommitteeMessage - tree                            136072.9 ops/s    7.349000 us/op        -      43560 runs  0.611 s
    ✓ hashTreeRoot SignedContributionAndProof - struct                    35928.57 ops/s    27.83300 us/op        -      26315 runs  0.808 s
    ✓ hashTreeRoot SignedContributionAndProof - tree                      45607.95 ops/s    21.92600 us/op        -      12233 runs  0.505 s
    ✓ hashTreeRoot SignedBeaconBlock - struct                             374.1080 ops/s    2.673025 ms/op        -        223 runs   1.13 s
    ✓ hashTreeRoot SignedBeaconBlock - tree                               510.8943 ops/s    1.957352 ms/op        -         82 runs   1.42 s
    ✓ hashTreeRoot Validator - struct                                     58159.82 ops/s    17.19400 us/op        -      43131 runs  0.808 s
    ✓ hashTreeRoot Validator - tree                                       54773.51 ops/s    18.25700 us/op        -      54961 runs   1.11 s

master

    ✓ hashTreeRoot Attestation - struct                                   37378.99 ops/s    26.75300 us/op        -      16472 runs  0.505 s
    ✓ hashTreeRoot Attestation - tree                                     48546.05 ops/s    20.59900 us/op        -      14022 runs  0.404 s
    ✓ hashTreeRoot SignedAggregateAndProof - struct                       24892.96 ops/s    40.17200 us/op        -      11015 runs  0.505 s
    ✓ hashTreeRoot SignedAggregateAndProof - tree                         32836.41 ops/s    30.45400 us/op        -      10084 runs  0.404 s
    ✓ hashTreeRoot SyncCommitteeMessage - struct                          102343.7 ops/s    9.771000 us/op        -      56987 runs  0.606 s
    ✓ hashTreeRoot SyncCommitteeMessage - tree                            148104.3 ops/s    6.752000 us/op        -      42104 runs  0.404 s
    ✓ hashTreeRoot SignedContributionAndProof - struct                    36937.17 ops/s    27.07300 us/op        -       9497 runs  0.303 s
    ✓ hashTreeRoot SignedContributionAndProof - tree                      47938.64 ops/s    20.86000 us/op        -      11127 runs  0.303 s
    ✓ hashTreeRoot SignedBeaconBlock - struct                             383.4154 ops/s    2.608137 ms/op        -        268 runs   1.24 s
    ✓ hashTreeRoot SignedBeaconBlock - tree                               530.1876 ops/s    1.886125 ms/op        -        283 runs   1.20 s
    ✓ hashTreeRoot Validator - struct                                     59245.22 ops/s    16.87900 us/op        -      78104 runs   1.41 s
    ✓ hashTreeRoot Validator - tree                                       85623.77 ops/s    11.67900 us/op        -      55251 runs  0.707 s

node -r ts-node/register --expose-gc packages/ssz/test/memory/eth2Objects.test.ts

hash cache

Attestation struct             - 1421.0 bytes / instance
Attestation tree               - 5272.3 bytes / instance
SignedAggregateAndProof struct - 2077.4 bytes / instance
SignedAggregateAndProof tree   - 8099.1 bytes / instance
AggregationBits struct         - 255.9 bytes / instance
AggregationBits tree           - 1383.0 bytes / instance
SignedBeaconBlockPhase0 struct - 131438.2 bytes / instance
SignedBeaconBlockPhase0 tree   - 484786.4 bytes / instance

master

Attestation struct             - 1421.2 bytes / instance
Attestation tree               - 3028.2 bytes / instance
SignedAggregateAndProof struct - 2077.7 bytes / instance
SignedAggregateAndProof tree   - 4679.8 bytes / instance
AggregationBits struct         - 265.5 bytes / instance
AggregationBits tree           - 941.2 bytes / instance
SignedBeaconBlockPhase0 struct - 131435.5 bytes / instance
SignedBeaconBlockPhase0 tree   - 277766.9 bytes / instance

cayman/napi-merkle-node

twoeths commented 2 months ago

an interesting talk about sha256 for merkle tree regarding batch hash https://www.youtube.com/watch?v=NfK4np15E64

there are 2 implementations for now:

There are 2 ways we can do the batch hash

getHashComputation(level: number, hashCompsByLevel: Map<number, HashComputation[]>): void { if (this.h0 === null) { let hashComputations = hashCompsByLevel.get(level); if (hashComputations === undefined) { hashComputations = []; hashCompsByLevel.set(level, hashComputations); } hashComputations.push({src0: this.left, src1: this.right, dest: this}); if (!this.left.isLeaf()) { (this.left as BranchNode).getHashComputation(level + 1, hashCompsByLevel); } if (!this.right.isLeaf()) { (this.right as BranchNode).getHashComputation(level + 1, hashCompsByLevel); }

  return;
}

// else stop the recursion, LeafNode should have h0

}



this traversal takes <10% of our current hash time

- or we can do it during the `ViewDU.commit()` which has the rebind inside. Need to take a closer look at this but this approach is more potential because we don't waste another traversal there
twoeths commented 1 week ago

in progress POC to implement batch hash in ssz/lodestar https://hackmd.io/zj9N5RIqQfCqYz8Y1Xc_hA?view

twoeths commented 1 week ago

with this branch te/hashtree_hasher https://github.com/ChainSafe/lodestar/blob/63d3b6a022527f9ee202cf76aaaa778fadbb8316/packages/cli/src/hashtree.ts#L18

it consumes ~1GB more heap memory in lodestar

Screenshot 2024-06-26 at 09 42 55

cc @wemeetagain @matthewkeil