v1.2.0: investigate fromBytes performance issue #4690

ChainSafe / lodestar

🌟 TypeScript Implementation of Ethereum Consensus

https://lodestar.chainsafe.io

Apache License 2.0

Closed: twoeths closed this issue 1 year ago

twoeths commented 2 years ago

Describe the bug

With v1.2.0, on a node with 1000 keys, fromBytes() takes more than 8% of CPU time.

1025_lg1k_chacha20poly1305_no_mem_alloc.cpuprofile.zip

Expected behavior

In v1.1.1, it takes only around 1% on the same type of node.

0525_lg1k_v1.1.1.cpuprofile.zip
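To compare figures like 8% vs 1% across profiles, the share can be recomputed from the `.cpuprofile` JSON (the V8 / Chrome DevTools profile format) after unzipping the attachments. This is a minimal sketch, not Lodestar tooling: `inclusiveShare` is a hypothetical helper that approximates time by sample count (assuming a roughly constant sampling interval), counts a sample when a frame named `fnName` appears anywhere on its stack, matches by function name only, and keeps idle samples in the denominator, so its number may differ somewhat from what a profile viewer reports.

```typescript
// Minimal subset of the V8 / Chrome DevTools `.cpuprofile` JSON format.
interface CallFrame {
  functionName: string;
  url: string;
}

interface ProfileNode {
  id: number;
  callFrame: CallFrame;
  children?: number[];
}

interface CpuProfile {
  nodes: ProfileNode[];
  startTime: number;
  endTime: number;
  samples?: number[]; // node id of the top-of-stack frame for each sample
}

// Hypothetical helper: fraction of samples whose call stack contains `fnName`
// (inclusive time), treating every sample as equal weight.
function inclusiveShare(profile: CpuProfile, fnName: string): number {
  const samples = profile.samples ?? [];
  if (samples.length === 0) return 0;

  // The profile stores parent -> children links; build child -> parent links.
  const byId = new Map<number, ProfileNode>();
  const parentOf = new Map<number, number>();
  for (const node of profile.nodes) {
    byId.set(node.id, node);
    for (const child of node.children ?? []) parentOf.set(child, node.id);
  }

  // Memoized walk up the stack: does node `id` or any of its ancestors match `fnName`?
  const memo = new Map<number, boolean>();
  const stackContains = (id: number): boolean => {
    const cached = memo.get(id);
    if (cached !== undefined) return cached;
    const node = byId.get(id);
    let found = false;
    if (node !== undefined) {
      if (node.callFrame.functionName === fnName) {
        found = true;
      } else {
        const parent = parentOf.get(id);
        found = parent !== undefined && stackContains(parent);
      }
    }
    memo.set(id, found);
    return found;
  };

  let hits = 0;
  for (const id of samples) {
    if (stackContains(id)) hits++;
  }
  return hits / samples.length;
}
```

For example, `inclusiveShare(JSON.parse(readFileSync(path, "utf8")), "fromBytes")` with `readFileSync` from `node:fs`, where `path` points at one of the unzipped profiles.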

dapplion commented 2 years ago

Unrelated but important: I noticed that in 0525_lg1k_v1.1.1.cpuprofile.zip, afterProcessEpoch runs 3 times between 250500 ms and 255500 ms (a ~5 s window).

Screenshot from 2022-10-25 21-21-59

twoeths commented 2 years ago

v1.2.0 processes more attestations: looking at the number of aggregateAttestationInto calls, it makes 3.3x as many as v1.1.

Screen Shot 2022-10-26 at 11 42 17

v1.1.0

Both nodes receive the same number of valid attestations per second.

So it could be that v1.2.0 receives more attestations in the last 3 slots, which causes us to do the pre-aggregation 3x as often.

dapplion commented 2 years ago

@tuyennhv I've been looking into how to improve this:

We cannot cache "de-serialized" signatures, since they are sent to the Workers parsed.
The only trade-off we can make, I think, is to aggregate the signatures lazily at aggregation time. WIP at https://github.com/ChainSafe/lodestar/compare/dapplion/aggregate-pool-fromBytes
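A minimal sketch of the eager vs. lazy trade-off, under stated assumptions: `EagerPool` and `LazyPool` are hypothetical, keyed by a hypothetical attestation-data root string, and take `fromBytes` / `aggregate` as injected stand-ins instead of calling a real BLS library. Eager pays one deserialization per incoming attestation on the gossip path. Lazy stores the raw bytes and deserializes only when an aggregate is actually requested, so attestation data that is never aggregated is never deserialized.

```typescript
// Hypothetical stand-ins for BLS operations (NOT the @chainsafe/bls API).
type FromBytes<Sig> = (bytes: Uint8Array) => Sig;
type Aggregate<Sig> = (sigs: Sig[]) => Sig;

// Eager: deserialize each signature on arrival and fold it into a running aggregate.
class EagerPool<Sig> {
  private readonly aggregates = new Map<string, Sig>();
  private readonly fromBytes: FromBytes<Sig>;
  private readonly aggregate: Aggregate<Sig>;

  constructor(fromBytes: FromBytes<Sig>, aggregate: Aggregate<Sig>) {
    this.fromBytes = fromBytes;
    this.aggregate = aggregate;
  }

  add(dataRoot: string, sigBytes: Uint8Array): void {
    const sig = this.fromBytes(sigBytes); // one deserialization per attestation, on the gossip path
    const current = this.aggregates.get(dataRoot);
    this.aggregates.set(dataRoot, current === undefined ? sig : this.aggregate([current, sig]));
  }

  getAggregate(dataRoot: string): Sig | undefined {
    return this.aggregates.get(dataRoot);
  }
}

// Lazy: keep raw bytes; deserialize and aggregate only when an aggregate is requested.
class LazyPool<Sig> {
  private readonly pending = new Map<string, Uint8Array[]>();
  private readonly fromBytes: FromBytes<Sig>;
  private readonly aggregate: Aggregate<Sig>;

  constructor(fromBytes: FromBytes<Sig>, aggregate: Aggregate<Sig>) {
    this.fromBytes = fromBytes;
    this.aggregate = aggregate;
  }

  add(dataRoot: string, sigBytes: Uint8Array): void {
    const list = this.pending.get(dataRoot);
    if (list === undefined) this.pending.set(dataRoot, [sigBytes]);
    else list.push(sigBytes); // no deserialization on the gossip path
  }

  getAggregate(dataRoot: string): Sig | undefined {
    const list = this.pending.get(dataRoot);
    if (list === undefined || list.length === 0) return undefined;
    return this.aggregate(list.map((bytes) => this.fromBytes(bytes)));
  }
}
```

The lazy version moves the cost off the hot path, but as written it re-deserializes on every `getAggregate` call; a real pool would likely cache the result or drop the bytes once aggregated.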

twoeths commented 2 years ago

This could ease the I/O lag issue a bit, since pre-aggregating during the first third of the slot (0 to 1/3) keeps us busier at that time. Let's try doing the aggregation at 2/3 of the slot 👍
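Deferring the aggregation to 2/3 of the slot needs the delay from now until that point. A minimal sketch, assuming mainnet's 12-second slots (`SECONDS_PER_SLOT = 12`) and a genesis time in Unix seconds; `msUntilTwoThirdsOfSlot` is a hypothetical helper, not Lodestar's clock API.

```typescript
const SECONDS_PER_SLOT = 12; // Ethereum mainnet value; other networks may differ

// Hypothetical helper: milliseconds from `nowMs` until the 2/3 point of the current slot,
// or of the next slot if the current slot is already past it.
function msUntilTwoThirdsOfSlot(genesisTimeSec: number, nowMs: number): number {
  const slotMs = SECONDS_PER_SLOT * 1000;
  const sinceGenesisMs = nowMs - genesisTimeSec * 1000;
  // Position inside the current slot, kept non-negative even before genesis.
  const intoSlotMs = ((sinceGenesisMs % slotMs) + slotMs) % slotMs;
  const targetMs = (slotMs * 2) / 3; // 8000 ms for 12 s slots
  return intoSlotMs <= targetMs ? targetMs - intoSlotMs : slotMs - intoSlotMs + targetMs;
}
```

Usage would look like `setTimeout(runAggregation, msUntilTwoThirdsOfSlot(genesisTime, Date.now()))`, where `runAggregation` and `genesisTime` are placeholders.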

twoeths commented 2 years ago

If we use multistream-select 3.1.1, p2p seems to improve and we receive so many attestations that fromBytes takes 22% of CPU time.

0111_multi_stream_select_3.1.1_lg1k.cpuprofile.zip

Maybe attestations are being aggregated too much, which causes fromBytes to run so frequently; then our peers fall out of the mesh and the aggregation rate drops.

dapplion commented 1 year ago

@tuyennhv Closing for now, as https://github.com/ChainSafe/lodestar/pull/4838 should make fromBytes calls much cheaper. Please confirm the next time you take a CPU profile.