ethereum / beacon_chain


Block processing time estimates at scale #103

Closed · djrtwo closed this 5 years ago

djrtwo commented 5 years ago

Issue

In the last eth2.0 implementers call, we decided it would be worthwhile to run some timing analysis on processing blocks with real-world amounts of attestations.

It would be great to get results from at least one other client. I know not everyone has a working BLS aggregate implementation yet, but anyone that does should give this a try and report results.

Proposed Implementation

Assuming 10M ETH deposited (at 32 ETH per validator) puts us at ~300k validators. With 64 slots per cycle, that is ~5,000 validators per slot. With 1,000 shards divided across the 64 slots, that is ~16 shards per slot.

If all of the validators coordinate and vote on the same crosslink, and their attestations are aggregated and included in the next slot, then there will be 16 attestations of ~300 validators each per block. This is a good place to start.

We can then turn this estimate into a worst case by assuming the validators split their votes across 2, 3, 4, or even 5 different crosslink candidates. If all committees split their votes across 2 candidates, there would be 32 attestations per block with ~150 validators each.
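For illustration, here is the arithmetic above as a small script (the 32 ETH per-validator deposit size is the ratio of total_deposits to total_validators in the benchmark tables further down; everything else is a rough estimate, not spec-exact):

```python
# Back-of-the-envelope numbers for the scenarios described above.
DEPOSIT_SIZE_ETH = 32          # implied by total_deposits / total_validators below
TOTAL_DEPOSITED_ETH = 10_000_000
SLOTS_PER_CYCLE = 64
SHARD_COUNT = 1_000

validators = TOTAL_DEPOSITED_ETH / DEPOSIT_SIZE_ETH        # ~312,500
validators_per_slot = validators / SLOTS_PER_CYCLE         # ~4,900
shards_per_slot = round(SHARD_COUNT / SLOTS_PER_CYCLE)     # ~16

for candidates in (1, 2, 3, 4, 5):
    attestations_per_block = shards_per_slot * candidates
    validators_per_attestation = validators_per_slot / shards_per_slot / candidates
    print(f"{candidates} candidate(s): {attestations_per_block} attestations "
          f"of ~{validators_per_attestation:.0f} validators each")
```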

EDIT: My estimates of the number and size of committees were a bit off in practice. When using `BenchmarkParams { total_validators: 312500, cycle_length: 64, shard_count: 1024, shards_per_slot: 16, validators_per_shard: 305, min_committee_size: 128 }`, each slot has approximately 20 committees of size 244 (rather than 16 of ~300). This shouldn't drastically change the output, but it is a better target because it reflects the actual shuffling algorithm (cc: @paulhauner).

EDIT2: My original assumption was correct and the spec was incorrect! Go with the original estimates.

paulhauner commented 5 years ago

For the record, lighthouse is making this a priority. Thanks for the details on what to test :)

We'll come back here with any questions/comments.

djrtwo commented 5 years ago

Current Python benchmarks (on a standard consumer laptop with tons of tabs open and music playing):

| num_attestations | validators_per_attestation | total_validators | total_deposits | block_process_seconds | crystallized_state_bytes | active_state_bytes | block_bytes |
|---|---|---|---|---|---|---|---|
| 2 | 244 | 31250 | 1000000 | 0.1017 | 4056416 | 4494 | 562 |
| 16 | 305 | 312500 | 10000000 | 1.0099 | 40074336 | 7324 | 3392 |
| 16 | 3051 | 3125000 | 100000000 | 10.1837 | 400074336 | 12812 | 8880 |

Raw CSV here: https://gist.github.com/djrtwo/663a031c984ef4796a9aff2ba68d03e5
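(For anyone reproducing these numbers, the timed region is just the block-processing call itself, roughly along the lines of the sketch below. `process_block` and the state/block arguments are stand-ins for whatever your client exposes, not actual beacon_chain APIs.)

```python
import statistics
import time

def time_block_processing(process_block, crystallized_state, active_state, block, runs=10):
    """Time only the state-transition call; setup and (de)serialization are excluded."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        process_block(crystallized_state, active_state, block)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```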

Notes:

paulhauner commented 5 years ago

Here are some timings from Lighthouse (I'll add @djrtwo's first scenario once I complete it):

Computer: Lenovo X1 Carbon 5th Gen with an Intel i5-7300U @ 2.60GHz running Arch Linux with each core idling around 2-8% before tests.

| num_attestations | validators_per_attestation | total_validators | total_deposits | block_process_seconds |
|---|---|---|---|---|
| 2 | 244 | 31250 | 1000000 | N/A |
| 16 | 305 | 312500 | 10000000 | 0.066773104 |
| 16 | 3051 | 3125000 | 100000000 | 0.249065226 |

Note: this is using an in-memory database. ~@djrtwo were you using an on-disk DB at all?~

Note: we're using concurrency for attestation validation.
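(Not Lighthouse's actual code, which is Rust, but a rough Python sketch of the same idea: fan the per-attestation checks out across cores, since signature verification is CPU-bound. `validate_attestation` is a hypothetical, module-level function.)

```python
from concurrent.futures import ProcessPoolExecutor

def validate_attestations_concurrently(validate_attestation, attestations):
    """Check every attestation in parallel; the block is valid only if all pass."""
    with ProcessPoolExecutor() as pool:
        # map() requires the callable and the items to be picklable.
        return all(pool.map(validate_attestation, attestations))
```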

djrtwo commented 5 years ago

Curious whether you see a difference closer to 10x when you remove the concurrency, @paulhauner.

paulhauner commented 5 years ago

Without concurrency:

| num_attestations | validators_per_attestation | total_validators | total_deposits | block_process_seconds |
|---|---|---|---|---|
| 2 | 244 | 31250 | 1000000 | N/A |
| 16 | 305 | 312500 | 10000000 | 0.125273217 |
| 16 | 3051 | 3125000 | 100000000 | 0.450318683 |

That ~4x difference still holds.

In these benches I'm starting with an SSZ-serialized block and then de-serializing it (and all the AttestationRecords) inside the benchmark. Are you doing the same thing @djrtwo? If not, maybe we're seeing a constant SSZ de-serialization overhead in lighthouse that we're not seeing in beacon_chain?
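(To make the two setups concrete, the difference is roughly the following sketch; `ssz_decode` and `process_block` are placeholder names, not the real APIs of either client.)

```python
import time

def bench_including_deserialization(ssz_decode, process_block, state, block_bytes):
    """Timing that starts from raw SSZ bytes: decoding happens inside the measured region."""
    start = time.perf_counter()
    block = ssz_decode(block_bytes)   # AttestationRecords are decoded on the clock
    process_block(state, block)
    return time.perf_counter() - start

def bench_excluding_deserialization(ssz_decode, process_block, state, block_bytes):
    """Timing that decodes the block up front and measures only the processing step."""
    block = ssz_decode(block_bytes)
    start = time.perf_counter()
    process_block(state, block)
    return time.perf_counter() - start
```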

djrtwo commented 5 years ago

That's it. I'm not clocking the deserialization.

Both are interesting numbers. I was looking specifically at block validity, and primarily at the signatures, because this was our estimated bottleneck when designing the protocol.

Let's see what it is without the deserialize.

paulhauner commented 5 years ago

Presently lighthouse is structured to do "just-in-time" deserialization, where each AttestationRecord is deserialized immediately before it is verified. The idea is that if someone sends us a bad block we deserialize as little as possible before discovering that it's bad.

I mention this for two reasons: (a) because it's a fun fact, and (b) to indicate that it'll take some amount of hacky refactoring to turn these into "no deserialize" tests, so I can get them done later today or tomorrow morning :)

On a side note: at some point it would be useful to get "bad block" benchmarks from clients, i.e., how quickly can you reject a bad block? I'm well onto a tangent now, but it would also be worth considering introducing some form of entropy into the order in which AttestationRecords are verified inside a client, so that an attacker can't craft some "ideal resource-consuming block" (e.g., make the last attestation bad, knowing the client will check each one before it). Probably just using concurrency (based on the number of available cores) and maybe reversing the order (based on a coin flip) would be enough.
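(A rough sketch of that combination: just-in-time deserialization, early exit on the first bad record, and a randomized verification order so an attacker can't reliably place the bad attestation last. All helper names here are hypothetical.)

```python
import random

def validate_block_attestations(attestation_blobs, deserialize, validate):
    """Deserialize each AttestationRecord only right before checking it, in a
    shuffled order, and reject the block as soon as one record fails."""
    order = list(range(len(attestation_blobs)))
    random.shuffle(order)   # removes any "ideal resource-consuming" ordering for an attacker
    for i in order:
        record = deserialize(attestation_blobs[i])   # just-in-time deserialization
        if not validate(record):
            return False    # bail out as early as possible on a bad block
    return True
```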

kaibakker commented 5 years ago

Looks great! Would performance increase if a vote didn't include a source, as described here: https://ethresear.ch/t/should-we-simplify-casper-votes-to-remove-the-source-param/3549 ?