Closed: @laser closed this issue 5 years ago.
The data points below can currently be collected using two different scripts:

- `filbase --benchy` collects information on ZigZag (rust-filbase)
- `cargo run --bin micro -p fil-proofs-tooling --release` collects the micro benchmarks (rust-fil-proofs)

The current solution can store the information in Prometheus. This might or might not be a good idea. An alternative that is possibly much simpler and more flexible in the long run is to store the data in a SQL or NoSQL database.
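To make the database route concrete, here is a minimal Rust sketch of what one stored observation might look like, mirroring the Prometheus labels in the output below. The struct fields, the `benchmarks` table name, and the `to_insert_sql` helper are all illustrative assumptions, not an existing schema.

```rust
/// One benchmark observation, mirroring the Prometheus labels used below.
/// The field set and the "benchmarks" table name are illustrative assumptions.
pub struct BenchRecord {
    pub metric: String,       // e.g. "replication_time_ms"
    pub data_size_bytes: u64, // e.g. 1048576
    pub hasher: String,       // e.g. "pedersen"
    pub layers: u32,          // e.g. 10
    pub value: f64,           // the observed measurement
}

impl BenchRecord {
    /// Render the record as a SQL INSERT statement.
    /// Sketch only: no escaping or parameter binding.
    pub fn to_insert_sql(&self) -> String {
        format!(
            "INSERT INTO benchmarks (metric, data_size_bytes, hasher, layers, value) \
             VALUES ('{}', {}, '{}', {}, {});",
            self.metric, self.data_size_bytes, self.hasher, self.layers, self.value
        )
    }
}
```

A real implementation would use a driver with bound parameters rather than string formatting; this only shows that the label set maps cleanly onto a flat row.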
From rust-filbase root:

```sh
cargo build --release --features benchy \
&& ./target/release/filbase benchy zigzag --size 1024
```
Replication: total time: 14.9740s
Replication: time per byte: 14.2800us
Vanilla proving: 533.4730us
Avg verifying: 0.3421s
Total proving: 0.0000s
# HELP circuit_num_constraints Number of constraints of the circuit
# TYPE circuit_num_constraints gauge
circuit_num_constraints{data_size_bytes="1048576",expansion_degree="8",hasher="pedersen",layers="10",m="5",partitions="1",samples="5",sloth_iter="0"} 0
# HELP circuit_num_inputs Number of inputs to the circuit
# TYPE circuit_num_inputs gauge
circuit_num_inputs{data_size_bytes="1048576",expansion_degree="8",hasher="pedersen",layers="10",m="5",partitions="1",samples="5",sloth_iter="0"} 0
# HELP replication_time_ms Total replication time
# TYPE replication_time_ms gauge
replication_time_ms{data_size_bytes="1048576",expansion_degree="8",hasher="pedersen",layers="10",m="5",partitions="1",samples="5",sloth_iter="0"} 14974
# HELP replication_time_ns_per_byte Replication time per byte
# TYPE replication_time_ns_per_byte gauge
replication_time_ns_per_byte{data_size_bytes="1048576",expansion_degree="8",hasher="pedersen",layers="10",m="5",partitions="1",samples="5",sloth_iter="0"} 14280
# HELP vanilla_proving_time_us Vanilla proving time
# TYPE vanilla_proving_time_us gauge
vanilla_proving_time_us{data_size_bytes="1048576",expansion_degree="8",hasher="pedersen",layers="10",m="5",partitions="1",samples="5",sloth_iter="0"} 533
# HELP vanilla_verification_time_us Vanilla verification time
# TYPE vanilla_verification_time_us gauge
vanilla_verification_time_us{data_size_bytes="1048576",expansion_degree="8",hasher="pedersen",layers="10",m="5",partitions="1",samples="5",sloth_iter="0"} 324979
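If we do move away from Prometheus, the exposition lines above are easy to consume directly. Here is a minimal Rust sketch of parsing one such line into a name, label pairs, and a value; this is plain string splitting (no escaping, no label-free lines), not how filbase or Prometheus tooling actually parses the format.

```rust
/// Parse a single Prometheus exposition line such as
///   replication_time_ms{hasher="pedersen",layers="10"} 14974
/// into (metric name, label pairs, value).
/// Minimal sketch: no escape handling, requires a {...} label section.
pub fn parse_metric_line(line: &str) -> Option<(String, Vec<(String, String)>, f64)> {
    let open = line.find('{')?;
    let close = line.find('}')?;
    let name = line[..open].to_string();
    let labels = line[open + 1..close]
        .split(',')
        .filter(|s| !s.is_empty())
        .map(|pair| {
            let mut it = pair.splitn(2, '=');
            let key = it.next().unwrap_or("").to_string();
            let val = it.next().unwrap_or("").trim_matches('"').to_string();
            (key, val)
        })
        .collect();
    let value = line[close + 1..].trim().parse().ok()?;
    Some((name, labels, value))
}
```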
From rust-proofs root:

```sh
cargo build --release --all \
&& cargo run --bin micro -p fil-proofs-tooling --release
```
# HELP time_gauge_us time gauge help
# TYPE time_gauge_us gauge
time_gauge_us{name="bytes-32-to-fr"} 0.049158
time_gauge_us{name="encode-node/blake2s/10"} 0.86665
time_gauge_us{name="encode-node/blake2s/3"} 0.53398
time_gauge_us{name="encode-node/blake2s/5"} 0.59715
time_gauge_us{name="encode-node/pedersen/10"} 0.84397
time_gauge_us{name="encode-node/pedersen/3"} 0.38849
time_gauge_us{name="encode-node/pedersen/5"} 0.48761
time_gauge_us{name="encode-node/sha256/10"} 0.8845599999999999
time_gauge_us{name="encode-node/sha256/3"} 0.39442
time_gauge_us{name="encode-node/sha256/5"} 0.50918
time_gauge_us{name="fr-to-bytes-32"} 0.092348
time_gauge_us{name="hash-blake2s-circuit/create-proof"} 307720
time_gauge_us{name="hash-blake2s-circuit/synthesize"} 0.58429
time_gauge_us{name="hash-blake2s/non-circuit/32"} 0.13493
time_gauge_us{name="hash-blake2s/non-circuit/320"} 0.47986
time_gauge_us{name="hash-blake2s/non-circuit/64"} 0.12398
time_gauge_us{name="hash-pedersen-circuit/create-proof"} 37799
time_gauge_us{name="hash-pedersen-circuit/synthesize"} 1434.7
time_gauge_us{name="hash-pedersen/non-circuit/32"} 18.722
time_gauge_us{name="hash-pedersen/non-circuit/320"} 397.31
time_gauge_us{name="hash-pedersen/non-circuit/64"} 34.457
time_gauge_us{name="hash-sha256-circuit/create-proof"} 288300
time_gauge_us{name="hash-sha256-circuit/synthesize"} 30697
time_gauge_us{name="hash-sha256/non-circuit/32"} 0.34006000000000003
time_gauge_us{name="hash-sha256/non-circuit/320"} 1.6986
time_gauge_us{name="hash-sha256/non-circuit/64"} 0.61475
time_gauge_us{name="kdf/blake2s/10"} 0.75461
time_gauge_us{name="kdf/blake2s/3"} 0.30211
time_gauge_us{name="kdf/blake2s/5"} 0.41220999999999997
time_gauge_us{name="kdf/pedersen/10"} 0.7060700000000001
time_gauge_us{name="kdf/pedersen/3"} 0.25906999999999997
time_gauge_us{name="kdf/pedersen/5"} 0.38075
time_gauge_us{name="kdf/sha256/10"} 0.8754500000000001
time_gauge_us{name="kdf/sha256/3"} 0.38471
time_gauge_us{name="kdf/sha256/5"} 0.5203099999999999
time_gauge_us{name="merkletree/blake2s/1024"} 413.28
time_gauge_us{name="merkletree/blake2s/128"} 146.95
time_gauge_us{name="merkletree/pedersen/1024"} 17903
time_gauge_us{name="merkletree/pedersen/128"} 2210.7000000000003
time_gauge_us{name="parents in a loop/Blake2s/10"} 124.61
time_gauge_us{name="parents in a loop/Blake2s/1000"} 11443
time_gauge_us{name="parents in a loop/Blake2s/50"} 517.15
time_gauge_us{name="parents in a loop/Pedersen/10"} 144.03
time_gauge_us{name="parents in a loop/Pedersen/1000"} 9291.800000000001
time_gauge_us{name="parents in a loop/Pedersen/50"} 577.53
time_gauge_us{name="parents in a loop/Sha256/10"} 160.39
time_gauge_us{name="parents in a loop/Sha256/1000"} 8450.1
time_gauge_us{name="parents in a loop/Sha256/50"} 611.89
time_gauge_us{name="preprocessing/write_padded + unpadded/1024000"} 18303
time_gauge_us{name="preprocessing/write_padded + unpadded/128"} 465.47
time_gauge_us{name="preprocessing/write_padded + unpadded/2048000"} 31778
time_gauge_us{name="preprocessing/write_padded + unpadded/256"} 463.9
time_gauge_us{name="preprocessing/write_padded + unpadded/256000"} 5998.2
time_gauge_us{name="preprocessing/write_padded + unpadded/512"} 419.26
time_gauge_us{name="preprocessing/write_padded + unpadded/512000"} 9211.2
time_gauge_us{name="preprocessing/write_padded/1024000"} 6636.3
time_gauge_us{name="preprocessing/write_padded/128"} 230.69
time_gauge_us{name="preprocessing/write_padded/2048000"} 12681
time_gauge_us{name="preprocessing/write_padded/256"} 233.09
time_gauge_us{name="preprocessing/write_padded/256000"} 1849.1
time_gauge_us{name="preprocessing/write_padded/512"} 238.95
time_gauge_us{name="preprocessing/write_padded/512000"} 3240.6000000000004
time_gauge_us{name="sloth/decode-circuit-create_proof"} 5531.799999999999
time_gauge_us{name="sloth/decode-circuit-synthesize_circuit"} 1.3837
time_gauge_us{name="sloth/decode-non-circuit"} 0.005585
time_gauge_us{name="sloth/encode-non-circuit"} 0.004937400000000001
time_gauge_us{name="xor-circuit/create-proof"} 20208
time_gauge_us{name="xor-circuit/synthesize"} 490.56
time_gauge_us{name="xor/non-circuit/32"} 0.3122
time_gauge_us{name="xor/non-circuit/320"} 2.4207
time_gauge_us{name="xor/non-circuit/64"} 0.52199
I'm going to put diffs to the list above here. I will update this comment over time. @laser @dignifiedquire
In general, we may need a name-negotiation pass. I'm not going to fixate on getting all naming perfect first.
Not needed:

- `sloth_iter` is not needed. Sloth is dead.

Needed:

- `wall-clock-sealing-time`: total time (end - start) for sealing, disregarding CPU.
- `vector-commitment-time`: should be CPU time, not wall-clock time.
- `max-memory`: i.e. max resident set size of the process throughout its life.
- `layer-challenges`: how many challenges were actually performed on each layer (conceptually: tuples of layer-index, challenge count).
- `vector-commitment-parallelism`: how many cores were used for vector commitment (merkle tree) generation.
- `circuit-proving-parallelism`: how many cores were used for circuit proving.

We can probably just use the CPU's core count for the parallelism numbers above, although that's not quite right. For example, since we parallelize replication and merkle-tree generation, the tree generation (except the final tree) can't use all cores. So, aspirationally, we should capture this accurately, even if we don't initially.
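The naive "just use the core count" approach is a one-liner with std. The function name is mine; note that reporting this for every phase is exactly the over-count described above, since intermediate tree generation cannot use every core.

```rust
use std::thread;

/// Naive stand-in for vector-commitment-parallelism and
/// circuit-proving-parallelism: report the host's logical core count.
/// Sketch only; this over-counts for phases that can't use every core.
pub fn assumed_parallelism() -> usize {
    thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1) // fall back to 1 if the count can't be queried
}
```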
For hash-function microbenchmarks, we also need circuit information:

- `circuit-time`: combined synthesis and circuit proving time attributable to the function.
- `num-constraints`: number of constraints used in the circuit.

We need to capture some configuration, for example whether or not `MAXIMIZE_CACHING` is true. We will also need to be able to control the values of such configuration when running benchmarks. This will matter more if/when configuration complexity increases. Another such value (not yet present in configuration) is the pedersen hashing window size (see #736).
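One plausible way to make such toggles controllable per benchmark run is an environment variable, read once at startup. This is a sketch of that idea, not the project's existing configuration mechanism; only the `MAXIMIZE_CACHING` name comes from the discussion above.

```rust
use std::env;

/// Read a boolean configuration flag (e.g. MAXIMIZE_CACHING) from the
/// environment, defaulting to false when unset or unrecognized.
/// Sketch only: the real configuration mechanism may differ.
pub fn bool_flag(name: &str) -> bool {
    match env::var(name) {
        Ok(v) => matches!(v.as_str(), "1" | "true" | "TRUE"),
        Err(_) => false,
    }
}
```

Recording the resolved flag values alongside each benchmark result would let runs with different configurations be compared honestly.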
Wherever 'cycles' appears, we mean 'pseudocycles': elapsed time multiplied by clock speed. So, for example, 1 second at 1GHz would be 1B pseudocycles. The idea is to get a quantity which can be used at least somewhat meaningfully to compare performance on different machines. It's not intended to measure actual processor cycles.
For initial work, it's probably easiest to ignore these numbers and instead report everything in seconds. As long as we also have the clock speed of the processor (which should be captured), we can calculate. NOTE: this will get more complicated if/when we introduce GPU to the timings.
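If pseudocycles are reported later, the arithmetic is a single multiplication of the captured quantities. A minimal helper (the function name is mine):

```rust
/// Pseudocycles: elapsed wall-clock time multiplied by clock frequency.
/// 1 second at 1 GHz => 1_000_000_000 pseudocycles.
/// A cross-machine comparison heuristic, not real processor cycles.
pub fn pseudocycles(elapsed_secs: f64, clock_hz: f64) -> f64 {
    elapsed_secs * clock_hz
}
```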
@dignifiedquire - I have moved on to some of the infra/ops stuff (getting benchmarks running on the `master` build, queuing benchmarks on Packet, etc.). I'm going to assign this story to you, since you're going to be adding additional output (e.g. circuit stuff).
Even though not everything got done, it seems the core issues are resolved. Closing.