AztecProtocol / aztec-packages

Apache License 2.0

feat: Add opcodes flamegraph and refactor gates flamegraph #7282

Closed: sirasistant closed this 3 days ago

sirasistant commented 3 days ago

Adds a new command, `opcodes-flamegraph`, to the `noir-profiler` binary, with the following usage:

noir-profiler opcodes-flamegraph -a PATH_TO_THE_ARTIFACT -o OUTPUT_PATH

For example:

~/aztec-packages/noir/noir-repo/target/release/noir-profiler opcodes-flamegraph -a ~/aztec-packages/noir-projects/noir-protocol-circuits/target/private_kernel_inner.json -o .

This produces a flamegraph that counts opcodes (not gates) for every call stack, like the following:

[image: inner_opcodes]
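To illustrate what counting opcodes per call stack means, here is a minimal sketch that folds per-opcode call stacks into the collapsed-stack text format commonly consumed by flamegraph tools. The input shape, the frame names, and the function name `fold_opcode_counts` are hypothetical and are not taken from the profiler's implementation.

```python
from collections import Counter


def fold_opcode_counts(opcode_call_stacks):
    """Count one sample per opcode for each distinct call stack.

    `opcode_call_stacks` is a hypothetical input: one list of frame names
    (outermost first) for every opcode in the circuit.
    """
    counts = Counter(";".join(stack) for stack in opcode_call_stacks)
    # Collapsed-stack format: "frame1;frame2;... count", one line per stack.
    return [f"{stack} {count}" for stack, count in sorted(counts.items())]


# Hypothetical call stacks, one entry per opcode.
stacks = [
    ["main", "verify_proof"],
    ["main", "verify_proof"],
    ["main", "hash_inputs"],
]
folded = fold_opcode_counts(stacks)
print("\n".join(folded))
```

Each output line is a call stack followed by the number of opcodes attributed to it, so the width of a frame in the rendered flamegraph would reflect its opcode count rather than its gate count.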

This PR also refactors `gates-flamegraph` so that the generated opcode type is appended as the last item of each call stack, giving more complete profiling information:

[image]
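The idea of appending the opcode type as the last call stack item can be sketched as follows. This is an assumption-laden illustration, not the PR's code: the sample tuples, the placeholder opcode type names (`opcode_a`, `opcode_b`), and `fold_gate_counts` are all hypothetical.

```python
from collections import defaultdict


def fold_gate_counts(samples):
    """Sum gates per call stack, with the opcode type appended as the leaf frame.

    `samples` is a hypothetical list of (call_stack, opcode_type, gate_count)
    tuples, where `call_stack` lists frame names outermost first.
    """
    totals = defaultdict(int)
    for call_stack, opcode_type, gates in samples:
        # The opcode type becomes the last frame of the folded stack.
        key = ";".join([*call_stack, opcode_type])
        totals[key] += gates
    return dict(totals)


# Placeholder data: two opcode types generated from the same source location.
samples = [
    (["main", "check_range"], "opcode_a", 10),
    (["main", "check_range"], "opcode_b", 3),
    (["main", "check_range"], "opcode_a", 5),
]
gate_totals = fold_gate_counts(samples)
```

With the opcode type as the leaf frame, gates produced at the same source location are split by the kind of opcode that generated them instead of being merged into a single block.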

AztecBot commented 3 days ago

Benchmark results

Metrics with a significant change:

Detailed results

All benchmarks are run on txs on the `Benchmarking` contract on the repository. Each tx consists of a batch call to `create_note` and `increment_balance`, which guarantees that each tx has a private call, a nested private call, a public call, and a nested public call, as well as an emitted private note, an unencrypted log, and public storage read and write.

This benchmark source data is available in JSON format on S3 [here](https://aztec-ci-artifacts.s3.us-east-2.amazonaws.com/benchmarks-v1/pulls/7282.json).

### Proof generation

Each column represents the number of threads used in proof generation.

| Metric | 1 threads | 4 threads | 16 threads | 32 threads | 64 threads |
| - | - | - | - | - | - |
| proof_construction_time_sha256_30_ms | 12,078 (+2%) | 3,263 (+3%) | :warning: 1,718 (**+22%**) | 1,466 (+1%) | 1,487 (+1%) |
| proof_construction_time_sha256_100_ms | 44,887 (+2%) | 12,069 (+2%) | 5,573 (+2%) | 5,547 (+2%) | 5,476 (+2%) |
| proof_construction_time_poseidon_hash_ms | 79.0 (+1%) | 35.0 (+3%) | 35.0 (+3%) | 58.0 (+2%) | 88.0 |
| proof_construction_time_poseidon_hash_30_ms | 1,549 (+2%) | 421 (+1%) | 205 (+3%) | 239 (+7%) | 267 |
| proof_construction_time_poseidon_hash_100_ms | 5,852 (+2%) | 1,592 (+2%) | 724 (-1%) | 785 (+3%) | 802 (+2%) |

### L2 block published to L1

Each column represents the number of txs on an L2 block published to L1.

| Metric | 4 txs | 8 txs | 16 txs |
| - | - | - | - |
| l1_rollup_calldata_size_in_bytes | 1,412 | 1,412 | 1,412 |
| l1_rollup_calldata_gas | 9,476 | 9,466 | 9,476 |
| l1_rollup_execution_gas | 611,215 | 611,356 | 611,517 |
| l2_block_processing_time_in_ms | 752 (-1%) | 1,441 (+2%) | 2,712 (-1%) |
| l2_block_building_time_in_ms | 24,569 | 49,853 (-2%) | 96,497 |
| l2_block_rollup_simulation_time_in_ms | 24,569 | 49,853 (-2%) | 96,497 |
| l2_block_public_tx_process_time_in_ms | 21,077 | 46,141 (-2%) | 92,871 |

### L2 chain processing

Each column represents the number of blocks on the L2 chain where each block has 8 txs.

| Metric | 3 blocks | 5 blocks |
| - | - | - |
| node_history_sync_time_in_ms | 7,193 (+2%) | 10,118 (+2%) |
| node_database_size_in_bytes | 12,255,312 | 16,142,416 |
| pxe_database_size_in_bytes | 16,254 | 26,813 |

### Circuits stats

Stats on running time and I/O sizes collected for every kernel circuit run across all benchmarks.

| Circuit | simulation_time_in_ms | witness_generation_time_in_ms | proving_time_in_ms | input_size_in_bytes | output_size_in_bytes | proof_size_in_bytes | num_public_inputs | size_in_gates |
| - | - | - | - | - | - | - | - | - |
| private-kernel-init | 116 (-1%) | 508 (-5%) | 12,750 (-3%) | 20,634 | 67,190 | 92,352 | 2,819 | 524,288 |
| private-kernel-inner | 361 (-1%) | 1,083 (+8%) | 49,335 (+2%) | 94,902 | 67,190 | 92,352 | 2,819 | 2,097,152 |
| private-kernel-tail | 1,134 (-3%) | 2,608 (+1%) | 50,378 (-6%) | 74,513 | 71,733 | 14,912 | 399 | 2,097,152 |
| base-parity | 6.18 | 1,802 | 2,727 (+3%) | 128 | 64.0 | 2,208 | 2.00 | 131,072 |
| root-parity | 48.8 | 74.3 | 40,873 (+1%) | 27,100 | 64.0 | 2,720 | 18.0 | 2,097,152 |
| base-rollup | 7,756 (-1%) | 5,017 (+2%) | 92,862 (+5%) | 170,330 | 728 | 3,648 | 47.0 | 4,194,304 |
| root-rollup | 111 (+1%) | 92.4 (+5%) | 24,369 (+12%) | 25,253 | 620 | 3,456 | 41.0 | 1,048,576 |
| public-kernel-setup | 637 | 3,839 (+4%) | 45,326 (+2%) | 116,905 | 93,334 | 125,344 | 3,850 | 2,097,152 |
| public-kernel-app-logic | 588 (-3%) | 4,874 (+5%) | 49,289 (+9%) | 116,905 | 93,334 | 125,344 | 3,850 | 2,097,152 |
| public-kernel-tail | 1,392 (-1%) | 41,284 (+12%) | 193,183 (+2%) | 511,910 | 10,014 | 14,912 | 399 | 8,388,608 |
| private-kernel-reset-small | 552 (-2%) | 2,005 (+3%) | 48,828 (+3%) | 123,313 | 67,190 | 92,352 | 2,819 | 2,097,152 |
| public-kernel-teardown | 576 (+2%) | 4,931 (+4%) | 49,280 (+6%) | 116,905 | 93,334 | 125,344 | 3,850 | 2,097,152 |
| merge-rollup | 29.4 | N/A | N/A | 16,486 | 728 | N/A | N/A | N/A |
| private-kernel-tail-to-public | N/A | 9,377 | 53,281 | N/A | N/A | 125,344 | 3,850 | 2,097,152 |

Stats on running time collected for app circuits

| Function | input_size_in_bytes | output_size_in_bytes | witness_generation_time_in_ms | proof_size_in_bytes | proving_time_in_ms | size_in_gates | num_public_inputs |
| - | - | - | - | - | - | - | - |
| ContractClassRegisterer:register | 1,344 | 9,944 | 410 | N/A | N/A | N/A | N/A |
| ContractInstanceDeployer:deploy | 1,408 | 9,944 | 39.8 | N/A | N/A | N/A | N/A |
| MultiCallEntrypoint:entrypoint | 1,920 | 9,944 | 1,304 | N/A | N/A | N/A | N/A |
| GasToken:deploy | 1,376 | 9,944 | 955 | N/A | N/A | N/A | N/A |
| SchnorrAccount:constructor | 1,312 | 9,944 | 490 | N/A | N/A | N/A | N/A |
| SchnorrAccount:entrypoint | 2,304 | 9,944 | 1,896 (+1%) | 16,768 | 55,805 | 2,097,152 | 457 |
| Token:privately_mint_private_note | 1,280 | 9,944 | 640 (-1%) | N/A | N/A | N/A | N/A |
| FPC:fee_entrypoint_public | 1,344 | 9,944 | 301 (-2%) | 16,768 | 11,870 (+3%) | 524,288 | 457 |
| Token:transfer | 1,312 | 9,944 | 2,908 | 16,768 | 23,101 (-4%) | 1,048,576 | 457 |
| AuthRegistry:set_authorized (avm) | 20,954 | N/A | N/A | 94,336 | 1,403 (+13%) | N/A | N/A |
| FPC:prepare_fee (avm) | 28,396 | N/A | N/A | 94,400 | 3,136 (+10%) | N/A | N/A |
| Token:transfer_public (avm) | 44,612 | N/A | N/A | 94,400 | 4,234 (+13%) | N/A | N/A |
| AuthRegistry:consume (avm) | 34,832 | N/A | N/A | 94,336 | 3,084 (+11%) | N/A | N/A |
| FPC:pay_refund (avm) | 38,561 | N/A | N/A | 94,368 | 23,881 (+4%) | N/A | N/A |
| Benchmarking:create_note | 1,344 | 9,944 | 484 | N/A | N/A | N/A | N/A |
| SchnorrAccount:verify_private_authwit | 1,280 | 9,944 | 82.1 (+14%) | N/A | N/A | N/A | N/A |
| Token:unshield | 1,376 | 9,944 | 2,774 (+3%) | N/A | N/A | N/A | N/A |
| FPC:fee_entrypoint_private | 1,376 | 9,944 | 3,595 (+2%) | N/A | N/A | N/A | N/A |

### AVM Simulation

Time to simulate various public functions in the AVM.

| Function | time_ms | bytecode_size_in_bytes |
| - | - | - |
| GasToken:_increase_public_balance | 69.9 | 13,790 |
| GasToken:set_portal | 17.9 (+1%) | 3,305 (-1%) |
| Token:constructor | 104 (+5%) | 23,658 |
| FPC:constructor | 65.1 (+6%) | 13,592 |
| GasToken:mint_public | 53.1 (+3%) | 10,158 |
| Token:mint_public | :warning: 62.5 (**-89%**) | 19,000 |
| Token:assert_minter_and_mint | :warning: 222 (**+296%**) | 12,891 |
| AuthRegistry:set_authorized | 31.8 (-16%) | 7,812 |
| FPC:prepare_fee | 134 (-14%) | 15,062 |
| Token:transfer_public | :warning: 32.6 (**-40%**) | 31,184 |
| FPC:pay_refund | 154 (-3%) | 25,260 |
| Benchmarking:increment_balance | 2,626 (-2%) | 15,233 |
| Token:_increase_public_balance | 73.9 (+29%) | 14,972 |
| FPC:pay_refund_with_shielded_rebate | 133 (+4%) | 26,347 |

### Public DB Access

Time to access various public DBs.

| Function | time_ms |
| - | - |
| get-nullifier-index | 0.162 (+3%) |

### Tree insertion stats

The duration to insert a fixed batch of leaves into each tree type.

| Metric | 1 leaves | 16 leaves | 64 leaves | 128 leaves | 256 leaves | 512 leaves | 1024 leaves |
| - | - | - | - | - | - | - | - |
| batch_insert_into_append_only_tree_16_depth_ms | 10.4 (+1%) | 16.8 (+1%) | N/A | N/A | N/A | N/A | N/A |
| batch_insert_into_append_only_tree_16_depth_hash_count | 16.8 | 31.7 | N/A | N/A | N/A | N/A | N/A |
| batch_insert_into_append_only_tree_16_depth_hash_ms | 0.602 (+1%) | 0.516 (+1%) | N/A | N/A | N/A | N/A | N/A |
| batch_insert_into_append_only_tree_32_depth_ms | N/A | N/A | 48.8 (+1%) | 76.5 (+1%) | 131 | 250 (+2%) | 470 |
| batch_insert_into_append_only_tree_32_depth_hash_count | N/A | N/A | 95.9 | 159 | 287 | 543 | 1,055 |
| batch_insert_into_append_only_tree_32_depth_hash_ms | N/A | N/A | 0.498 (+1%) | 0.470 (+1%) | 0.450 | 0.453 (+2%) | 0.439 |
| batch_insert_into_indexed_tree_20_depth_ms | N/A | N/A | 59.9 (+1%) | 113 (+1%) | 183 | 359 (+2%) | 693 |
| batch_insert_into_indexed_tree_20_depth_hash_count | N/A | N/A | 109 | 207 | 355 | 691 | 1,363 |
| batch_insert_into_indexed_tree_20_depth_hash_ms | N/A | N/A | 0.506 (+1%) | 0.505 (+1%) | 0.484 | 0.486 (+2%) | 0.476 |
| batch_insert_into_indexed_tree_40_depth_ms | N/A | N/A | 73.4 (+1%) | N/A | N/A | N/A | N/A |
| batch_insert_into_indexed_tree_40_depth_hash_count | N/A | N/A | 133 | N/A | N/A | N/A | N/A |
| batch_insert_into_indexed_tree_40_depth_hash_ms | N/A | N/A | 0.522 (+1%) | N/A | N/A | N/A | N/A |

### Miscellaneous

Transaction sizes based on how many contract classes are registered in the tx.

| Metric | 0 registered classes | 1 registered classes |
| - | - | - |
| tx_size_in_bytes | 85,707 | 670,983 |

Transaction size based on fee payment method

| Metric | |
| - | |