feat(avm): calldata gadget preliminaries

jeanmon commented 3 days ago

First preliminary work for issue #7211. Added a new public calldata column and passes calldata file to the avm verifier.

jeanmon commented 3 days ago

I think we shouldn't separately pass the calldata on the TS side. You might want to prepend it to the VK like we do with the other public inputs:

https://github.com/AztecProtocol/aztec-packages/blob/3cd4d60be9525ead3cbaeb10e2ae3c1d4d1648d6/barretenberg/cpp/src/barretenberg/vm/avm_trace/avm_execution.cpp#L82

That might be wrong, or a hack, but then it's probably still better to solve them all in a final way in a separate PR (once it's agreed how we are going to pass things, and what we are going to pass).

@fcarreiro I was trying this as a first attempt but then realized that it is actually more complex for the time being to pursue this route. We have already in the code some separate calldata, returndata structures. Furthermore, the current serialization mechanism in Proof and HonkProof class do not support to pass variable-length (vector) data structure. I could hack it for the callata but as soon as we want to add returndata it will not work. Not sure how easy it is to change this because we are inheriting from some BB classes. Bottom line, if as you said we are not sure how we want to pass calldata, I would rather take the easier path.

fcarreiro commented 3 days ago

I think we shouldn't separately pass the calldata on the TS side. You might want to prepend it to the VK like we do with the other public inputs:

https://github.com/AztecProtocol/aztec-packages/blob/3cd4d60be9525ead3cbaeb10e2ae3c1d4d1648d6/barretenberg/cpp/src/barretenberg/vm/avm_trace/avm_execution.cpp#L82

That might be wrong, or a hack, but then it's probably still better to solve them all in a final way in a separate PR (once it's agreed how we are going to pass things, and what we are going to pass).

@fcarreiro I was trying this as a first attempt but then realized that it is actually more complex for the time being to pursue this route. We have already in the code some separate calldata, returndata structures. Furthermore, the current serialization mechanism in Proof and HonkProof class do not support to pass variable-length (vector) data structure. I could hack it for the callata but as soon as we want to add returndata it will not work. Not sure how easy it is to change this because we are inheriting from some BB classes. Bottom line, if as you said we are not sure how we want to pass calldata, I would rather take the easier path.

The path in this PR might seem easier but it adds a second way of doing things, and it breaks abstractions so I'm rather against it.

It also is not easier. HonkProof is just a vector, given a strange name by our BB friends. It is true that the public inputs are fixed size so you don't need to encode the length, but it's not too difficult to do this, and it keeps things encapsulated in avm_execution.cpp. It can be done like this in prove

    std::vector<FF> calldata;
    HonkProof proof;
    auto raw_proof = prover.construct_proof();
    proof.insert(proof.end(), raw_proof.begin(), raw_proof.end());
    proof.push_back(calldata.size());
    proof.insert(proof.end(), calldata.begin(), calldata.end());
    return std::make_tuple(*verifier.key, proof);

and like this in verify

    std::vector<FF> public_inputs_vec;
    std::vector<FF> calldata;
    std::vector<FF> raw_proof;

    // This can be made nicer using BB's serialize::read, probably.
    const auto public_inputs_offset = proof.begin();
    const auto public_inputs_size = PUBLIC_CIRCUIT_PUBLIC_INPUTS_LENGTH;
    const auto calldata_size_offset = public_inputs_offset + public_inputs_size;
    const auto calldata_offset = calldata_size_offset + 1;
    const auto raw_proof_offset = calldata_offset + static_cast<size_t>(proof.at(calldata_size_offset));

    std::copy(public_inputs_offset, public_inputs_offset + public_inputs_size, std::back_inserter(public_inputs_vec));
    std::copy(calldata_offset, calldata_offset + calldata_size, std::back_inserter(calldata));
    std::copy(raw_proof_offset, proof.end(), std::back_inserter(raw_proof));

AztecBot commented 2 days ago

Benchmark results

Metrics with a significant change:

avm_simulation_time_ms (Token:mint_public): 71.1 (-88%)
avm_simulation_time_ms (Token:assert_minter_and_mint): 231 (+284%)
avm_simulation_time_ms (FPC:prepare_fee): 204 (+33%)
avm_simulation_time_ms (Token:transfer_public): 35.1 (-30%)
protocol_circuit_witness_generation_time_in_ms (private-kernel-tail-to-public): 6,725 (-18%)

Detailed results

All benchmarks are run on txs on the `Benchmarking` contract on the repository. Each tx consists of a batch call to `create_note` and `increment_balance`, which guarantees that each tx has a private call, a nested private call, a public call, and a nested public call, as well as an emitted private note, an unencrypted log, and public storage read and write. This benchmark source data is available in JSON format on S3 [here](https://aztec-ci-artifacts.s3.us-east-2.amazonaws.com/benchmarks-v1/pulls/7227.json). ### Proof generation Each column represents the number of threads used in proof generation. | Metric | 1 threads | 4 threads | 16 threads | 32 threads | 64 threads | | - | - | - | - | - | - | proof_construction_time_sha256_30_ms | 11,398 | 3,049 | 1,485 (+1%) | 1,602 (-1%) | 1,533 | proof_construction_time_sha256_100_ms | 43,754 | 11,729 | 5,422 (-1%) | 5,384 (-1%) | 5,347 | proof_construction_time_poseidon_hash_ms | 78.0 (+1%) | 34.0 | 34.0 | 58.0 | 87.0 (-1%) | proof_construction_time_poseidon_hash_30_ms | 1,516 | 414 | 201 | 224 (-1%) | 269 (+1%) | proof_construction_time_poseidon_hash_100_ms | 5,747 | 1,562 | 721 | 779 (+2%) | 797 | ### L2 block published to L1 Each column represents the number of txs on an L2 block published to L1. | Metric | 4 txs | 8 txs | 16 txs | | - | - | - | - | l1_rollup_calldata_size_in_bytes | 1,412 | 1,412 | 1,412 | l1_rollup_calldata_gas | 9,452 | 9,472 | 9,464 | l1_rollup_execution_gas | 610,273 | 610,293 | 610,285 | l2_block_processing_time_in_ms | 752 (-1%) | 1,412 | 2,696 (-1%) | l2_block_building_time_in_ms | 25,808 | 51,503 | 101,538 (+1%) | l2_block_rollup_simulation_time_in_ms | 25,808 | 51,502 | 101,537 (+1%) | l2_block_public_tx_process_time_in_ms | 22,173 | 47,678 | 97,749 (+1%) | ### L2 chain processing Each column represents the number of blocks on the L2 chain where each block has 8 txs. | Metric | 3 blocks | 5 blocks | | - | - | - | node_history_sync_time_in_ms | 7,044 (-1%) | 9,933 (+1%) | node_database_size_in_bytes | 12,152,912 | 16,080,976 | pxe_database_size_in_bytes | 16,254 | 26,813 | ### Circuits stats Stats on running time and I/O sizes collected for every kernel circuit run across all benchmarks. | Circuit | simulation_time_in_ms | witness_generation_time_in_ms | proving_time_in_ms | input_size_in_bytes | output_size_in_bytes | proof_size_in_bytes | num_public_inputs | size_in_gates | | - | - | - | - | - | - | - | - | - | private-kernel-init | 123 (+2%) | 467 (-15%) | 12,904 (+3%) | 20,634 | 67,190 | 92,352 | 2,819 | 524,288 | private-kernel-inner | 381 (+1%) | 1,009 (+1%) | 51,886 (+5%) | 94,902 | 67,190 | 92,352 | 2,819 | 2,097,152 | private-kernel-tail | 313 (+1%) | 1,779 (-1%) | 55,822 (+5%) | 99,121 | 71,733 | 14,912 | 399 | 2,097,152 | base-parity | 6.50 (-1%) | 1,970 | 2,800 (+1%) | 128 | 64.0 | 2,208 | 2.00 | 131,072 | root-parity | 49.5 (-1%) | 73.6 (+4%) | 39,888 (-6%) | 27,100 | 64.0 | 2,720 | 18.0 | 2,097,152 | base-rollup | 8,023 (+1%) | 5,053 | 87,346 (+1%) | 170,330 | 756 | 3,648 | 47.0 | 4,194,304 | root-rollup | 110 | 87.2 | 23,634 (+7%) | 25,309 | 620 | 3,456 | 41.0 | 1,048,576 | public-kernel-setup | 715 (-1%) | 3,827 (+2%) | 46,225 (+1%) | 116,905 | 93,334 | 125,344 | 3,850 | 2,097,152 | public-kernel-app-logic | 637 (+1%) | 4,985 (+5%) | 45,669 (-4%) | 116,905 | 93,334 | 125,344 | 3,850 | 2,097,152 | public-kernel-tail | 1,411 | 38,471 (+11%) | 200,228 (+1%) | 511,910 | 10,014 | 14,912 | 399 | 8,388,608 | private-kernel-reset-small | 564 (+1%) | 2,213 (+3%) | 47,505 (+1%) | 123,313 | 67,190 | 92,352 | 2,819 | 2,097,152 | public-kernel-teardown | 627 (-1%) | 4,826 | 46,308 (-2%) | 116,905 | 93,334 | 125,344 | 3,850 | 2,097,152 | merge-rollup | 29.5 | N/A | N/A | 16,542 | 756 | N/A | N/A | N/A | private-kernel-tail-to-public | N/A | :warning: 6,725 (**-18%**) | 104,344 (+6%) | N/A | N/A | 125,344 | 3,850 | 4,194,304 | Stats on running time collected for app circuits | Function | input_size_in_bytes | output_size_in_bytes | witness_generation_time_in_ms | proof_size_in_bytes | proving_time_in_ms | size_in_gates | num_public_inputs | | - | - | - | - | - | - | - | - | ContractClassRegisterer:register | 1,344 | 9,944 | 425 | N/A | N/A | N/A | N/A | ContractInstanceDeployer:deploy | 1,408 | 9,944 | 39.8 | N/A | N/A | N/A | N/A | MultiCallEntrypoint:entrypoint | 1,920 | 9,944 | 1,787 (-1%) | N/A | N/A | N/A | N/A | GasToken:deploy | 1,376 | 9,944 | 984 (-1%) | N/A | N/A | N/A | N/A | SchnorrAccount:constructor | 1,312 | 9,944 | 1,411 | N/A | N/A | N/A | N/A | SchnorrAccount:entrypoint | 2,304 | 9,944 | 2,852 (+1%) | 16,768 | 56,740 (-1%) | 2,097,152 | 457 | Token:privately_mint_private_note | 1,280 | 9,944 | 1,624 | N/A | N/A | N/A | N/A | FPC:fee_entrypoint_public | 1,344 | 9,944 | 362 (+2%) | 16,768 | 11,199 (-2%) | 524,288 | 457 | Token:transfer | 1,312 | 9,944 | 4,501 | 16,768 | 46,596 (+2%) | 2,097,152 | 457 | AuthRegistry:set_authorized (avm) | 21,043 | N/A | N/A | 87,552 | 1,396 (+1%) | N/A | N/A | FPC:prepare_fee (avm) | 28,495 | N/A | N/A | 88,448 | 5,773 (-1%) | N/A | N/A | Token:transfer_public (avm) | 44,885 | N/A | N/A | 88,170 | 4,031 (+1%) | N/A | N/A | AuthRegistry:consume (avm) | 34,973 | N/A | N/A | 87,968 | 2,991 | N/A | N/A | FPC:pay_refund (avm) | 41,394 | N/A | N/A | 89,248 | 23,402 (-2%) | N/A | N/A | Benchmarking:create_note | 1,344 | 9,944 | 1,406 (+1%) | N/A | N/A | N/A | N/A | SchnorrAccount:verify_private_authwit | 1,280 | 9,944 | 73.3 | N/A | N/A | N/A | N/A | Token:unshield | 1,376 | 9,944 | 3,694 | N/A | N/A | N/A | N/A | FPC:fee_entrypoint_private | 1,376 | 9,944 | 4,628 | N/A | N/A | N/A | N/A | ### AVM Simulation Time to simulate various public functions in the AVM. | Function | time_ms | bytecode_size_in_bytes | | - | - | - | GasToken:_increase_public_balance | 69.8 (+5%) | 13,873 | GasToken:set_portal | 16.9 (-1%) | 3,495 | Token:constructor | 95.5 (+3%) | 24,207 | FPC:constructor | 61.3 (-4%) | 13,893 | GasToken:mint_public | 56.1 (+7%) | 10,241 | Token:mint_public | :warning: 71.1 (**-88%**) | 19,216 | Token:assert_minter_and_mint | :warning: 231 (**+284%**) | 13,034 | AuthRegistry:set_authorized | 31.5 (-2%) | 7,869 | FPC:prepare_fee | :warning: 204 (**+33%**) | 15,129 | Token:transfer_public | :warning: 35.1 (**-30%**) | 31,425 | FPC:pay_refund | 184 (-11%) | 28,061 | Benchmarking:increment_balance | 2,732 | 15,407 | Token:_increase_public_balance | 56.1 | 15,089 | FPC:pay_refund_with_shielded_rebate | 198 (-2%) | 29,148 | ### Public DB Access Time to access various public DBs. | Function | time_ms | | - | - | get-nullifier-index | 0.154 (-2%) | ### Tree insertion stats The duration to insert a fixed batch of leaves into each tree type. | Metric | 1 leaves | 16 leaves | 64 leaves | 128 leaves | 256 leaves | 512 leaves | 1024 leaves | | - | - | - | - | - | - | - | - | batch_insert_into_append_only_tree_16_depth_ms | 10.3 | 16.7 | N/A | N/A | N/A | N/A | N/A | batch_insert_into_append_only_tree_16_depth_hash_count | 16.8 | 31.7 | N/A | N/A | N/A | N/A | N/A | batch_insert_into_append_only_tree_16_depth_hash_ms | 0.598 | 0.513 | N/A | N/A | N/A | N/A | N/A | batch_insert_into_append_only_tree_32_depth_ms | N/A | N/A | 48.4 | 75.7 | 131 | 245 | 468 | batch_insert_into_append_only_tree_32_depth_hash_count | N/A | N/A | 95.9 | 159 | 287 | 543 | 1,055 | batch_insert_into_append_only_tree_32_depth_hash_ms | N/A | N/A | 0.494 | 0.465 | 0.450 | 0.444 | 0.437 | batch_insert_into_indexed_tree_20_depth_ms | N/A | N/A | 59.6 (-1%) | 112 (+1%) | 182 (+1%) | 352 (-1%) | 690 | batch_insert_into_indexed_tree_20_depth_hash_count | N/A | N/A | 109 | 207 | 355 | 691 | 1,363 | batch_insert_into_indexed_tree_20_depth_hash_ms | N/A | N/A | 0.504 (-1%) | 0.501 | 0.482 | 0.477 (-1%) | 0.473 | batch_insert_into_indexed_tree_40_depth_ms | N/A | N/A | 72.7 | N/A | N/A | N/A | N/A | batch_insert_into_indexed_tree_40_depth_hash_count | N/A | N/A | 133 | N/A | N/A | N/A | N/A | batch_insert_into_indexed_tree_40_depth_hash_ms | N/A | N/A | 0.518 | N/A | N/A | N/A | N/A | ### Miscellaneous Transaction sizes based on how many contract classes are registered in the tx. | Metric | 0 registered classes | 1 registered classes | | - | - | - | tx_size_in_bytes | 85,672 | 670,983 | Transaction size based on fee payment method | Metric | | | - | |

AztecProtocol / aztec-packages

feat(avm): calldata gadget preliminaries #7227

Benchmark results