cspr-rad / kairos

cctl: add optional chainspec argument #101

Closed marijanp closed 2 months ago

koxu1996 commented 3 months ago

@marijanp I am okay with adding an option to customize the chainspec (even though it's discouraged), but this PR also replaces casper-node itself, which I don't understand. Could you split the PR?

marijanp commented 3 months ago

@koxu1996 Both of the changes on this PR/branch are things that @jonas089 needs for testing the verify contract endpoint.

jonas089 commented 3 months ago

> @marijanp I am okay with adding an option to customize the chainspec (even though it's discouraged), but this PR also replaces casper-node itself, which I don't understand. Could you split the PR?

We need to be able to target a custom casper-node branch with a host-function verifier for the risc0 proofs.

Currently that's the cspr-rad/casper-node branch kairos-demo-1.5.6.

There is no need to change the chainspec since that can be done directly on the branch. It's just important that I can spin up cctl with that branch.

marijanp commented 3 months ago

@jonas089 This PR overrides casper-node to use cspr-rad/casper-node/kairos-demo-1.5.6; there is no need to modify the chainspec in the cctl sources.

jonas089 commented 3 months ago

> @jonas089 This PR overrides casper-node to use cspr-rad/casper-node/kairos-demo-1.5.6; there is no need to modify the chainspec in the cctl sources.

That's great 👍

jonas089 commented 3 months ago

> @jonas089 This PR overrides casper-node to use cspr-rad/casper-node/kairos-demo-1.5.6; there is no need to modify the chainspec in the cctl sources.

But how can we switch back to the release branch of casper-node? Is changing the URL in the flake the only way?

marijanp commented 3 months ago

Why and when would you want to change the casper-node sources again? Kairos will need this specific casper-node in order to function properly, so once your changes make it upstream, we'll remove the override.

jonas089 commented 3 months ago

> Why and when would you want to change the casper-node sources again? Kairos will need this specific casper-node in order to function properly, so once your changes make it upstream, we'll remove the override.

For now, but not necessarily forever. Making it an option & environment variable would be more future-proof.

I don't think anyone wants to change a URL in a flake; it should live somewhere else.
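
A minimal sketch of that idea, assuming a hypothetical CCTL_CASPER_NODE_BIN variable and a launcher that accepts a node binary path (both names are illustrative, not from this PR):

```rust
use std::env;
use std::path::PathBuf;

// Hypothetical: pick the casper-node binary from an environment variable,
// falling back to whatever the flake already put on PATH. The variable
// name CCTL_CASPER_NODE_BIN is illustrative only.
fn node_binary() -> PathBuf {
    env::var("CCTL_CASPER_NODE_BIN")
        .map(PathBuf::from)
        .unwrap_or_else(|_| PathBuf::from("casper-node"))
}
```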

marijanp commented 3 months ago

@jonas089 In what scenario are you going to run the kairos-stack without a casper-node that supports ZK verification?

jonas089 commented 3 months ago

> @jonas089 In what scenario are you going to run the kairos-stack without a casper-node that supports ZK verification?

In a scenario where we use something other than risc0, or something other than a host function, or where we simply decide that we don't want to depend on this particular branch anymore.

Hardcoding this into a flake just seems unnecessarily complicated for any developer who wants to change the target node.

It's a pretty fundamental development practice when doing stuff like this on Casper, so it should be straightforward to use.

koxu1996 commented 3 months ago

Currently, the only change in the custom version of casper-node is the introduction of a host function for Risc0 verification. This modification not only makes our solution incompatible with the production network, but also offers no significant benefit from the host function itself: it merely moves code from the smart contract into the node to reduce execution costs, replacing them with a fixed fee in CSPR. However, in NCTL tests we have unlimited tokens, so there is no reason to optimize this.

By the way, I have already worked on upstreaming host functions related to ZK, see the picture below:

[image: screenshot of the upstream ZK host-function work]

This was the first step towards having optimized Risc0 verification on Casper. I then discovered that Risc0 uses sha256's internal hazmat API for hashing, making it unlikely to be introduced as part of the generic hash helper. It is also unclear how to benchmark and assign a fixed cost to host functions, and in VM 2.0 they will become completely obsolete.
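
For reference, a minimal sketch of the low-level API in question, using the sha2 crate's compress256; how risc0 drives it internally may differ:

```rust
use sha2::compress256;

fn main() {
    // Standard SHA-256 initialization vector.
    let mut state: [u32; 8] = [
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
        0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
    ];
    // Drive the raw compression function over one all-zero 64-byte block
    // (element type inferred from compress256's signature). A generic
    // "hash these bytes" host function would not expose this entry point.
    let blocks = [Default::default()];
    compress256(&mut state, &blocks);
    println!("mid-state: {:08x?}", state);
}
```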

TL;DR: Introducing host functions is pointless.

jonas089 commented 3 months ago

> Currently, the only change in the custom version of casper-node is the introduction of a host function for Risc0 verification. This modification not only makes our solution incompatible with the production network, but also offers no significant benefit from the host function itself: it merely moves code from the smart contract into the node to reduce execution costs, replacing them with a fixed fee in CSPR. However, in NCTL tests we have unlimited tokens, so there is no reason to optimize this.

We can use the verifier directly in the smart contract entry point instead of a host function, but either way we will have to change the chainspec, since the proof size for the demo will definitely exceed the maximum runtime-argument size and potentially also the maximum memory.

In my opinion it's easiest to use a host function for the verifier so that we don't need to import risc0 dependencies in the contract. But if you disagree, we can discuss this again.

@koxu1996
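
To make the trade-off concrete, a rough sketch of the in-contract alternative, assuming the risc0-zkvm verifier can be compiled into the contract Wasm; the entry-point name, argument name, image ID, and error codes are all illustrative:

```rust
use casper_contract::{contract_api::runtime, unwrap_or_revert::UnwrapOrRevert};
use casper_types::{bytesrepr::Bytes, ApiError};
use risc0_zkvm::Receipt;

// Image ID of the batch-proving guest program (placeholder value).
const BATCH_IMAGE_ID: [u32; 8] = [0u32; 8];

#[no_mangle]
pub extern "C" fn submit_batch() {
    // The serialized receipt arrives as a runtime argument; this is
    // exactly where the chainspec's argument-size limits bite.
    let proof: Bytes = runtime::get_named_arg("risc0_receipt");
    let receipt: Receipt = bincode::deserialize(&proof)
        .map_err(|_| ApiError::User(1))
        .unwrap_or_revert();
    // Verification runs inside the contract's Wasm, so it is metered per
    // instruction instead of being charged as a fixed host-function fee.
    receipt
        .verify(BATCH_IMAGE_ID)
        .map_err(|_| ApiError::User(2))
        .unwrap_or_revert();
}
```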

koxu1996 commented 3 months ago

@jonas089 I am pretty sure we can work around those limits. What about embedding the proof bytes in session code and then calling call_contract()?

jonas089 commented 3 months ago

> @jonas089 I am pretty sure we can work around those limits. What about embedding the proof bytes in session code and then calling call_contract()?

There is a maximum size per session arg that we will likely exceed. Even if you embed the bytes, you still need to pass them to the contract as a runtime arg.

jonas089 commented 3 months ago

My suggestion is that the chainspec should be changeable in our e2e tests. This will be useful for development, and if we somehow find a way to make it work with the original chainspec then that is good, but I don't think we'll get there before the demo.

koxu1996 commented 3 months ago

In the chainspec for v1.5.6, the maximum length of serialized session arguments is 1 KiB. From what I see in the casper-node code, session-argument size is validated for deploys but not for internal calls, which lets us bypass the limit.

**Working example**

I built a simple contract with a handle_big_payload() entry point that takes bytes in the data argument and prints their length:

```rust
#![no_main]
use casper_contract::contract_api::{runtime, storage};
use casper_types::{
    bytesrepr::Bytes, CLType, EntryPoint, EntryPointAccess, EntryPointType, EntryPoints,
};

#[no_mangle]
pub extern "C" fn handle_big_payload() {
    // Read the (potentially large) payload passed as a runtime argument.
    let data: Bytes = runtime::get_named_arg("data");
    runtime::print(&format!("Payload length: {}", data.len()));
}

#[no_mangle]
pub extern "C" fn call() {
    let mut entry_points = EntryPoints::new();
    // The "data" argument is not declared as a Parameter; named args can
    // still be read at runtime.
    entry_points.add_entry_point(EntryPoint::new(
        "handle_big_payload",
        Vec::new(),
        CLType::Unit,
        EntryPointAccess::Public,
        EntryPointType::Contract,
    ));

    storage::new_locked_contract(
        entry_points,
        None,
        Some("test-contract-hash".to_string()),
        None,
    );
}
```

The corresponding session code passes a large amount of data to it (the bytes are included directly at compile time):

```rust
#![no_main]

use casper_contract::contract_api::runtime;
use casper_types::{bytesrepr::Bytes, runtime_args, ContractHash, RuntimeArgs};

// Embed the payload into the session Wasm at compile time.
const DATA: &[u8] = include_bytes!("./random_325K.bin");

#[no_mangle]
pub extern "C" fn call() {
    let contract_hash = ContractHash::from_formatted_str(
        "contract-7f0c6e5b0fa05d1d93656cdf2ab7bb65a74f411f79ecc31edaf3c455b8e94067",
    )
    .unwrap();
    let bytes: Bytes = DATA.into();
    let runtime_args = runtime_args! {
        "data" => bytes,
    };
    // Contract-to-contract call: the session-argument size limit is only
    // enforced on the deploy itself, not here.
    runtime::call_contract(contract_hash, "handle_big_payload", runtime_args)
}
```

NOTE: The random data was generated with `dd if=/dev/urandom of=random_325K.bin bs=1 count=325000`.

**Results**

```console
$ docker exec -it my-testnet bash -c "tail -f ~/casper-node/utils/nctl/assets/net-1/nodes/node-1/logs/stdout.log | grep -v timestamp"
Payload length: 325000
```

I ran a couple of additional tests:

- :heavy_check_mark: 300 kB - 27.688884880 CSPR
- :heavy_check_mark: 325 kB - 29.990384880 CSPR
- :heavy_check_mark: 340 kB - 27.473332420 CSPR
- :stop_sign: 350 kB - Interpreter error: trap: Code(Unreachable), 16.084973520 CSPR
- :stop_sign: 400 kB - ApiError::OutOfMemory [20], 13.793995250 CSPR
- :stop_sign: 700 kB - Interpreter error: trap: Code(Unreachable), 16.073696010 CSPR
- :stop_sign: 800 kB - ApiError::OutOfMemory [20], 18.364528100 CSPR

@jonas089 As you can see, we can relatively easily bypass the 1 KiB limit and reach over 300k bytes for arguments. Is that sufficient for our proofs?

If not, then I have a few other ideas that will work with production Casper and allow us even bigger payloads :stuck_out_tongue:.

jonas089 commented 3 months ago

> @jonas089 As you can see, we can relatively easily bypass the 1 KiB limit and reach over 300k bytes for arguments. Is that sufficient for our proofs?

300 kB is not sufficient unless we use a Groth16 wrapper. It also obviously depends on how many transactions we have in one batch.
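
For scale: risc0's Groth16 wrapper compresses the STARK receipt down to a few hundred bytes. A sketch of host-side proving with the wrapper enabled, assuming risc0-zkvm 1.x with its docker-based Groth16 prover available; GUEST_ELF is a placeholder:

```rust
use risc0_zkvm::{default_prover, ExecutorEnv, ProverOpts, Receipt};

// Placeholder for the compiled guest program.
const GUEST_ELF: &[u8] = &[];

fn prove_compact() -> anyhow::Result<Receipt> {
    let env = ExecutorEnv::builder().build()?;
    // Wrap the STARK proof in a Groth16 SNARK; the resulting receipt is
    // small enough to fit well under the ~300 kB ceiling measured above.
    let prove_info = default_prover().prove_with_opts(env, GUEST_ELF, &ProverOpts::groth16())?;
    Ok(prove_info.receipt)
}
```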

jonas089 commented 3 months ago

I don't see a reason not to be able to edit the chainspec for testing. It's a useful feature, so I want to be able to use it when developing and testing.

jonas089 commented 3 months ago

We currently don't know the proof size for the demo, since we haven't finished the tests for the risc0 circuit. But it'll likely be a few megabytes without Groth16, even for a small number of transactions.

Once we know the exact numbers, we can discuss whether it's doable with the production chainspec, depending on how many transactions per batch we want to support for the demo.

@koxu1996

koxu1996 commented 3 months ago

@jonas089 Okay, let's go with a custom chainspec and use run_with_chainspec(chainspec_path: &Path) from this PR. However, we should NOT change casper-node itself.
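
A sketch of how a test might consume that helper; only the run_with_chainspec name and signature come from this PR, everything else is assumed:

```rust
use std::path::{Path, PathBuf};

// Stub standing in for the helper added in this PR; the real function
// lives in the cctl wrapper.
fn run_with_chainspec(chainspec_path: &Path) {
    println!("starting cctl with chainspec {}", chainspec_path.display());
}

fn main() {
    // A custom chainspec with raised argument-size / memory limits
    // (file name is illustrative).
    let chainspec = PathBuf::from("./kairos-demo-chainspec.toml");
    run_with_chainspec(&chainspec);
}
```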