CUDA run `Killed` during single-prove

I have a program that runs out of memory when proving on the CPU (AMD EPYC 7R32, 64 GB RAM), and that crashes during the `Start: advice` step when built with `--features cuda`.
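For a sense of scale, here is a back-of-the-envelope sketch of how large a single polynomial column gets at the `-k 22` used in the runs below. It assumes 32-byte field elements (the BN254 scalar field), and that the extended evaluation domain is the smallest power of two that is at least `2^k * quotient_poly_degree`, which is how we understand upstream halo2 to size it. The helper names are hypothetical and not part of zkWasm or halo2.

```rust
// Hypothetical back-of-the-envelope helpers, not part of zkWasm or halo2.
// Assumption: each field element takes 32 bytes (BN254 scalar field).
const FIELD_BYTES: u64 = 32;

/// Rows in the base domain for circuit size parameter `k` (`-k 22` => 2^22 rows).
fn base_rows(k: u32) -> u64 {
    1u64 << k
}

/// Rows in the extended domain: the smallest power of two that is at least
/// base_rows(k) * quotient_poly_degree (our reading of upstream halo2).
fn extended_rows(k: u32, quotient_poly_degree: u64) -> u64 {
    (base_rows(k) * quotient_poly_degree).next_power_of_two()
}

/// Bytes needed to hold one column of `rows` field elements.
fn column_bytes(rows: u64) -> u64 {
    rows * FIELD_BYTES
}
```

Under these assumptions, one base-domain column at k = 22 is 128 MiB and one extended-domain column (with the `quotient_poly_degree 8` printed in the log) is 1 GiB, so roughly 64 extended columns resident at once would already fill 64 GB. How many columns zkWasm actually keeps in memory during the `advice` step is not visible in the output.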

Source code:

use std::convert::TryInto;

use rs_merkle::algorithms::Sha256;
use rs_merkle::{Hasher, MerkleProof};
use wasm_bindgen::prelude::*;

// root and proof for the following tree:
// ["a", "b", "c", "d"]
const ROOT: &str = "14ede5e8e97ad9372327728f5099b95604a39593cac3bd38a343ad76205213e7";
const PROOF: &str = "2e7d2c03a9507ae265ecf5b5356885a53393a2029d241394997265a1a25aefc6e5a01fee14e0ed5c48714f22180f25ad8365b53f9779f79dc4a3d7e93963f94a";

// src/lib.rs
#[wasm_bindgen]
pub fn merkle() -> i64 {
    let leaf_values = ["a", "b", "c", "d"];
    let leaves: Vec<[u8; 32]> = leaf_values
        .iter()
        .map(|x| Sha256::hash(x.as_bytes()))
        .collect();

    // decode root
    // this would be internal state
    let root: [u8; 32] = hex::decode(ROOT).unwrap().try_into().unwrap();

    // leaves to prove
    // this would be an input
    let indices_to_prove = vec![3];
    let leaves_to_prove = &leaves[3..];

    // decode proof
    // this would be an input
    let proof_bytes = hex::decode(PROOF).unwrap();
    let proof = MerkleProof::<Sha256>::from_bytes(proof_bytes.as_slice()).unwrap();

    assert!(proof.verify(root, &indices_to_prove, leaves_to_prove, leaves.len()));
    0
}
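`PROOF` is 128 hex characters, i.e. 64 bytes, which is two 32-byte SHA-256 hashes. For leaf index 3 in a 4-leaf tree, these should be the sibling hashes on the path to the root: the sibling leaf (index 2, the hash of "c") and then the hash of the left subtree (node 0 one layer up). The std-only sketch below decodes the hex and computes those sibling positions; `decode_hex`, `split_hashes`, and `sibling_path` are hypothetical helpers, not rs_merkle's API, and the bottom-up ordering of the hashes is an assumption about rs_merkle's proof serialization.

```rust
use std::convert::TryFrom;

/// Decode a hex string into bytes using only std.
/// Returns None on odd length or on any non-hex character.
fn decode_hex(s: &str) -> Option<Vec<u8>> {
    if s.len() % 2 != 0 {
        return None;
    }
    s.as_bytes()
        .chunks(2)
        .map(|pair| {
            let hi = (pair[0] as char).to_digit(16)?;
            let lo = (pair[1] as char).to_digit(16)?;
            Some((hi * 16 + lo) as u8)
        })
        .collect()
}

/// Split proof bytes into 32-byte hashes; None if the length is not a multiple of 32.
fn split_hashes(bytes: &[u8]) -> Option<Vec<[u8; 32]>> {
    if bytes.len() % 32 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(32)
            .map(|c| <[u8; 32]>::try_from(c).expect("chunk is exactly 32 bytes"))
            .collect(),
    )
}

/// For a perfect binary tree with `leaves_count` leaves (a power of two),
/// return the sibling index at each layer on the path from leaf `index`
/// up to (but excluding) the root, bottom layer first.
fn sibling_path(mut index: usize, leaves_count: usize) -> Vec<usize> {
    assert!(leaves_count.is_power_of_two() && index < leaves_count);
    let mut path = Vec::new();
    let mut width = leaves_count;
    while width > 1 {
        path.push(index ^ 1); // flip the lowest bit to get the sibling
        index /= 2; // parent index in the next layer up
        width /= 2;
    }
    path
}
```

For this tree, `sibling_path(3, 4)` yields `[2, 0]`, and `split_hashes` on the decoded `PROOF` yields two hashes, so the proof length matches the tree depth.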

I'm using rs_merkle from my personal fork because the original version relied on a floating-point library, which I had to remove.
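If the floating-point dependency was only needed for something like a tree-depth computation of the form `(n as f64).log2().ceil()`, an integer-only version avoids floats entirely. This is a sketch under that assumption: the fork's actual diff is not shown here, and `ceil_log2` is a hypothetical name.

```rust
/// Integer-only ceil(log2(n)), with no floating point involved.
/// Returns None for n == 0, where log2 is undefined.
fn ceil_log2(n: usize) -> Option<u32> {
    if n == 0 {
        return None;
    }
    // For n >= 1, the value (n - 1) has exactly ceil(log2(n)) significant bits.
    Some(usize::BITS - (n - 1).leading_zeros())
}
```

For the 4-leaf tree above, `ceil_log2(4)` is `Some(2)`, matching the two hashes in the proof.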

This is the full output (build, setup, and single-prove):

wasm-pack build --release
[INFO]: Checking for the Wasm target...
[INFO]: Compiling to Wasm...
    Finished release [optimized] target(s) in 0.01s
[WARN]: :-) origin crate has no README
[INFO]: Installing wasm-bindgen...
[INFO]: Optimizing wasm binaries with `wasm-opt`...
[INFO]: Optional fields missing from Cargo.toml: 'description', 'repository', and 'license'. These are not necessary, but recommended
[INFO]: :-) Done in 1.23s
[INFO]: :-) Your wasm pkg is ready to publish at /home/ubuntu/odsy/zkwasm-hello-world/merkle/pkg.
just setup
rm -rf output
mkdir -p output
zkwasm-cli-x86 -k 22 --function merkle --output ./output --wasm pkg/zkwasm_merkle_bg.wasm setup
write params K=22 to "./output/K22.params"
quotient_poly_degree 8
write vkey to "./output/zkwasm.0.vkey.data"
write params K=23 to "./output/K23.params"
just prove
zkwasm-cli-x86 -k 22 --function merkle --output ./output --wasm pkg/zkwasm_merkle_bg.wasm single-prove
read params K=22 from "./output/K22.params"
read vkey from "./output/zkwasm.0.vkey.data"
quotient_poly_degree 8
Start:   generate pkey
End:     generate pkey .............................................................308.431s
Start:   create proof
··Start:   instance
··End:     instance ................................................................7.723s
··Start:   advice
Killed
error: Recipe `prove` failed on line 17 with exit code 137
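Exit code 137 follows the shell convention of reporting a child killed by signal N as 128 + N, so here N = 9, i.e. SIGKILL. On Linux that is what the kernel's OOM killer sends when host RAM runs out, and because SIGKILL cannot be caught, no panic handler runs, which would be consistent with RUST_BACKTRACE producing nothing. A minimal sketch of the decoding, with hypothetical helper names (when spawning the prover from Rust, `std::os::unix::process::ExitStatusExt::signal()` gives the signal directly):

```rust
/// Shells report a child killed by signal N as exit status 128 + N.
/// Returns the signal number if `status` fits that convention (Linux: signals 1..=64).
fn signal_from_shell_status(status: i32) -> Option<i32> {
    if status > 128 && status <= 128 + 64 {
        Some(status - 128)
    } else {
        None
    }
}

/// Name a few common signals (Linux numbering).
fn signal_name(sig: i32) -> &'static str {
    match sig {
        2 => "SIGINT",
        6 => "SIGABRT",
        9 => "SIGKILL",
        11 => "SIGSEGV",
        15 => "SIGTERM",
        _ => "other",
    }
}
```

Checking the kernel log (for example with `dmesg`) for an out-of-memory message around the time of the crash would confirm or rule out the OOM killer.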

And these are the live stats from nvidia-smi when the crash happens. The last column is GPU memory used: only 247 MiB out of 23028 MiB (the A10G's 24 GB) is in use, and it drops to 0 MiB once the process is killed:

timestamp, name, pci.bus_id, driver_version, pstate, pcie.link.gen.max, pcie.link.gen.current, temperature.gpu, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB]
2023/01/27 14:12:19.780, NVIDIA A10G, 00000000:00:1E.0, 515.65.01, P0, 4, 4, 34, 0 %, 0 %, 23028 MiB, 22344 MiB, 247 MiB
2023/01/27 14:12:20.787, NVIDIA A10G, 00000000:00:1E.0, 515.65.01, P0, 4, 4, 34, 0 %, 0 %, 23028 MiB, 22344 MiB, 247 MiB
2023/01/27 14:12:21.801, NVIDIA A10G, 00000000:00:1E.0, 515.65.01, P0, 4, 4, 34, 0 %, 0 %, 23028 MiB, 22344 MiB, 247 MiB
2023/01/27 14:12:22.802, NVIDIA A10G, 00000000:00:1E.0, 515.65.01, P0, 4, 4, 34, 0 %, 0 %, 23028 MiB, 22592 MiB, 0 MiB
2023/01/27 14:12:23.803, NVIDIA A10G, 00000000:00:1E.0, 515.65.01, P0, 4, 4, 34, 0 %, 0 %, 23028 MiB, 22592 MiB, 0 MiB
2023/01/27 14:12:24.804, NVIDIA A10G, 00000000:00:1E.0, 515.65.01, P8, 4, 1, 33, 0 %, 0 %, 23028 MiB, 22592 MiB, 0 MiB
2023/01/27 14:12:25.805, NVIDIA A10G, 00000000:00:1E.0, 515.65.01, P8, 4, 1, 33, 0 %, 0 %, 23028 MiB, 22592 MiB, 0 MiB
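To read the peak GPU usage out of a longer capture like the one above, a small parser is enough. This sketch assumes the column order shown in the header, with `memory.used` as the last field formatted like `247 MiB`; the function names are hypothetical.

```rust
/// Extract memory.used (MiB) from one nvidia-smi CSV row, assuming
/// memory.used is the last comma-separated field, e.g. " 247 MiB".
/// The header row (ending in "[MiB]") yields None.
fn memory_used_mib(row: &str) -> Option<u32> {
    row.rsplit(',')
        .next()?
        .trim()
        .strip_suffix("MiB")?
        .trim()
        .parse()
        .ok()
}

/// Peak memory.used over all parsable rows.
fn peak_memory_used_mib<'a>(rows: impl IntoIterator<Item = &'a str>) -> Option<u32> {
    rows.into_iter().filter_map(memory_used_mib).max()
}
```

On the rows above the peak is 247 MiB, which suggests very little was ever allocated on the GPU before the process was killed.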

PS: I also tried running with RUST_BACKTRACE=full, but it produced no additional output.

DelphinusLab / zkWasm, issue #99