0xPolygonZero / zk_evm

Apache License 2.0
85 stars 37 forks source link

`cargo test --release --package evm_arithmetization` fails #664

Closed 0xaatif closed 1 month ago

0xaatif commented 1 month ago

I first came across this on https://github.com/0xPolygonZero/zk_evm/actions/runs/11043011303/job/30676493011?pr=659

2024-09-26T00:16:10.2596402Z ---- cpu::kernel::tests::transient_storage::test_revert stdout ----
2024-09-26T00:16:10.2597958Z thread 'cpu::kernel::tests::transient_storage::test_revert' panicked at evm_arithmetization/src/witness/transition.rs:311:5:
2024-09-26T00:16:10.2599045Z Kernel PC is out of range: 65782

I've checked that this repros on master:

$ git show --pretty=reference --no-patch
56e7d08e (feat: trie diff tool (#630), 2024-09-24)
$ cargo test --release --lib --package evm_arithmetization
...
[2024-09-26T01:07:58Z DEBUG evm_arithmetization::cpu::kernel::interpreter] Cycle 20998, ctx=0, pc=addmul_bignum, instruction=Dup(0), stack=[12, 107, 0, 4951760157141521099596496895, 1207, 1, 30064771072, 12, 0, 23]
test cpu::kernel::tests::bignum::test_modmul_bignum_all ... ok

failures:

---- cpu::kernel::tests::transient_storage::test_revert stdout ----
thread 'cpu::kernel::tests::transient_storage::test_revert' panicked at evm_arithmetization/src/witness/transition.rs:311:5:
Kernel PC is out of range: 65782

failures:
    cpu::kernel::tests::transient_storage::test_revert

test result: FAILED. 224 passed; 1 failed; 6 ignored; 0 measured; 0 filtered out; finished in 714.93s

error: test failed, to rerun pass `-p evm_arithmetization --lib`
$ cargo test --release --lib --package evm_arithmetization -- cpu::kernel::tests::transient_storage::test_revert
    Finished `release` profile [optimized] target(s) in 0.26s
     Running unittests src/lib.rs (target/release/deps/evm_arithmetization-2e580b019f300f5a)

running 1 test
test cpu::kernel::tests::transient_storage::test_revert ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 230 filtered out; finished in 5.16s

Note that the test fails when run as part of the suite, but not in isolation, suggesting that our tests are polluting one another. Also note that we pass weird compiler options when we run CI: https://github.com/0xPolygonZero/zk_evm/blob/56e7d08eb243020e82e228693ee5f8c42d86901e/.github/workflows/ci.yml#L94

Not entirely sure what is going on here, but it's a smell

Nashtare commented 1 month ago

RUSTFLAGS: -Copt-level=3 -Cdebug-assertions -Coverflow-checks=y -Cdebuginfo=0

These are basically to run in a DEBUG like profile with closer to RELEASE performances. Especially debug-assertions include additional checks on the proving system (starky) to ensure all our constraints are consistent when proving statements.

As for the error:

Kernel PC is out of range: 65782

This doesn't happen in regular release mode. While I don't quite get why this happens with these particular flags, the issue is because for the transient storage test, we define a new static KERNEL:

https://github.com/0xPolygonZero/zk_evm/blob/56e7d08eb243020e82e228693ee5f8c42d86901e/evm_arithmetization/src/cpu/kernel/tests/transient_storage.rs#L226-L235

Which contains an additional assembly file (checkpoint_label.asm) causing the total (new) KERNEL code length to be 65852 instead of the regular length of 65782. I think the issue lies in the identical naming of 2 distinct static variables, which surprisingly doesn't trigger the error in regular release mode.