Targeted Fuzzing - Githubissues

plotchy commented 1 month ago

Since the EVM is stateful, and all interesting things onchain occur during a stateful operation, what if we targeted those during the fuzz run?

There is some research around targeting specific operations. In web2 this is around santizers placed by LLVM. A paper like FishFuzz demonstrates how effective this can be.

Since the EVM has very specific operations that can change state, I think tracking those and scheduling testcases to go after those operations (rather than coverage) would be useful.

Create control flow graph of runtime bytecode

Find all state changing operations

// we track all of the pc's that can be targets (anything that can change evm state)
pub const TARGET_OPS: [u8; 7] = [SSTORE, CALL, DELEGATECALL, CALLCODE, SELFDESTRUCT, CREATE, CREATE2];

Create a distance score between all nodes of the CFG. ie: (CFG_block_1, CFG_block_2, score)
When scheduling testcases, prioritize seeds that are closest to unreached targets (Exploring mode)
After a time threshold, switch phase to prioritize seeds that have reached targets (Exploiting mode)

Concerns

Ityfuzz can be used as a test-suite, where it attempts to find lines of code with assert or AssertionFailed("Bug"). By switching scheduling to prioritize state changing events, neither of these assert targets would be prioritized by the scheduler and they would take longer to find

shouc commented 1 month ago

Sounds interesting and promising! There seems to be no work on distance-based scheduling for stateful ops.

It would be relatively easy to implement in ItyFuzz as you can directly implement a new power schedule like: https://github.com/fuzzland/ityfuzz/blob/master/src/evm/scheduler.rs#L378-L411

plotchy commented 1 month ago

@shouc

At the moment I've been testing through modifying the PowerScheduler's next() fn to only select from testcases that I've chosen as closest.

I chose to do it this way as you can "prune" the corpus even as it grows. Since corpus is added every time coverage gets better, its gets to be pretty large (like BIGFI_exp.txt hits 300+ in a few mins [32m INFO[0m [Testcase #0] run time: 0h-1m-30s, clients: 1, corpus: 373, objectives: 0, executions: 269991, exec/sec: 3.015k).

Since next() always iterates from index 0 to last and loops back to 0, it wastes a lot of time on early indexes that are no longer best.

Do you think keeping next() as is and modifying compute() to run the close testcases more is better? I can see there being advantages to running old inputs every so often but would be worried about very large corpuses wasting time

https://github.com/fuzzland/ityfuzz/blob/abc46e3dddb5aceeb8b501b783ead3121240753b/src/evm/scheduler.rs#L299-L313

fuzzland / ityfuzz

Targeted Fuzzing #482

Concerns