AFLplusplus / LibAFL

Advanced Fuzzing Library - Slot your Fuzzer together in Rust! Scales across cores and machines. For Windows, Android, MacOS, Linux, no_std, ...

Recover state after the fuzzer exited with Signal 9 / 137 (OOM) #283

Closed sitay1 closed 1 year ago

sitay1 commented 3 years ago

Is your feature request related to a problem? Please describe. My fuzzer is being killed with signal 9 very often (due to the memory limits on macOS). After it is terminated by signal 9, the next instance of the fuzzer thinks it's running for the first time; it isn't able to restore the state of the previous run.

In the following if block the code panics, since the state restorer doesn't hold any content:

```rust
if !staterestorer.has_content() {
    #[cfg(unix)]
    if child_status == 137 {
        // Out of Memory, see https://tldp.org/LDP/abs/html/exitcodes.html
        // and https://github.com/AFLplusplus/LibAFL/issues/32 for discussion.
        panic!("Fuzzer-respawner: The fuzzed target crashed with an out of memory error! Fix your harness, or switch to another executor (for example, a forkserver).");
    }

    // Storing state in the last round did not work
    panic!("Fuzzer-respawner: Storing state in crashed fuzzer instance did not work, no point to spawn the next client! (Child exited with: {})", child_status);
}
```

I have added the following lines in order to survive this use-case and avoid the panic:

```rust
if child_status == 9 {
    println!("Fuzzer killed with Signal {:?}, probably reached memory limit... let's try to run a new instance of the fuzzer", child_status);
} else ...
```

but then the fuzzer thinks it's running for the first time; it isn't able to get anything from the state restorer.

This doesn't happen when a normal timeout/crash occurs, because (I think) in that case it calls

event_mgr.on_restart(state).unwrap();

which saves the state into the staterestorer.

Can this call also be made when the child exited with signal 9, or is it too late for that to happen?

Describe the solution you'd like I would like the fuzzing to be able to restart from the same point, and not start from scratch.

Describe alternatives you've considered I have tried to catch the signal and not call panic, but then the state restorer starts a new fuzzing session that loads the initial corpus, instead of continuing from the point it had reached.

It does work for timeouts/crashes whose signals can be caught; in those cases the state is saved before the process exits.

Not sure if that solution is viable for the case where the process has already been killed with signal 9.

Additional context The memory limit is deliberately enforced in order to avoid kernel panics on the Mac when it runs out of memory.

domenukk commented 3 years ago

Hey, yes, sadly there is no way to store the state on an OOM event (unless we stored it every iteration, which is super slow). The process gets killed without any prior notice.

What you could do, instead of looping forever in the fuzzer, is to loop only for a number of iterations, then exit (cleanly) and refork. That's probably a good and clean solution. Another alternative is to use a forkserver executor to execute each testcase in a newly forked child (old-school AFL style), which is also somewhat supported.

It would be best if there was a way to get rid of the memory leak in the target, though, of course.

Hope this gives you some pointers.

Edit: here is an example fuzzer using fuzz_loop_for: https://github.com/AFLplusplus/LibAFL/blob/6ae36ce6584df8260d1728148ce4c158d207ead3/fuzzers/libfuzzer_libpng/src/lib.rs#L183 We could also consider adding support for this to the launcher; I don't think it's possible with it at the moment.
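
A minimal sketch of that loop-then-refork pattern, following the linked libfuzzer_libpng example. The surrounding setup (fuzzer, stages, executor, state, restarting manager) is elided, the iteration count is arbitrary, and the exact fuzz_loop_for signature may differ between LibAFL versions:

```rust
// Sketch only: fragment of the client body, names follow the linked example.

// Fuzz for a bounded number of iterations instead of looping forever.
fuzzer.fuzz_loop_for(&mut stages, &mut executor, &mut state, &mut mgr, 10_000)?;

// Serialize the state into the staterestorer *before* exiting, otherwise the
// respawning parent finds no content and starts from scratch (or panics).
mgr.on_restart(&mut state)?;

// Exit cleanly; the respawner starts a fresh child that picks the state back
// up, releasing any memory the target leaked in this process.
std::process::exit(0);
```

This trades a little respawn overhead for a bounded memory footprint per child process.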

domenukk commented 3 years ago

Another idea would be to store snapshots of the state and then, on an OOM event, fall back to those, instead of completely restarting the fuzzer. This could actually be a good trade-off. @andreafioraldi what do you think?
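
In sketch form, that could look like the previous fragment but without exiting: checkpoint periodically and let the respawner fall back to the last snapshot if the child is later killed. Whether on_restart is safe to call while the client keeps running is an assumption here, not something verified in this thread:

```rust
// Sketch only: periodic checkpointing inside the client, same names as above.
loop {
    // Run a slice of iterations, then overwrite the staterestorer content
    // with a fresh snapshot of the current state.
    fuzzer.fuzz_loop_for(&mut stages, &mut executor, &mut state, &mut mgr, 1_000)?;
    mgr.on_restart(&mut state)?;
    // If the OS kills this process with SIGKILL later on, the respawner sees
    // the last snapshot instead of an empty staterestorer and continues there.
}
```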

sitay1 commented 3 years ago

I'm already using fuzz_loop_for, but that's not enough. (And I don't want it to shut down every small number of iterations for no reason.)

Why is the state not being saved at the client level, rather than at the fuzzer level? Doesn't the client also have access to the shared memory?

Is the state object only relevant within the fuzzer's scope?

domenukk commented 3 years ago

The state doesn't live on the shared map itself; it's serialized to the shared map inside the crash handler. This was a deliberate decision, since otherwise we could never use any pointers inside the state. The downside is that we have the additional overhead of serialization at crash time, and in a case where we can't handle the crash (OOM), there's not much we can do. So the options are to either save checkpointed states and fall back to the last checkpoint on OOM, or use a traditional AFL-style forkserver. In a forkserver scenario, the state lives outside of the target, so we can act on OOM, but we do need IPC. This, too, is additional overhead and slows down fuzzing.

A third idea would be to monitor memory pressure and simply restart before the operating system kills us.
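
A rough sketch of that third idea; getrusage is used here only as a crude stand-in for a real memory-pressure check, and the threshold, units, and helper name are illustrative assumptions, not LibAFL API:

```rust
// Sketch only: restart voluntarily before the OS resorts to SIGKILL.
// Note that ru_maxrss is reported in bytes on macOS but in kilobytes on Linux.
fn over_memory_budget(limit: libc::c_long) -> bool {
    let mut usage: libc::rusage = unsafe { std::mem::zeroed() };
    let ret = unsafe { libc::getrusage(libc::RUSAGE_SELF, &mut usage) };
    ret == 0 && usage.ru_maxrss > limit
}

// Inside the client, between bounded fuzzing slices (names as in the sketches above):
// fuzzer.fuzz_loop_for(&mut stages, &mut executor, &mut state, &mut mgr, 1_000)?;
// if over_memory_budget(2 * 1024 * 1024 * 1024) {
//     mgr.on_restart(&mut state)?; // checkpoint for the respawner
//     std::process::exit(0);       // exit cleanly before SIGKILL can arrive
// }
```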

domenukk commented 3 years ago

Actually, we can probably come up with a good metric for when to snapshot the state if we also track mutable accesses to it that are not the RNG, and then reseed the RNG on restarts. If you have a very leaky target that crashes in between snapshots, you can still get stuck, but at some point you'll just have to fix your harness...

sitay1 commented 3 years ago

Yeah, I will try to turn the hard signal into a soft signal and simulate a crash that way, so I can save the state in the crash handler like in the regular scenario.
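
One plausible way to get such a soft failure (my assumption, not something spelled out in the thread) is to cap the process's address space with setrlimit, so a runaway allocation fails inside the process (Rust aborts on allocation failure, raising SIGABRT, which an in-process crash handler can typically catch and serialize state from) instead of the kernel sending an uncatchable SIGKILL. RLIMIT_AS enforcement is platform-dependent and is known to be unreliable on macOS:

```rust
// Sketch only: hypothetical helper, not part of LibAFL.
// Caps the address space so that huge allocations fail inside the process
// (abort/SIGABRT, catchable by a crash handler) rather than the kernel
// killing it with SIGKILL. Enforcement of RLIMIT_AS varies by platform.
fn cap_address_space(bytes: u64) -> Result<(), std::io::Error> {
    let lim = libc::rlimit {
        rlim_cur: bytes,
        rlim_max: bytes,
    };
    let ret = unsafe { libc::setrlimit(libc::RLIMIT_AS, &lim) };
    if ret != 0 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(())
}
```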

domenukk commented 1 year ago

Closing this issue for inactivity. Reopen if there's anything we can fix.