Make havoc_mutations Available for Custom Inputs

riesentoaster commented 6 months ago

I have a custom Input type I'm doing fuzzing with. It contains two fields in the shape of a Vec<u8>. I'd love to be able to use havoc mutations on both separately, in addition to further custom mutators. Currently, all havoc mutators rely on the Input implementing HasBytesVec, which means that I can't apply them to two parts separately.

I'd love to have, next to the now default havoc_mutations(), a method which takes a closure mapping between the Input type and the (mutable) bytes slice used by the mutators. havoc_mutations() could then just restrict its Inputs to HasBytesVec and call the new method with the extraction methods defined in its trait.

tokatoka commented 6 months ago

can you do this by, 1) have two Mutations; Mutation A for mutating the first part of your input, and Mutation B for mutating the second part of your input. 2) then define your own mutations that returns tuple_list!(MutationA::new(), MutationB::new())

riesentoaster commented 6 months ago

Theoretically yes but practically no. I would like to have the system mutate both parts of the input in all the ways the havoc mutators would. So yes, I can define two mutators, one for each part, but they'd have to wrap the logic of all the havoc mutators. This would lead to lots of code duplication, since I'd essentially have to copy all the havoc mutators' logic into the mutate function of these two new mutators and select randomly which one to use.

Plus it may be better to have one large list of mutators for the mutation scheduler to work better instead of having only two which consist of randomness opaque to the mutation scheduler.

tokatoka commented 6 months ago

ok. i understand. but maybe the thing to change is not the havoc_mutation. because havoc_mutations is just a function to return the tuples of all the predefined list of mutators.

i think what you want is a mutator that takes closure to extract ranges (or some parts of the input), and do the mutations. then you put plug this mutator with the existing havoc_mutation

riesentoaster commented 6 months ago

then you put plug this mutator with the existing havoc_mutation

That's exactly the issue. How would I do this? The existing havoc_mutations don't seem to have an interface for dynamic insertion of data.

tokatoka commented 6 months ago

i still don't get what you want to achieve. why do you need to dynamically insert into havoc_mutations to do this?

riesentoaster commented 6 months ago

Alright, let me start from the beginning. I have an struct that looks as follows (with only two parts for simplicity's sake, but this should work with n parts in the end):

struct MyComplexCustomInput {
    part1: Vec<u8>,
    part2: Vec<u8>,
    ...
}

I want to end up with a system where the operations defined in the havoc_mutations are done on part1 and part2. havoc_mutations extracts the byte slice from the input with the methods defined in this trait:

/// Contains an internal bytes Vector
pub trait HasBytesVec {
    /// The internal bytes map
    fn bytes(&self) -> &[u8];
    /// The internal bytes map (as mutable borrow)
    fn bytes_mut(&mut self) -> &mut Vec<u8>;
}

This is no issue if my Input has only one Vec<u8>, since I can just write an implementation as follows:

struct MySimpleCustomInput {
    part1: Vec<u8>,
}

impl HasBytesVec for MySimpleCustomInput {
    fn bytes(&self) -> &[u8] {
        &self.part1
    }

    fn bytes_mut(&mut self) -> &mut Vec<u8> {
        &mut self.part1
    }
}

...
let mut stages = tuple_list!(StdMutationalStage::new(StdScheduledMutator::new(havoc_mutations())));
...

This works. It mutates part1 as you'd expect. How would I implement this with MyComplexCustomInput?

My proposal would be to add something like this:

fn havoc_mutations_on_input_part<I>(extractor: FnMut(I) -> &mut Vec<u8>) -> HavocMutationsType<I> {
    BitFlipMutator::new(extractor),
    ByteFlipMutator::new(extractor),
    ...
}

fn havoc_mutations<I>() -> HavocMutationsType<I> where I: HasBytesVec {
    havoc_mutations_on_input_part(|input: I| input.bytes_mut())
}

let mut stages = tuple_list!(StdMutationalStage::new(StdScheduledMutator::new(
    havoc_mutations_on_input_part(|input| &mut input.part1)
        .merge(havoc_mutations_on_input_part(|input| &mut input.part2))
)));

// stages now contains mutators to mutate part1 and part2 in all the ways havoc_mutations would

I might just fundamentally miss something. If there is a simpler solution, please let me know!

tokatoka commented 6 months ago

so what you actually wants to do is to make each of the Mutation mutate only user-specified parts of the entire bytesvec?

tokatoka commented 6 months ago

first, there are 3 things

Mutation: is the each mutation operations, like byteflip, byte increment ...
Mutator: is the module that bundles the mutations, here you receive the input, and you define how, to apply the Mutation operator. This module can take a tuple of Mutations that you can use
havoc_mutation is this just the pre-defined set of Mutations operator same to afl. but you issue has nothing to do with havoc_mutations itself.

now, can you not just extract the parts you want in the Mutator, make it as the Input, then pass it to Mutation then copy it back? then this way it won't make the whole input

tokatoka commented 6 months ago

For example; https://github.com/AFLplusplus/LibAFL/blob/main/libafl/src/mutators/scheduled.rs#L100 this scheduled_mutate function is where these mutations are actually used.

can you do like this to implement your goal?

        let mut r = MutationResult::Skipped;
        let num = self.iterations(state, input);
        for _ in 0..num {
            let idx = self.schedule(state, input);
            let extracted = self.extract(input); // extract part 1 or part 2 of your input
            let outcome = self.mutations_mut().get_and_mutate(idx, state, extracted )?;
            input.copy_back(extracted) // copy the mutated part back.
            if outcome == MutationResult::Mutated {
                r = MutationResult::Mutated;
            }
        }

riesentoaster commented 6 months ago

I want to find a solution that is transparent to the wrapping mutation scheduler, so that it has full control over all mutators on all parts of the input (i.e. no further random selection opaque to the main scheduler).

I've got an idea, will report back once I know more. Thank you for your ideas so far, they really helped!

riesentoaster commented 6 months ago

Alright, I've implemented what I think you suggested:

use std::borrow::Cow;

use libafl::{
    inputs::{BytesInput, HasBytesVec, Input, UsesInput},
    mutators::{MutationResult, Mutator},
    Error,
};
use libafl_bolts::Named;

pub struct MappingMutator<IO, S> {
    extractor: for<'a> fn(&'a mut IO) -> &'a mut Vec<u8>,
    injector: for<'b> fn(&'b mut IO, &'b [u8]),
    inner: Box<dyn Mutator<BytesInput, S>>,
}

impl<IO, S> MappingMutator<IO, S> {
    pub fn new(
        extractor: for<'a> fn(&'a mut IO) -> &'a mut Vec<u8>,
        injector: for<'b> fn(&'b mut IO, &'b [u8]),
        inner: Box<dyn Mutator<BytesInput, S>>,
    ) -> Self {
        Self {
            extractor,
            injector,
            inner,
        }
    }
}

impl<S, IO> Mutator<IO, S> for MappingMutator<IO, S>
where
    S: UsesInput<Input = IO>,
    IO: Input,
{
    fn mutate(&mut self, state: &mut S, input: &mut IO) -> Result<MutationResult, Error> {
        let extracted = (self.extractor)(input).clone();
        let mut mapped_input = BytesInput::new(extracted);
        let res = self.inner.mutate(state, &mut mapped_input);
        (self.injector)(input, mapped_input.bytes());
        res
    }
}

impl<IO, S> Named for MappingMutator<IO, S> {
    fn name(&self) -> &Cow<'static, str> {
        &Cow::Borrowed("MappingMutator")
    }
}

Used like this:

let mut stages = tuple_list!(StdMutationalStage::new(StdScheduledMutator::new(
    tuple_list!(MappingMutator::new(
        |input: &mut Base64Input| &mut input.raw_data,
        |input, bytes| { input.raw_data = bytes.to_vec() },
        Box::new(BitFlipMutator::new())
    ))
)));

However, I really don't like the copying the vector back and forth. But I also couldn't get this to work with just passing around references, no matter how I implemented a custom Input type to use instead of the default BytesInput, which needs an owned vec.

So:

Is this what you suggested?
Would you know if (and if yes: how) it is possible to implement something like this without the copying?
How would I loop over the tuple defined in havoc_mutations to map MappingMutators over each, without having to do it manually?

tokatoka commented 6 months ago

Is this what you suggested?

yes

Would you know if (and if yes: how) it is possible to implement something like this without the copying?

you can make each of the Mutation take a range to select the stuff from, and then apply the mutation. but i wouldn't make it default.

How would I loop over the tuple defined in havoc_mutations to map MappingMutators over each, without having to do it manually?

i think you have to do it manually. (i'd ask chatgpt4/claude to write it for me)

@addisoncrump you have an idea?

tokatoka commented 6 months ago

also @domenukk

tokatoka commented 6 months ago

these Mutation are kept simple and do the minimal things for a reason. a testcase is scheduled to mutate to ~1000 times each round. then one mutating involves ~128 stacked mutations.

the code in Mutation is hot path. if you want to apply it for your own need, you should copy paste all and edit it. i don't want to make these take extra parameters to do complicate stuff. (such as, selecting the ranges to mutate.. etc)

domenukk commented 5 months ago

For the copying part, I'm working on a BytesSubMutator that will allow us to use mutations on part of the input. For the rest I don't have good answers atm :)

tokatoka commented 5 months ago

you mean BytesSubMutation ?

domenukk commented 5 months ago

BytesSubInput, sorry. It can be used with any input then

domenukk commented 5 months ago

See #2220 for BytesSubMutator that will allow this without extra copies

riesentoaster commented 5 months ago

Alright, I've had another play around. And I'm pretty sure what I'm trying to do is still not possible without reimplementing all mutators in havoc_mutations. I cannot reasonably implement HasMutatorBytes

for either the entire Input, to have havoc_mutations run on it directly, because the input consists of multiple parts and concatenating the parts to operate on is inefficient because it involves copying data around on each access
for each part, and then use a MultipartInput, since not all parts actually have content and there is no way to implement it on content-less parts without keeping a dummy Vec<u8> for those, which unnecessarily occupies memory and results in mutations that do nothing.

I'm still thinking that a wrapper mutator, which extracts the specified part of the Input and passes it to the inner mutator might be the only reasonable way to solve this issue. For this, I'd need something similar to BytesSubInput, but without the requirement of HasMutatorBytes on the outer Input, instead taking a mapping closure.

Am I still missing something?

domenukk commented 5 months ago

I don't fully understand why you want a mapping closure? Just implement the trait for your input - I mean, you need your input to implement HasMutatorBytes to do havoc mutations, anyway - at least the ones that shrink or grow the input(?)

&mut Vec.into() already implements HasMutatorBytes, for example, so if you have a vec you can mutate it (and sub vecs with .sub_input(). You will need to keep track of the inputs, of course, so you'll need to write your own code, similar to MultipartInput.

Other option would be to add the option to skip parts of the input to MultipartInput mutators?

I feel like an example would help (IIRC there was an example in the PR?)

riesentoaster commented 5 months ago

I'm sorry for the confusion. I'll try to summarise my requirements for this custom Input type again.

Conceptually, I'm doing differential fuzzing on two implementations of the same command line utility (in my case GNU coreutils vs. uutils coreutils). I want to ensure they behave the same, no matter what combination of command line arguments they are presented with. Manual testing could look something like this (from #2220):

echo "This is a stdin byte array"           | binary_under_test --arg1_with_param "value of param for arg1" --arg2_without_param --arg3_without_param --arg4_with_param "some value"
echo "This is another byte array for stdin" | binary_under_test --arg1_with_param "mutated value of param for arg1"
echo "More stdin"                           | binary_under_test --arg2_without_param --arg3_without_param

I have a few types of parts that I need to have between 0 and n, depending on the specific program I'm testing (examples on base64):

Data, which is always present (think the input for base64), which I think I'd model as a Vec<u8>
Optional arguments with no associated data (think the flag for --help)
Optional arguments with associated data (think --wrap=COLS), which I'd model either as a Option<Vec<u8>> or a struct OptionalArg { data: Vec<u8>, active: bool }

I would like to mutate this custom Input type in a few ways:

Toggle optional arguments, likely with a short custom Mutator, that's not the problem
Mutate the data in types 1 and 3, independently, and using all mutations defined in havoc_mutations. This is the thing I'm not sure how to accomplish.

riesentoaster commented 5 months ago

&mut Vec.into() already implements HasMutatorBytes

Hey, that's what I needed!

What I've currently got:

use std::borrow::Cow;

use libafl::{
    inputs::{MutVecInput, UsesInput},
    mutators::{MutationResult, Mutator},
    Error,
};
use libafl_bolts::Named;

pub struct MappingMutator<S, M>
where
    S: UsesInput,
    for<'a> M: Mutator<MutVecInput<'a>, S>,
{
    extractor: for<'a> fn(&'a mut S::Input) -> &'a mut Vec<u8>,
    inner: M,
}

impl<S, M> MappingMutator<S, M>
where
    S: UsesInput,
    for<'a> M: Mutator<MutVecInput<'a>, S>,
{
    pub fn new(extractor: for<'a> fn(&'a mut S::Input) -> &'a mut Vec<u8>, inner: M) -> Self {
        Self { extractor, inner }
    }
}

impl<S, M> Mutator<S::Input, S> for MappingMutator<S, M>
where
    S: UsesInput,
    for<'a> M: Mutator<MutVecInput<'a>, S>,
{
    fn mutate(&mut self, state: &mut S, input: &mut S::Input) -> Result<MutationResult, Error> {
        let mut mut_vec_input: MutVecInput = (self.extractor)(input).into();
        self.inner.mutate(state, &mut mut_vec_input)
    }
}

impl<S, M> Named for MappingMutator<S, M>
where
    S: UsesInput,
    for<'a> M: Mutator<MutVecInput<'a>, S>,
{
    fn name(&self) -> &Cow<'static, str> {
        &Cow::Borrowed("MappingMutator")
    }
}

which is then used like this:

        let mut stages = tuple_list!(StdMutationalStage::new(StdScheduledMutator::new(
            tuple_list!(
                MappingMutator::new(
                    |i: &mut Base64Input| i.extract_stdin(),
                    BitFlipMutator::new()
                ),
                Base64FlipDecodeMutator,
                Base64FlipIgnoreGarbageMutator,
                Base64WrapContentMutator,
                Base64FlipWrapMutator,
            )
        )));

The only thing I need now is a way to map this new MappingMutator over all Mutators in havoc_mutations (which is a tuple_list). Is there a way to do that without defining the entire list by hand?

riesentoaster commented 5 months ago

(Even better, it's very easy to write something similar for optional args, as in where the extractor returns an Option<Vec<u8>>)

riesentoaster commented 5 months ago

I have just copy-pasted the entire list of havoc_mutations in my project for now. There is an issue with CrossoverInsertMutator and CrossoverReplaceMutator, since they expect the Input in the State to implement HasMutatorBytes, which is no longer the case if I'm mapping the inputs in my custom Mutators. That makes sense, they're just loading an Input from the State to insert/replace random parts from. I don't think there's a solution without changes to the implementations.

domenukk commented 5 months ago

Happy you're getting somewhere! :)

Yes, splicing on your custom type cannot "just work", you could for example wrap these mutators with your own that grabs a parameter from the other input and just splices that part (or something). Or just levae them all. But all in all, splicing makes a lot of sense / finds new things

/edit: BTW: you can do havoc_mutations.into_vec() to get a vector that you can then use for whatever you want, maybe that helps?

riesentoaster commented 5 months ago

I will try to at some point create a PR with all this logic, with an interface which should make it very easy to mutate byte vector parts of custom inputs. I'd maybe also include a baby fuzzer example with a simple custom Input, to explain what is happening. I think I'll just reimplement the two missing Mutators, this way I'll at least only have to duplicate two of the however many in havoc_mutations(). But first, I've got to write a report before the semester ends. #studentlife

Would you be interested in such a PR?

domenukk commented 5 months ago

Definitely sounds useful! The only question that remains (probably for @addisoncrump ) is if this cannot already be done with Multipart inputs, it still feels very closely related..

riesentoaster commented 5 months ago

@addisoncrump If you can't be bothered to read the entire thread, here's the gist: Requirements and Problem

addisoncrump commented 5 months ago

It is definitely possible to perform a mapping operation as you describe across a tuple. I'll implement the machinery for this real quick.

You will need to implement a custom version of crossover. That's just gotta happen, sorry!

Multipart is almost equipped to handle this, except for the no argument case. To handle this, I propose the following PR:

Implement a trait called MutatorMapper which accepts an input and returns a Option<Self::Output>, where Output is a associated type on MutatorMapper. If the return is None, we issue MutationResult::Skipped to allow for fallible transformations.
Blanket impl MutatorMapper for Fn(T) -> Option<U>.

Then, we can implement what you describe here very easily with MultipartInput<ArgComponent> where ArgComponent is:

enum ArgComponent {
  Arg(Arg),
  Stdin(Vec<u8>)
}

enum Arg {
  Help(Option<Vec<u8>>),
  ...
}

And then the mapper would simply transform the stdin or Arg to a mutable slice as needed.

riesentoaster commented 5 months ago

It is definitely possible to perform a mapping operation as you describe across a tuple. I'll implement the machinery for this real quick.

Awesome, thank you!

You will need to implement a custom version of crossover. That's just gotta happen, sorry!

Fair enough.

For your proposal: I'm not sure where HasMutatorBytes would come in, since this is what the mutators in havoc_mutations() relies on?

I'm also not sure if the approach outlined here might not be better (at least it would certainly be more flexible, and I don't think it would be any less performant). This would not use MultipartInput, but would allow you to combine mutators for arbitrary Input types:

let mut stages = tuple_list!(StdMutationalStage::new(StdScheduledMutator::new(
    havoc_mutations_mapped(|i: MyCustomInput| -> Vec<u8> {i.extract_required_input_part()})
    .extend(havoc_mutations_mapped_optional(|i: MyCustomInput| -> Option<Vec<u8>> {i.extract_optional_input_part()})
    .extend(havoc_mutations_mapped_optional(MyCustomInput::exteract_optional_input_part2) // more concise
    .append(MyCustomInputToggleOptionalInputPartMutator::new()) // or any other futher mutators
)));

addisoncrump commented 5 months ago

Here's the tuple mapping trait: https://github.com/AFLplusplus/LibAFL/pull/2247

addisoncrump commented 5 months ago

For your proposal: I'm not sure where HasMutatorBytes would come in, since this is what the mutators in havoc_mutations() relies on?

You can directly return the Vec<u8> (or the wrapping type, don't remember the name presently) by extracting it from the relevant component. The specifics of your implementation are of course up to you :p We're simply recommending Multipart as we think what you're describing can be made from existing components, but if you have a solution that works then use this.

domenukk commented 5 months ago

Can this issue be closed?

riesentoaster commented 5 months ago

I'll try to finish the implementation properly of this and create a PR, but it'll be a few more weeks, so maybe leave it open for now?

AFLplusplus / LibAFL

Make havoc_mutations Available for Custom Inputs #2202