proposal: testing: custom mutator support for fuzzing

s3nt3 commented 2 years ago

As the official fuzzer implementation provided by golang, the native fuzzer should be well suited for various usage scenarios. However, currently native fuzzers only support general mutation algorithms for built-in types. Therefore, in many cases, the native fuzzer cannot efficiently generate test inputs. For example, when testing a DSL parser, the mutator will generate a large amount of output that cannot pass the syntax or semantic check. A possible solution is to provide support for custom mutators, so that users can implement custom mutators for various fuzz targets and reuse other parts of the native fuzzer.

I tried on the existing code and designed the following interface:

 type CustomMutator interface {
     Marshal() ([]byte, error)
     Unmarshal([]byte) error
     Mutate() error
 }

The object that implements the above interface can be passed as an argument to the testing.F.Fuzz method, and it will call the Mutate method to use the custom mutation algorithm. The Marshal and Unmarshal methods ensure that it can be imported/exported to a corpus file.

I think supporting custom mutator will bring the following benefits:

the usage scenarios of native fuzzer are expanded
it is more convenient for the community to test and optimize mutation algorithms
the interface of custom mutator is very similar to the struct mutation interface mentioned in the draft proposal, so implementing a custom mutator may be a way to support struct mutation

In addition, custom mutator will also bring some side effects, such as the custom mutator code will be instrumented, which may affect performance and the accuracy of coverage statistics.

/cc @jayconrod

jayconrod commented 2 years ago

cc @golang/fuzzing

s3nt3 commented 2 years ago

A friendly ping: any update?

rolandshoemaker commented 2 years ago

This is unlikely to be implemented for 1.18, but might be considered for 1.19.

s3nt3 commented 2 years ago

Thanks for your reply, I will maintain an implementation in my local branch(https://github.com/s3nt3/go/tree/dev.fuzz.custom_mutator). Looking forward to supporting this feature in the new development cycle.

icholy commented 2 years ago

The interface should probably satisfy encoding.{TextMarshaler,TextUnmarshaler}.

 type CustomMutator interface {
     Mutate() error
     MarshalText() ([]byte, error)
     UnmarshalText([]byte) error
 }

nevkontakte commented 2 years ago

I believe Go 1.20 development cycle is currently open, and implementation in https://github.com/s3nt3/go/tree/dev.fuzz.custom_mutator looks like a very compact change. Any chance of this being included in 1.20? I could definitely use this capability in some of my projects :)

rhansen commented 1 year ago

Additional background: Structure-Aware Fuzzing with libFuzzer

rhansen commented 1 year ago

The interface should probably satisfy encoding.{TextMarshaler,TextUnmarshaler}.

Users might be fuzzing binary data such as images or protobufs. Wouldn't binary marshaler/unmarshaler be a better choice?

type CustomMutator interface {
    encoding.BinaryMarshaler
    encoding.BinaryUnmarshaler
    Mutate() error
}

Also, would it be useful to supply an int64 seed, rand.Source, or rand.Rand to help with reproducibility? For example:

type CustomMutator interface {
    encoding.BinaryMarshaler
    encoding.BinaryUnmarshaler
    Mutate(seed int64) error
}

icholy commented 1 year ago

@rhansen the corpus data is stored in a text format

rhansen commented 1 year ago

@rhansen the corpus data is stored in a text format

True, but if I understand correctly, that's an implementation detail. (At least I couldn't find an official specification of the testdata/fuzz/FuzzTestName/* corpus file format.) Either way, Go already encodes binary input values for writing to the corpus file. I would expect it to treat a CustomMutator value the same as a []byte value, aside from the marshal/unmarshal calls and a unique type identifier string.

gopherbot commented 1 year ago

Change https://go.dev/cl/493304 mentions this issue: testing: add support for fuzzing custom input types

gopherbot commented 1 year ago

Change https://go.dev/cl/493637 mentions this issue: design/48815-custom-fuzz-input-types.md: new design proposal

rhansen commented 1 year ago

I started a design doc. Feedback would be appreciated.

nightlyone commented 1 year ago

How should fuzzing of complex, nested data types work where you don't control the implementation of its methods but nevertheless want/need to use them as data types in your types?

rhansen commented 1 year ago

How should fuzzing of complex, nested data types work where you don't control the implementation of its methods but nevertheless want/need to use them as data types in your types?

@nightlyone As long as it is possible to marshal/unmarshal the type to/from binary (such as JSON or protobuf), you should be able to fuzz it. For example, if the type you want to fuzz is *foo.Foo, and that type implements the encoding/json.Marshaler and encoding/json.Unmarshaler interfaces, you could fuzz it like this:

package foo_test

import (
    "context"
    "testing"

    "me.example/mod/foo"
)

type fuzzInput struct {
    foo.Foo
}

func (v *fuzzInput) MarshalBinary() ([]byte, error) { return v.MarshalJSON() }
func (v *fuzzInput) UnmarshalBinary(d []byte) error { return v.UnmarshalJSON(d) }
func (v *fuzzInput) Mutate(ctx context.Context, seed int64) error {
    // mutate v.Foo as desired
    return nil
}

func FuzzFoo(f *testing.F) {
    // Assumption: the zero value of foo.Foo is valid. If that is not the
    // case, call f.Add or create a `testdata/fuzz/FuzzFoo/*` seed file
    // so that f.Fuzz has a valid value to start with.
    f.Fuzz(func(t *testing.T, v *fuzzInput) {
        if err := foo.FunctionYouWantToFuzz(&v.Foo); err != nil {
            t.Fatal(err)
        }
    })
}

If the third-party type is not directly marshalable, you could instead marshal a machine-readable sequence of instructions that recreate the desired state. (For example, marshal a protobuf message that lists function calls and their arguments. These calls would be replayed when unmarshaling to reconstruct the state from scratch.) The Mutate method would effectively mutate the sequence of instructions.

rhansen commented 1 year ago

Open issues in the design doc:

What is the best way to obtain a stable, globally unique, and marshalable identifier from a reflect.Type? The reflect.Type.String method does not guarantee global uniqueness. See https://go.dev/cl/493304 for an initial attempt.

Should MarshalBinary not return an error, forcing devs to call panic on error? We can always add support for a returned error in the future if desired.

Should Mutate not return an error, forcing devs to call panic on error? We can always add support for a returned error in the future if desired.

Should Mutate take a context.Context in case it wants to be cancelable? (Maybe it wants to send RPCs, or otherwise do something expensive.)

ianlancetaylor commented 1 year ago

Personally I think it's clearly useful for a MarshalBinary method to match encoding.BinaryMarshaler. If it seems really important to have a different signature, then the method should have a different name. Otherwise it will be annoying to use fuzzing with a type that has to implement encoding.BinaryMarshaler for its own purposes.

gopherbot commented 1 year ago

Change https://go.dev/cl/501537 mentions this issue: design/48815-custom-fuzz-input-types.md: revise from feedback

rhansen commented 1 year ago

After thinking on this for a while, I think the two best options for Mutate are:

Mutate(seed int64): Simple. Naturally hints to developers that the method is expected to be fast, repeatable, and error-free, which increases the effectiveness of fuzzing. Adding a context parameter or error return value (or both) might be YAGNI, but their absence in this option makes complex mutation operations more difficult to implement. The lack of an error return value encourages the use of panic, which is problematic for the reasons discussed in the design doc.
Mutate(ctx context.Context, seed int64) error: The error return value discourages the use of panic. The context makes this more future-proof by enabling advanced techniques once Mutate's repeatability requirement is removed. For example, Mutate could send an RPC to a service that feeds automatic crash report data to fuzzing tasks to increase the likelihood of encountering an interesting value. The context parameter and error return value might be YAGNI, but I believe the added implementation complexity and developer cognitive load is minor enough to not worry about it (they can be ignored in most use cases).

The design doc currently proposes Marshal(seed int64) error, which now feels like a half measure. I think we should either fully support complex use cases by passing a context, or we should go for simplicity and remove the error return value. I slightly prefer the option with the context parameter and error return value, so I sent out a merge request to update the design doc.

Anyone else have any opinions?

gopherbot commented 1 year ago

Change https://go.dev/cl/493302 mentions this issue: internal/fuzz: convert custom mutator panics to returned errors

sacheendra commented 1 year ago

Hi! Would it make sense to pass a defaultMutate func([]any) []any as another parameter to Mutate? The resulting signature would be Mutate(ctx context.Context, seed int64, defaultMutate func([]any) []any) error. defaultMutate would be the current mutation function for supported types: https://github.com/golang/go/blob/master/src/internal/fuzz/mutator.go#L48

My motivation is that I only want to custom mutate a few fields in a struct for example, and leave the rest to the default mutator. This is similar to LLVMFuzzerMutate used in LLVMFuzzerCustomMutator.

golang / go

proposal: testing: custom mutator support for fuzzing #48815