Open s3nt3 opened 2 years ago
cc @golang/fuzzing
A friendly ping: any update?
This is unlikely to be implemented for 1.18, but might be considered for 1.19.
Thanks for your reply, I will maintain an implementation in my local branch(https://github.com/s3nt3/go/tree/dev.fuzz.custom_mutator). Looking forward to supporting this feature in the new development cycle.
The interface should probably satisfy encoding.{TextMarshaler,TextUnmarshaler}
.
type CustomMutator interface {
Mutate() error
MarshalText() ([]byte, error)
UnmarshalText([]byte) error
}
I believe Go 1.20 development cycle is currently open, and implementation in https://github.com/s3nt3/go/tree/dev.fuzz.custom_mutator looks like a very compact change. Any chance of this being included in 1.20? I could definitely use this capability in some of my projects :)
Additional background: Structure-Aware Fuzzing with libFuzzer
The interface should probably satisfy
encoding.{TextMarshaler,TextUnmarshaler}
.
Users might be fuzzing binary data such as images or protobufs. Wouldn't binary marshaler/unmarshaler be a better choice?
type CustomMutator interface {
encoding.BinaryMarshaler
encoding.BinaryUnmarshaler
Mutate() error
}
Also, would it be useful to supply an int64
seed, rand.Source
, or rand.Rand
to help with reproducibility? For example:
type CustomMutator interface {
encoding.BinaryMarshaler
encoding.BinaryUnmarshaler
Mutate(seed int64) error
}
@rhansen the corpus data is stored in a text format
@rhansen the corpus data is stored in a text format
True, but if I understand correctly, that's an implementation detail. (At least I couldn't find an official specification of the testdata/fuzz/FuzzTestName/*
corpus file format.) Either way, Go already encodes binary input values for writing to the corpus file. I would expect it to treat a CustomMutator
value the same as a []byte
value, aside from the marshal/unmarshal calls and a unique type identifier string.
Change https://go.dev/cl/493304 mentions this issue: testing: add support for fuzzing custom input types
Change https://go.dev/cl/493637 mentions this issue: design/48815-custom-fuzz-input-types.md: new design proposal
I started a design doc. Feedback would be appreciated.
How should fuzzing of complex, nested data types work where you don't control the implementation of its methods but nevertheless want/need to use them as data types in your types?
How should fuzzing of complex, nested data types work where you don't control the implementation of its methods but nevertheless want/need to use them as data types in your types?
@nightlyone As long as it is possible to marshal/unmarshal the type to/from binary (such as JSON or protobuf), you should be able to fuzz it. For example, if the type you want to fuzz is *foo.Foo
, and that type implements the encoding/json.Marshaler
and encoding/json.Unmarshaler
interfaces, you could fuzz it like this:
package foo_test
import (
"context"
"testing"
"me.example/mod/foo"
)
type fuzzInput struct {
foo.Foo
}
func (v *fuzzInput) MarshalBinary() ([]byte, error) { return v.MarshalJSON() }
func (v *fuzzInput) UnmarshalBinary(d []byte) error { return v.UnmarshalJSON(d) }
func (v *fuzzInput) Mutate(ctx context.Context, seed int64) error {
// mutate v.Foo as desired
return nil
}
func FuzzFoo(f *testing.F) {
// Assumption: the zero value of foo.Foo is valid. If that is not the
// case, call f.Add or create a `testdata/fuzz/FuzzFoo/*` seed file
// so that f.Fuzz has a valid value to start with.
f.Fuzz(func(t *testing.T, v *fuzzInput) {
if err := foo.FunctionYouWantToFuzz(&v.Foo); err != nil {
t.Fatal(err)
}
})
}
If the third-party type is not directly marshalable, you could instead marshal a machine-readable sequence of instructions that recreate the desired state. (For example, marshal a protobuf message that lists function calls and their arguments. These calls would be replayed when unmarshaling to reconstruct the state from scratch.) The Mutate
method would effectively mutate the sequence of instructions.
Open issues in the design doc:
- What is the best way to obtain a stable, globally unique, and marshalable identifier from a
reflect.Type
? Thereflect.Type.String
method does not guarantee global uniqueness. See https://go.dev/cl/493304 for an initial attempt.
- Should
MarshalBinary
not return an error, forcing devs to callpanic
on error? We can always add support for a returned error in the future if desired.- Should
Mutate
not return an error, forcing devs to callpanic
on error? We can always add support for a returned error in the future if desired.- Should
Mutate
take acontext.Context
in case it wants to be cancelable? (Maybe it wants to send RPCs, or otherwise do something expensive.)
Personally I think it's clearly useful for a MarshalBinary
method to match encoding.BinaryMarshaler
. If it seems really important to have a different signature, then the method should have a different name. Otherwise it will be annoying to use fuzzing with a type that has to implement encoding.BinaryMarshaler
for its own purposes.
Change https://go.dev/cl/501537 mentions this issue: design/48815-custom-fuzz-input-types.md: revise from feedback
After thinking on this for a while, I think the two best options for Mutate
are:
Mutate(seed int64)
: Simple. Naturally hints to developers that the method is expected to be fast, repeatable, and error-free, which increases the effectiveness of fuzzing. Adding a context parameter or error return value (or both) might be YAGNI, but their absence in this option makes complex mutation operations more difficult to implement. The lack of an error return value encourages the use of panic
, which is problematic for the reasons discussed in the design doc.Mutate(ctx context.Context, seed int64) error
: The error return value discourages the use of panic
. The context makes this more future-proof by enabling advanced techniques once Mutate
's repeatability requirement is removed. For example, Mutate
could send an RPC to a service that feeds automatic crash report data to fuzzing tasks to increase the likelihood of encountering an interesting value. The context parameter and error return value might be YAGNI, but I believe the added implementation complexity and developer cognitive load is minor enough to not worry about it (they can be ignored in most use cases).The design doc currently proposes Marshal(seed int64) error
, which now feels like a half measure. I think we should either fully support complex use cases by passing a context, or we should go for simplicity and remove the error return value. I slightly prefer the option with the context parameter and error return value, so I sent out a merge request to update the design doc.
Anyone else have any opinions?
Change https://go.dev/cl/493302 mentions this issue: internal/fuzz: convert custom mutator panics to returned errors
Hi! Would it make sense to pass a defaultMutate func([]any) []any
as another parameter to Mutate
?
The resulting signature would be Mutate(ctx context.Context, seed int64, defaultMutate func([]any) []any) error
.
defaultMutate
would be the current mutation function for supported types: https://github.com/golang/go/blob/master/src/internal/fuzz/mutator.go#L48
My motivation is that I only want to custom mutate a few fields in a struct for example, and leave the rest to the default mutator.
This is similar to LLVMFuzzerMutate used in LLVMFuzzerCustomMutator
.
As the official fuzzer implementation provided by golang, the native fuzzer should be well suited for various usage scenarios. However, currently native fuzzers only support general mutation algorithms for built-in types. Therefore, in many cases, the native fuzzer cannot efficiently generate test inputs. For example, when testing a DSL parser, the mutator will generate a large amount of output that cannot pass the syntax or semantic check. A possible solution is to provide support for custom mutators, so that users can implement custom mutators for various fuzz targets and reuse other parts of the native fuzzer.
I tried on the existing code and designed the following interface:
The object that implements the above interface can be passed as an argument to the
testing.F.Fuzz
method, and it will call theMutate
method to use the custom mutation algorithm. TheMarshal
andUnmarshal
methods ensure that it can be imported/exported to a corpus file.I think supporting custom mutator will bring the following benefits:
In addition, custom mutator will also bring some side effects, such as the custom mutator code will be instrumented, which may affect performance and the accuracy of coverage statistics.
/cc @jayconrod