HypothesisWorks / hypothesis

Hypothesis is a powerful, flexible, and easy to use library for property-based testing.
https://hypothesis.works
Other
7.5k stars 581 forks source link

Refactoring `ConjectureData` to support symbolic execution #3086

Closed Zac-HD closed 6 months ago

Zac-HD commented 3 years ago

Crosshair, by @pschanely, is a symbolic execution library, which can run @given() tests... inefficiently. When it works though, it's much better at finding bugs which only occur for specific or very rare inputs.

This issue is about how to refactor the ConjectureData class and conjecture.utils module - the latter containing several functions which should be ConjectureData methods. This is worth doing as a cleanup anyway, but provides the ideal integration point for Crosshair and in future can also help us detect redundant examples (i.e. #1986, see conjecture.datatree).

The core idea is to have strategies all get their data via type-specific methods, such as:

class DataProvider:
    # A thin wrapper around ConjectureData, to reduce the API surface

    def draw_integer(self):
        # cu.unbounded_integers()
    def write_integer(self, *, value):
        # extend self.buffer, as if we had passed force= to draw_bits

    def draw_integer_range(self, min_value, max_value):
        # cu.integer_range()
    def write_integer_range(self, min_value, max_value, *, value):
        # extend self.buffer, as if we had passed force= to draw_bits

    def draw_boolean(self, p=0.5):
        # cu.biased_coin()  (cu.boolean is trivial)

    # ... you get the idea.  Later add floats, bytestrings, and maybe even unicode.

class SymDataProvider(DataProvider):
    def draw_integer(self):
        n = a_symbolic_int()
        self.write_integer(n)  # should be OK to add symbolic values to the buffer
        return n               # because bitsize is fixed conditional on values of earlier draws

    def draw_integer_range(self, min_value, max_value):
        n = a_symbolic_int(min_value, max_value)
        self.write_integer(min_value, max_value, value=n)
        return n

    # etc.

And then to symbolically execute some test, pass a SymDataProvider() instance to a new hook similar to .fuzz_one_input (see e.g. https://github.com/HypothesisWorks/hypothesis/compare/master...Zac-HD:symex-refactoring). That's it - database support etc. should all be included for free via the write_* methods!

Zac-HD commented 2 years ago

Come to think of it, this would also work well for Atheris' FuzzDataProvider πŸ€”

pschanely commented 2 years ago

So! I am going to start poking at this myself, starting from Zac's branch. At least in a time-boxed, explorative kind of way.

@Zac-HD Any new developments that I should be aware of? Any chance you'd have some time to do incremental reviews along the way?

Zac-HD commented 2 years ago

I am absolutely up for code reviews; the only other news is that I'm hoping for some volunteers at the PyCon US sprints next week. Integers and bools are a good place to start; floats need some refactoring work which is probably easier beforehnad.

pschanely commented 2 years ago

Oh right, PyCon is happening! Happy to take a look a float refactoring too if you wanna give me some hints on what needs to be done.

At first glance, it looks like we won't be able to just replace ConjectureData with DataProvider in every SearchStrategy.do_draw (because of 3rd party strategies, at a minimum). Right? We'll introduce a new method perhaps? Your ideas about how to do this incrementally would be great.

Zac-HD commented 2 years ago

floats() refactoring: instead of having the floats() function dispatch to either FloatsStrategy or FixedBoundedFloatsStrategy, I'd like to do that dispatch (perhaps via memoized helper functions) inside FloatsStrategy.do_draw(). https://github.com/HypothesisWorks/hypothesis/pull/2878 is where I did this for integers(). That would in turn make it easy to do filter-rewriting, and provides the obvious place to insert a DataProvider shim πŸ˜‰

And we can indeed replace it everywhere, because SearchStrategy.do_draw() is private API, and everyone else should just be composing public strategies.

pschanely commented 2 years ago

floats() refactoring: instead of having the floats() function dispatch to either FloatsStrategy or FixedBoundedFloatsStrategy, I'd like to do that dispatch (perhaps via memoized helper functions) inside FloatsStrategy.do_draw(). #2878 is where I did this for integers(). That would in turn make it easy to do filter-rewriting, and provides the obvious place to insert a DataProvider shim πŸ˜‰

It needs some cleanup still, but early feedback welcome in this draft.

pschanely commented 2 years ago

It needs some cleanup still, but early feedback welcome in this draft.

Float refactoring was merged! Sadly, I've already exhausted my timebox for this round, but there's a decent chance I'll pop my head back up and continue to push on this later.

Zac-HD commented 10 months ago

@DRMacIver and I have been talking about an equivalent refactoring, motivated by the opportunities for reduced redundancy, smarter shrinking and mutation (e.g. for Inquisitor), symbolic/concolic execution support, and reading this recent paper on 'reflective generators' (which are infeasible in general, but could work at the low-level interface).

The exact shape of the interface is still undecided, but we're thinking of changing the underlying data structure from a bytearray to a tree, where the leaves are (int, float, bytes, str) instances and also track draw order so that we can mutate as if it was a flat sequence by causal order in addition to e.g. depth-first traversal. The low-level API might look something like:

class LowLevelProvider:
    # This is the low-level interface which would also be implemented
    # by e.g. CrossHair, by an Atheris-hypothesis integration, etc.
    # We'd then build the structured tree handling, database and replay
    # support, etc. on top of this - so all backends get those for free.

    def draw_integer(
        self,
        *,
        forced: int | None = None,
        min_value: int | None = None,
        max_value: int | None = None,
        weights: Sequence[Real] | None = None,  # for small bounded ranges
    ):
        if weights is not None:
            assert min_value is not None
            assert max_value is not None
            assert (max_value - min_value) <= 1024  # arbitrary practical limit
        if forced is not None:
            assert min_value is None or min_value <= forced
            assert max_value is None or forced <= max_value
        # This is easy to build on top of our existing conjecture utils,
        # and it's easy to build sampled_from and weighted_coin on this.
        ...

    def draw_float(
        self,
        *,
        forced: float | None = None,
        min_value: float | None = None,
        max_value: float | None = None,
        allow_nan: bool = True,
        allow_infinity: bool = True,
        allow_subnormal: bool = True,
        width: Literal[16, 32, 64] = 64,
        # exclude_min and exclude_max handled higher up
    ):
        ...

    def draw_string(
        self,
        *,
        forced: str | None = None,
        # Should we use `regex: str = ".*"` instead of alphabet and sizes?
        alphabet: ... = ...,
        min_size: int = 0,
        max_size: int | None = None,
    ):
        ...

    def draw_bytes(
        self,
        *,
        forced: bytes | None = None,
        min_size: int = 0,
        max_size: int | None = None,
    ):
        ...
Zac-HD commented 10 months ago

I wrote a quick spike: https://github.com/HypothesisWorks/hypothesis/compare/master...Zac-HD:hypothesis:lowlevel-provider - just porting over IntegersStrategy.do_draw() for now, but most cover tests already pass (exceptions: some stateful-testing stuff) so I think it's promising.

The bad news is that changing our fundamental data structures for the engine is going to be a huge project; the good news is that it's feasible to both do it and ship it in smaller pieces. For example:

  1. Give ConjectureData a PrimitiveProvider, and make all draws go via methods which delegate to the PrimitiveProvider. Make minimal changes to the internals at this point; we just want to get interface worked out and move the utils to be methods on PrimitiveProvider.
  2. Ensure that PrimitiveProvider can in fact be swapped out for an alternative backend - Provider should make no assumptions about the implementation. At this point we should be able to find a bug using a CrosshairPrimitiveProvider, although replay using Hypothesis itself (which includes shrinking and database support) will come in the last step below.
  3. Port TreeObserver, generate_novel_prefix() et al to use the new IR, reducing redundancy (e.g. https://github.com/HypothesisWorks/hypothesis/issues/1574)... if it is in fact easier to do this rather than a big-bang migration of everything at once.
  4. Change our database format and shrink logic to use a sequence of primitive types instead of a bytestring (e.g. https://github.com/HypothesisWorks/hypothesis/issues/1986). The final stage buys us complete interoperability with other backends, and probably also some large performance benefits.
Zac-HD commented 9 months ago

Earlier this evening, @pschanely and I found a synthetic bug using a new CrosshairPrimitiveProvider.

This was rather exciting, given that @tybug's PR for step 1 isn't quite finished yet, and Phillip had implemented the provider earlier in the day - but our totally untested hackjob of a demonstration spike did in fact find a bug with pytest test_demo.py using Crosshair as a backend to Hypothesis' own internals.

Next steps: once #3788 is merged, both (2) and also (3) can proceed concurrently. I (or another volunteer!) will define Hypothesis-side hooks for integration and a way to configure which backed to use, so that Crosshair can register as a Hypothesis plugin and automatically set that up. Phillip will clean up the Crosshair side of our demo, and we'll collaborate to ensure that Crosshair can be used on Hypothesis tests even before we finish up this issue and are ready for Hypothesis to use Crosshair as a backend.

It's been a fantastic week. Onwards!

Zac-HD commented 9 months ago

We've merged and released part (1) of the plan above! Next up, I'll work out how to add some nice-ish hooks for Crosshair and similar tools to plug themselves in as backends - and emit warnings until we get the other parts done. @tybug, do you want to continue on to (3)?

tybug commented 9 months ago

Yup, I'll tackle that next. I expect it will take me some time to wrap my head around the existing tree logic (but this is something I need a solid understanding of regardless, so that's ok!).

My current high-level understanding is that generate_novel_prefix actually deals with a tree right now as well, with the exception that the nodes are individual calls to draw_bits. I'm expecting to change this representation to have 5 intermediate node types instead. Generating a novel prefix would involve traversing the tree and forcing as normal, and drawing a new value from that particular node type when reaching a node that is not forced and not exhausted.

I'm not sure if there will be finicky things involving e.g. computing is_exhausted with this new IR yet - a task for future me πŸ™‚

Zac-HD commented 9 months ago

That sounds right to me!

Note that we'll need to implement forcing for most of the primitives; iirc we currently only have that for bounded ints and bools. Maybe make an independent PR just with that; we'd then be able to assimilate stuff found by crosshair et al even before moving over the shrinker and database format.

Zac-HD commented 9 months ago

@pschanely: https://github.com/HypothesisWorks/hypothesis/compare/master...Zac-HD:hypothesis:provider-plugins has a draft of the pluggable hooks. I'm still considering whether register_new_backend() should live directly in the hypothesis namespace (probably yes, and just make it emit a warning for now?) and the docs are... rough... but this should be enough that you can test the crosshair changes.

pschanely commented 9 months ago

Super exciting. I've been working on a variety of compatibility issues since last week; I'm making good progress, but there are still things to investigate. I see now how you're getting around the bits that want to replay an execution - just skip them! (locally, I noticed another replay spot here that CrossHair dies on)

In the meantime, I have another issue ... that I probably should have foreseen. It isn't always safe to operate on symbolics outside the context manager. Integers like the one we tried will mostly work, but other things (strings especially) will blow up unless I have the appropriate interpreter intercepts installed. There are a few different ways that symbolics may outlive the test run:

  1. The primitives that hypothesis will record in this tree structure that you're designing right now.
  2. In an exception message or attribute.
  3. In the return value of the test. (yes, tests should return None, but hypothesis may explode when trying to complain about the return value if it is symbolic)
  4. (other places you can think of?)

Annnd, some potential solutions for each of the above:

  1. It looks like the provider can already take the conjecture data. So I could peek in there when my context manager is about to exit and walk/replace the primitive tree with realized values.
  2. I think hypothesis error handling happens outside my context manager, so I can probably realize the exception on its way out.
  3. Ignore this for now? Alternatively, I could solve (2) & (3) by somehow wrapping the test function.

Thoughts?

Zac-HD commented 9 months ago

aha - that's exactly the replay spot I also identified, though I still need to wrap it in the context manager. We'll probably just have some more back-and-forth like this leading up to a coordinated release πŸ™‚

Sounds like realizing on the way out of the context manager will solve (1) and (2); IMO simply not solving (3) is reasonable since it's a user error already, and if anything else comes up for (4) I'll post again here.

pschanely commented 9 months ago

Sounds good on not thinking about (3) yet. I took a little time to try and write something that would connect up with your provider-plugins branch, here. Ready to work on these in parallel!

In addition to the context managers, I'll need a way to share CrossHair's RootNode state in between runs for the same test, so maybe think about how you'd like that to happen. (maybe jam something onto the runner if I can access that or something?)

Zac-HD commented 9 months ago

In addition to the context managers, I'll need a way to share CrossHair's RootNode state in between runs for the same test, so maybe think about how you'd like that to happen. (maybe jam something onto the runner if I can access that or something?)

I've added this to the branch and opened a WIP pr - by having a global monkeypatchable function which returns a context manager (to wrap all runs for a test) which yields a function that returns a second context manager (to wrap each test case). Please forgive the other changes there, I'm in 'move fast and break things' mode while we iterate towards a workable design.

pschanely commented 8 months ago

Some small updates on my end:

  1. Updated my intergration code to line up with this branch.
  2. Updated the string generation to no longer pre-maturely realize the length of the string (using some hooks I put in CrossHair v0.0.46).
Zac-HD commented 8 months ago

TODO: detect whether this specific test is supposed to use the crosshair backend (somehow) and return nullcontext if it isn't. (link)

This seems like it calls for a real API design, since there might be multiple backends fighting over the hacky monkeypatchable thingπŸ˜…

For now, checking whether hypothesis.settings.default.backend == "crosshair" should do the trick.

Zac-HD commented 6 months ago

Closing this in favor of #3914, since we've shipped an initial implementation ✨