HypothesisWorks / hypothesis

Hypothesis is a powerful, flexible, and easy to use library for property-based testing.
https://hypothesis.works

RFC: Pull coverage guided testing from the feature set #1551

Closed: DRMacIver closed this issue 6 years ago

DRMacIver commented 6 years ago

I like that we have coverage-guided testing, but it causes a bunch of problems (e.g. #1392, #1493, and general performance issues), and at the sort of workloads we actually have a good UX for right now, I don't think it can realistically ever pull its weight.

On the UX front, I think what we want is some sort of fuzzer support for Hypothesis, so that we can e.g. run a Hypothesis-based test under python-afl. Most of the benefits of coverage guidance would be better served by adding that, and the current generation of coverage-guided testing adds very little, either currently or towards that goal.

So I'd like to propose that we deprecate the coverage-guided testing feature (and the use_coverage setting that controls it) and then remove it, with proper support for external fuzzers to follow as separate work.

Thoughts?

Zac-HD commented 6 years ago

Well... python-afl is going to have exactly the same problem as discussed in #1493 (still using sys.settrace) and the only way it wouldn't also reproduce #1392 is if that's something about our @proxies decorator instead of coverage (which I think is likely).

So while I wouldn't mind an option to have test case generation driven by afl, I don't think removing or deprecating the current coverage-driven system would be helpful. I could be convinced to have use_coverage=False as the default, though.
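
For concreteness, a minimal sketch of what opting out looks like with the setting as it exists today, applied through the usual settings decorator; the test itself is purely illustrative:

```python
from hypothesis import given, settings, strategies as st

# Disable coverage-guided generation for this test only; example
# generation otherwise works exactly as before.
@settings(use_coverage=False)
@given(st.integers())
def test_abs_is_nonnegative(x):
    assert abs(x) >= 0
```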

Zalathar commented 6 years ago

True, but I think the idea is that people will be more willing to accept those limitations in a dedicated fuzzing mode, as long as they can run debuggers and coverage on their “normal” test runs without jumping through hoops.
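
For context, the limitation both comments point at comes from CPython supporting only one active trace function at a time, so a sys.settrace-based coverage hook and a debugger or coverage.py displace each other. A minimal, standalone illustration:

```python
import sys

def tracer_a(frame, event, arg):   # stands in for a coverage-guidance tracer
    return tracer_a

def tracer_b(frame, event, arg):   # stands in for a debugger / coverage.py tracer
    return tracer_b

sys.settrace(tracer_a)
sys.settrace(tracer_b)             # silently replaces tracer_a; the two cannot coexist
assert sys.gettrace() is tracer_b
sys.settrace(None)                 # turn tracing back off
```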

DRMacIver commented 6 years ago

> Well... python-afl is going to have exactly the same problem as discussed in #1493 (still using sys.settrace) and the only way it wouldn't also reproduce #1392 is if that's something about our @proxies decorator instead of coverage (which I think is likely).

Yes, but I think that's OK.

> So while I wouldn't mind an option to have test case generation driven by afl, I don't think removing or deprecating the current coverage-driven system would be helpful.

I don't think that's the right way of looking at it. The question is not "would removing it be helpful?" but "would retaining it be helpful?". It's a very costly feature in terms of sharp edges for users, performance, and code complexity, and it offers no user-visible functionality except for allegedly being better at finding bugs. I'm not convinced it either does or can pull its weight there: the original idea was that it would improve over time as I did more research into the subject, but the reality is that (a) I'm not going to be doing that research in the next year or two, and (b) it's really unclear whether this is a problem worth solving within the normal Hypothesis execution budget.

> I could be convinced to have use_coverage=False as the default, though.

Setting use_coverage=False was going to be my initial proposal, but then I asked myself the following question: Under what circumstances would we ever recommend setting use_coverage to True? I couldn't come up with any. Can you?

Zac-HD commented 6 years ago

> The question is not "would removing it be helpful?" but "would retaining it be helpful?".

> Setting use_coverage=False was going to be my initial proposal, but then I asked myself the following question: Under what circumstances would we ever recommend setting use_coverage to True? I couldn't come up with any. Can you?

When you want to run in fuzzing mode! My suggestion is basically to start by flipping the default, then add the fuzzing mode which makes use_coverage=True use a python-afl backend (and emit HypothesisWarning if max_examples<=9999).

So retaining it basically buys us a compatible deprecation cycle, and clarifies the messaging around what the setting is for.
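
A rough sketch of the warning check being suggested here, using only names that appear in this thread; maybe_warn_about_fuzzing_budget is a hypothetical helper, not an existing Hypothesis code path:

```python
import warnings

from hypothesis.errors import HypothesisWarning


def maybe_warn_about_fuzzing_budget(settings) -> None:
    # Hypothetical check: coverage guidance only pays off over long runs,
    # so warn if it is enabled alongside a small example budget.
    if settings.use_coverage and settings.max_examples <= 9999:
        warnings.warn(
            "use_coverage=True is intended for long fuzzing runs; "
            f"max_examples={settings.max_examples} is probably too small to benefit from it.",
            HypothesisWarning,
        )
```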

DRMacIver commented 6 years ago

> When you want to run in fuzzing mode! My suggestion is basically to start by flipping the default, then add the fuzzing mode which makes use_coverage=True use a python-afl backend (and emit HypothesisWarning if max_examples<=9999).

No, the setting is basically useless for fuzzing mode, and we should be using someone else's fuzzer for that. It will work better and be less effort for us to maintain.

DRMacIver commented 6 years ago

Oh, sorry, I misread part of that:

> use a python-afl backend (and emit HypothesisWarning if max_examples<=9999).

This also can't happen. The whole point of using someone else's fuzzer is that:

  1. The current workflow and UI for using Hypothesis as a fuzzer is very bad to the point of almost total uselessness.
  2. The current backend for using Hypothesis as a fuzzer is also very bad to the point of almost total uselessness.

The solution to (1) in particular is that we let AFL do what it basically demands to do and own `main`. In this case AFL plays a role analogous to something like pytest: it's not something under our control.
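
A minimal sketch of what "letting AFL own main" could look like with python-afl's documented entry point; run_one_test_case is a hypothetical hook standing in for whatever would replay a Hypothesis test from raw bytes, which does not exist yet:

```python
import os
import sys

import afl  # python-afl


def run_one_test_case(buffer: bytes) -> None:
    # Hypothetical: feed AFL's chosen bytes into Hypothesis so the decorated
    # test runs against exactly this input.
    raise NotImplementedError


def main() -> None:
    afl.init()                                  # AFL's fork server starts here
    run_one_test_case(sys.stdin.buffer.read())  # AFL supplies the test case on stdin
    os._exit(0)                                 # skip interpreter teardown for speed


if __name__ == "__main__":
    main()
```

The point is the shape rather than the details: AFL launches the script, supplies the input, and watches for crashes, so none of Hypothesis's settings are in play.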

Zac-HD commented 6 years ago

Huh, I was thinking about basically using afl to drive the gitbits primitive in a long-running process via a shim that AFL could control.

In any case, my position is basically just to leave the actual deprecation until we have a working version of the new fuzzing workflow.

DRMacIver commented 6 years ago

> Huh, I was thinking about basically using afl to drive the gitbits primitive in a long-running process via a shim that AFL could control.

Yes, that's the idea, but if we're doing that then literally none of the settings are relevant because nothing is under our control.

> In any case, my position is basically just to leave the actual deprecation until we have a working version of the new fuzzing workflow.

I don't see why we should gate removing a feature that has never really lived up to its promise on another feature that does what it would have done if it had worked, but if you really feel that strongly I don't object to simply turning use_coverage=False into the default and sitting back as literally nobody ever sets it to True, because there is no good reason to do so.

DRMacIver commented 6 years ago

https://github.com/nedbat/coveragepy/issues/702 has reminded me of another non-trivial cost of keeping this feature: Our interaction with coverage is brittle and unofficial, and makes life worse for @nedbat as well as us, and fixing that issue is going to require a non-trivial amount of debugging / probably more bad hacks on the Coverage API.

Given this, unless someone comes up with at least one example of an actual concrete benefit that retaining this feature would bring, I am very strongly inclined to go ahead with the removal process.

@Zac-HD if you still don't like the deprecation plan I am happy to remove the feature and turn the flag into a dummy flag that silently does nothing until we backfill it with something else or actually deprecate it.

Zac-HD commented 6 years ago

Causing real problems for other projects is enough reason to tear it out IMO, so go ahead :rocket: