BurntSushi / quickcheck

Automated property based testing for Rust (with shrinking).
The Unlicense

Running in a loop #193

Closed: ethanpailes closed this issue 6 years ago

ethanpailes commented 6 years ago

It is often useful to run quickcheck properties in a loop until one blows up. I did this while looking into the test failures for the TBM PR over in the regex crate, and I uncovered a few unrelated bugs. I think there are a few directions that this could go.

Just add some docs

The easiest and least invasive thing would be to just add a note in the README letting people know that some bugs don't show up within quickcheck's default of 100 tests per run, so it is worth writing a script like this (obviously with a bit more generality).

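#!/bin/sh

# Rerun the quickcheck properties until a run fails, then exit with that status.
while true
do
    cargo test --lib qc_
    # Save the status now: after the test below, $? would be the status of the test itself.
    status=$?
    if [ "$status" -ne 0 ] ; then
        exit "$status"
    fi
done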

Add an environment variable

I'm not a huge fan of configuration through environment variables, but doing it through CLI commands would require cargo changes, and it's not clear that cargo should be tightly coupled with quickcheck. Another issue with just using an environment variable is that you can only loop one test at a time.

The idea here would be to have a flag passed through the environment asking quickcheck to loop until failure. The main advantage here would be performance, so you can spend more cycles hammering your property and less standing up processes.

Given that it doesn't seem like you can run multiple properties at once with pure quickcheck changes (at least without some really gross thread shenanigans), I'm not so hot on this idea, but I wanted to mention it in case you had some special insight.
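For illustration only, here is a minimal std-only sketch of what such a switch could look like. Everything in it is an assumption rather than part of quickcheck: the variable name `QC_LOOP`, the `Budget` type, and the `run` driver are hypothetical, and the inputs are simply 0, 1, 2, ... as a stand-in for random generation.

```rust
use std::env;

/// How long to keep hammering a property (hypothetical type, not quickcheck API).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Budget {
    /// Stop after this many tests (the usual bounded behaviour).
    Tests(u64),
    /// Keep going until the property fails.
    UntilFailure,
}

/// Interprets the value of the hypothetical `QC_LOOP` variable:
/// "1" or "true" means loop until failure; anything else keeps the default.
fn budget_from(value: Option<&str>, default_tests: u64) -> Budget {
    match value {
        Some("1") | Some("true") => Budget::UntilFailure,
        _ => Budget::Tests(default_tests),
    }
}

/// Reads the switch from the process environment.
fn budget_from_env() -> Budget {
    budget_from(env::var("QC_LOOP").ok().as_deref(), 100)
}

/// Runs `prop` on the inputs 0, 1, 2, ... (a stand-in for random inputs) and
/// returns the first failing input, or `None` if the budget runs out first.
fn run(prop: impl Fn(u64) -> bool, budget: Budget) -> Option<u64> {
    let mut input = 0u64;
    loop {
        if let Budget::Tests(n) = budget {
            if input >= n {
                return None;
            }
        }
        if !prop(input) {
            return Some(input);
        }
        input += 1;
    }
}
```

With this shape, `run(prop, budget_from_env())` stays bounded by default and only loops forever when the variable is set, all inside a single process.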

I'd be happy to make these changes and open a PR (though if door number 2 turns out to be feasible after all I may need a few pointers).

bluss commented 6 years ago

I can recommend fuzzing with cargo fuzz (using libfuzzer). It's quite fast, and finds bugs in a more open-ended way than quickcheck properties and Arbitrary inputs. Here's one bug-finding story, which notes I had to feed the fuzzer a dictionary of suspect inputs: https://github.com/bluss/galil-seiferas/pull/1

Note that it does not use the dictionary verbatim, but copies and tweaks entries in the dictionary to make new inputs.

BurntSushi commented 6 years ago

Typically if I hit this problem where I need to execute quickcheck in a loop, then I do one or more of the following:

  1. Increase the number of tests by one or two orders of magnitude. See: https://docs.rs/quickcheck/0.5.0/quickcheck/struct.QuickCheck.html#method.tests --- This is basically the same as running quickcheck in a loop I think?
  2. Spend more time in the generator such that it makes more interesting inputs with a higher probability.
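A back-of-the-envelope way to see why option 1 behaves like a loop: assume each test draws its input independently and hits the bug with some probability p (an illustrative figure, not something measured). Then n tests find the bug with probability 1 - (1 - p)^n, so 100 runs of 100 tests and one run of 10,000 tests come out the same. The function names below are made up for this sketch.

```rust
/// Probability that at least one of `tests` independent random inputs
/// triggers a bug that a single input hits with probability `p`.
fn detection_probability(p: f64, tests: u64) -> f64 {
    1.0 - (1.0 - p).powf(tests as f64)
}

/// Smallest number of tests that reaches the requested `confidence`
/// (strictly between 0 and 1) for a bug hit with probability `p`.
fn tests_needed(p: f64, confidence: f64) -> u64 {
    ((1.0 - confidence).ln() / (1.0 - p).ln()).ceil() as u64
}
```

For a bug that one input in a thousand triggers, `detection_probability(0.001, 100)` is a little under 10%, which is why such bugs routinely survive a default run.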

I'd be happy to have a short blurb about this in the README, including @bluss's reference to cargo fuzz.

ethanpailes commented 6 years ago

Mentioning cargo fuzz seems like a must in any discussion of random testing in a loop. Thanks @bluss!

I have more experience with Haskell quickcheck than this crate, so I could be wrong about this, but wouldn't 1 have a potential starvation issue? If you have lots of properties, cranking the number of tests high enough that it was effectively an infinite loop until something went wrong would mean that only the first property was run. Alternatively, if you don't crank it quite that high, it is harder to run it overnight or continuously on a dedicated property testing server, because the computation will terminate.

I don't understand 2, but it sounds really cool! I'm afraid I'm more of a user of property testing than someone with real understanding. What is the knob to turn to get smarter input?

BurntSushi commented 6 years ago

> I don't understand 2, but it sounds really cool! I'm afraid I'm more of a user of property testing than someone with real understanding. What is the knob to turn to get smarter input?

There is no knob. It's just writing a better implementation of Arbitrary. The implementation of Arbitrary does two things: generates random inputs and shrinks inputs. If you generate "interesting" random inputs with higher probability, then you should need to run fewer tests.

This is, of course, hand-wavy. To a certain extent, the idea of biasing towards corner cases is somewhat antithetical to property based testing (it presupposes that you, the programmer, know the corner cases), but in practice, it can work well.
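As a rough illustration of "interesting inputs with higher probability", the sketch below mixes a handful of known corner cases into an otherwise uniform generator. It deliberately avoids the real `Arbitrary` trait so that it compiles without the crate; `XorShift`, `EDGE_CASES`, `biased_u32`, and the one-in-four ratio are all assumptions made up for this example. In an actual `Arbitrary` implementation the same branching would draw from the generator quickcheck passes in.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
/// The seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Values that tend to expose off-by-one and overflow bugs.
const EDGE_CASES: [u32; 5] = [0, 1, 1 << 31, u32::MAX - 1, u32::MAX];

/// Returns one of `EDGE_CASES` roughly one time in four,
/// and an (approximately) uniform `u32` otherwise.
fn biased_u32(rng: &mut XorShift) -> u32 {
    let r = rng.next_u64();
    if r % 4 == 0 {
        // Pick the edge case from bits not used for the branch decision.
        EDGE_CASES[((r >> 8) % EDGE_CASES.len() as u64) as usize]
    } else {
        (r >> 32) as u32
    }
}
```

A uniform `u32` generator would essentially never produce `u32::MAX` in a few thousand tests; with the bias it shows up constantly, at the price of baking in a guess about where the bugs are.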

> but wouldn't 1 have a potential starvation issue? If you have lots of properties, cranking the number of tests high enough that it was effectively an infinite loop until something went wrong would mean that only the first property was run. Alternatively, if you don't crank it quite that high, it is harder to run it overnight or continuously on a dedicated property testing server, because the computation will terminate.

I don't think I understand this. The point of running a quickcheck property in a loop is, I presume, to explore the space of inputs more thoroughly. The same effect can be achieved by increasing the number of tests. There are trade-offs between these approaches, and you may even want to combine them. For example, running in a loop is done on-demand, so you only pay for the cost of a large number of tests specifically when you want to pay for it. But increasing the number of tests to N+1 from N will always explore more of the input space every time you run the tests, at the cost of running the additional test.

ethanpailes commented 6 years ago

> I don't think I understand this.

I don't think I explained my thought processes thoroughly enough. Sorry about that. Let's see if a concrete example will make what I mean clearer. I recently ran the script I posted in the original issue for about a day and a half (no kill like overkill, right?) in order to make sure that there were no more quickcheck issues in the regex PR that just got merged. I just let it run and then walked away, knowing that it would keep going until I came back and told it to give up or until it found another bug. I'm not confident in my ability to translate "run for N iterations" into "run for M hours", so the unbounded nature of the script is nice.

As I understand it, when you run cargo test for a project with 3 quickcheck properties (call them qc1, qc2, and qc3), it will run qc1 for the number of tests you have asked for, then move on to qc2 and qc3. If you've asked for lots of tests, it will take a long time to move on from each test.

The starvation issue comes up if you crank the number of tests up so high that runtime is effectively unbounded (so you can walk away and come back whenever). It seems like you would never get out of qc1. Alternatively, you could not crank the number of tests up that high, but then it no longer runs until a bug pops up or you tell it to stop. Basically, I agree that increasing the number of tests and running in a loop both explore the space more thoroughly, but I want to make sure that quickcheck properties have their space explored in a round-robin way rather than a depth-first way.
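To make the round-robin idea concrete, here is a small std-only sketch of a driver that gives each property one test per round and stops on the first failure or when a wall-clock budget runs out. It is not how cargo test or quickcheck schedule anything; `Property`, `round_robin`, and the use of the round number as a stand-in seed are all hypothetical.

```rust
use std::time::{Duration, Instant};

/// A named property; `check` receives the round number as a stand-in seed.
struct Property {
    name: &'static str,
    check: fn(u64) -> bool,
}

/// Gives every property one test per round, cycling until one fails or the
/// time budget is used up. Returns the failing property's name and its seed.
fn round_robin(props: &[Property], budget: Duration) -> Option<(&'static str, u64)> {
    let start = Instant::now();
    let mut round = 0u64;
    while start.elapsed() < budget {
        for p in props {
            if !(p.check)(round) {
                return Some((p.name, round));
            }
        }
        round += 1;
    }
    None
}
```

Because the budget is a `Duration`, "run for M hours" can be expressed directly instead of being guessed from a test count, and a slow-to-fail qc2 is still reached even though qc1 never fails.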

Am I making sense or is my mental model of how quickcheck and cargo test interact off base?

BurntSushi commented 6 years ago

@ethanpailes Yeah that makes sense. It kinda feels like we're saying roughly the same thing.

What is the action item here? Running quickcheck in a loop is certainly not an ideal situation. It's more of a hack than anything else. But perhaps it is pragmatic. I'm undecided if it belongs in the README or not. If someone added a quick note somewhere, I don't think I'd be opposed to it.

ethanpailes commented 6 years ago

@BurntSushi, I definitely agree with you that the level of hackiness makes me uncomfortable. I think my platonic ideal feature (a flag to cargo) for this use case would involve cargo test (or maybe libtest) knowing things about the internal structure of quickcheck. I'll open a PR with a short note about running quickcheck in a loop. I think the main takeaway from this thread is that there are lots of ways to do it, so I'll try to convey the viewpoint that it is mostly a matter of taste.