Better support for property-based testing

dbp commented 2 years ago

Currently, it's very natural to express properties using satisfies, and with a fix for #1633, debugging errors uncovered by generated input is quite straightforward. The last typical piece of property-based testing is a way to shrink bad test inputs (presumably inputs that either cause errors or fail the predicate).

It's possible to do this currently, but it takes quite a bit of work: you essentially have to run the tests multiple times outside of the testing framework, and only actually emit a satisfies once you know you've reached the smallest case. You also have to define this helper code within the test block, since satisfies can't appear elsewhere. That really isn't great.

I wonder if it's worth adding built-in support for this pattern, i.e., something like:

generate-input() satisfies my-property minimize my-shrink

Or something of that sort. The semantics would be:

- If generate-input() errors, we just report the error.
- If my-property either raises an error or returns false on the left-hand value, we call my-shrink on that input. my-shrink should return an option: if it's none, we can't make the input any smaller, so we report the failing test. If it's some(v), we re-run my-property on v. If v still fails, we keep shrinking from v; if it passes, we instead report the previous input as our smallest failing case.
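To make the proposed semantics concrete, here is a minimal sketch in Python, used only as a stand-in since the minimize form doesn't exist in Pyret. It assumes my-shrink's option is modeled as "a smaller value, or None for none"; satisfies_minimize and fails are hypothetical names, not part of Pyret:

```python
# Hypothetical sketch of `generate-input() satisfies my-property minimize my-shrink`.
# Python stand-in: None models Pyret's `none`, any other value models `some(v)`.
from typing import Callable, Optional, Tuple, TypeVar

A = TypeVar("A")


def fails(prop: Callable[[A], bool], value: A) -> bool:
    """A test fails if the property raises an error or returns false."""
    try:
        return not prop(value)
    except Exception:
        return True


def satisfies_minimize(
    generate: Callable[[], A],
    prop: Callable[[A], bool],
    shrink: Callable[[A], Optional[A]],
) -> Tuple[str, A]:
    value = generate()  # if this raises, the error propagates and is reported as-is
    if not fails(prop, value):
        return ("pass", value)
    current = value
    while True:
        candidate = shrink(current)
        if candidate is None or not fails(prop, candidate):
            # Can't shrink further, or the smaller input passes:
            # `current` is the smallest failing input found.
            return ("fail", current)
        current = candidate  # still fails, so keep shrinking from here
```

Because my-shrink returns a single candidate, this search is greedy: it stops at the first smaller input that passes, so the result is a locally minimal failing input rather than necessarily the smallest one. QuickCheck's shrink, by contrast, returns a list of candidates to try.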

Some open questions:

- Does the new error have to be identical? I.e., if the reason for the test failing changed when the input got smaller, do we still want to take the smaller input? For simplicity, probably yes: the goal is to return the smallest input that did not pass the test.
- Do we also want to show the original failing input? I.e., show that the test failed on some input, and then that we shrank it to some other input that still failed. If we do this, whether the error is the same matters: if we don't ensure that it is, we should show both errors.
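A rough sketch of how both questions could be handled at once, again in Python with hypothetical names (failure_reason, ShrinkReport, minimize_with_history). The loop accepts any smaller input that still fails, as suggested above, but keeps the original input and both failure reasons so a report could show both when they differ:

```python
# Hypothetical sketch: shrink while remembering the original failure and its reason.
from dataclasses import dataclass
from typing import Any, Callable, Optional


def failure_reason(prop: Callable[[Any], bool], value: Any) -> Optional[str]:
    """None if the property passes, otherwise a short description of why it failed."""
    try:
        return None if prop(value) else "returned false"
    except Exception as e:
        return f"raised {type(e).__name__}: {e}"


@dataclass
class ShrinkReport:
    original_input: Any
    original_reason: str
    smallest_input: Any
    smallest_reason: str

    @property
    def reason_changed(self) -> bool:
        # If True, a report should probably show both errors.
        return self.original_reason != self.smallest_reason


def minimize_with_history(
    value: Any, prop: Callable[[Any], bool], shrink: Callable[[Any], Optional[Any]]
) -> Optional[ShrinkReport]:
    original_reason = failure_reason(prop, value)
    if original_reason is None:
        return None  # the test passed, so there is nothing to shrink
    current, current_reason = value, original_reason
    while (candidate := shrink(current)) is not None:
        reason = failure_reason(prop, candidate)
        if reason is None:
            break  # candidate passes, so `current` is the smallest failing input
        current, current_reason = candidate, reason  # accepted even if the reason differs
    return ShrinkReport(value, original_reason, current, current_reason)
```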

jpolitz commented 2 years ago

Quick ideas: I think a generalization of this is that there are many cases where the calculation's result, along with other information about the expression on the LHS, ought to be processed before being reported. I could see having some kind of report, format, or process option for all testing forms that takes in information about the test and post-processes it.

Alternatively, satisfies could take a %(post-process) that allows for this kind of refinement.

dbp commented 2 years ago

I like the idea of satisfies%(post-process), as it does seem like there should be a generalization that works. (I initially wondered if there was a way to get this to work using just is%(something), but the problem is making the smaller input visible.)

I think, at least for this use case, post-process would need to receive the original input and whether the test passed, and its return value would indicate whether the test should be re-run on a different input.

i.e., something like:

{ input : A, result : Exn | Fail | Pass} -> ReportResult | Rerun A

Perhaps ReportResult above could carry additional info (e.g., if the idea were to give further clarification as to why a particular input failed).
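One way to sanity-check the proposed signature is to model it directly. The Python sketch below is an illustration under assumptions, not Pyret: Exn, Fail, Pass, ReportResult, and Rerun mirror the names in the signature, while CheckInfo, run_predicate, satisfies_with_post_process, and shrinking_post_process are hypothetical. It expresses the shrinking behavior from the original proposal as a post-process:

```python
# Hypothetical model of `{ input : A, result : Exn | Fail | Pass } -> ReportResult | Rerun A`.
from dataclasses import dataclass
from typing import Any, Callable, Generic, Optional, TypeVar, Union

A = TypeVar("A")


# `result`: the predicate raised, returned false, or passed.
@dataclass
class Exn:
    error: Exception


@dataclass
class Fail:
    pass


@dataclass
class Pass:
    pass


Result = Union[Exn, Fail, Pass]


@dataclass
class CheckInfo(Generic[A]):
    input: A
    result: Result


# What post-process returns: a final report, or a request to re-run on a new input.
@dataclass
class ReportResult:
    passed: bool
    input: Any


@dataclass
class Rerun(Generic[A]):
    input: A


def run_predicate(pred: Callable[[A], bool], value: A) -> Result:
    try:
        return Pass() if pred(value) else Fail()
    except Exception as e:
        return Exn(e)


def satisfies_with_post_process(value, pred, post_process) -> ReportResult:
    """Hypothetical driver for satisfies%(post-process): re-runs until post-process reports."""
    while True:
        decision = post_process(CheckInfo(value, run_predicate(pred, value)))
        if isinstance(decision, Rerun):
            value = decision.input
        else:
            return decision


def shrinking_post_process(shrink: Callable[[A], Optional[A]]):
    """Shrinking expressed as a post-process; create a fresh one per test."""
    last_failure = None  # the signature carries no history, so remember it here

    def post_process(info: CheckInfo) -> Union[ReportResult, Rerun]:
        nonlocal last_failure
        if isinstance(info.result, Pass):
            if last_failure is None:
                return ReportResult(True, info.input)  # the original input passed
            return ReportResult(False, last_failure)  # a shrunk input passed: report the previous one
        last_failure = info.input
        smaller = shrink(info.input)
        return ReportResult(False, info.input) if smaller is None else Rerun(smaller)

    return post_process
```

One thing the sketch surfaces: since post-process only receives { input, result }, it has to remember the last failing input itself (here in a closure) so it can report that input when a shrunk one passes. Passing the previous input, or some accumulated state, to post-process might be one way to avoid that.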

brownplt/pyret-lang issue #1635