Add `Test.make_neg` - Githubissues

jmid commented 2 years ago

This PR adds a "primitive" Test.make_neg which lets one express negative property-based tests, that is, tests that are supposed to fail. This can be useful

to safe guard the error-finding ability of a generator
to supplement existing positive test (supposed to succeed)
to document in test source code which tests are supposed to fail and succeed

The commercial Erlang QuickCheck has a similar operation, fails: http://quviq.com/documentation/eqc/eqc.html#fails-1

Here's an example:

open QCheck
let t1 = Test.make_neg ~name:"all ints are even" small_int (fun i -> i mod 2 = 0)
let t2 = Test.make ~name:"int double" small_int (fun i -> i + i = i * 2)
let t2' = Test.make_neg ~name:"int double" small_int (fun i -> i + i = i * 2)
let t3 = Test.make_neg ~name:"list rev concat" (pair (list small_int) (list small_int)) (fun (is,js) -> (List.rev is)@(List.rev js) = List.rev (is@js))
let t4 = Test.make ~name:"pos fail" small_int (fun i -> raise (Failure "pos-fail"))
let t5 = Test.make_neg ~name:"neg fail" small_int (fun i -> raise (Failure "neg-fail"))
;;
QCheck_runner.run_tests ~verbose:true [t1;t2;t2';t3;t4;t5];;

and corresponding output:

random seed: 442234180
generated error fail pass / total     time test name
[✓]    1    0    0    1 /  100     0.0s all ints are even
[✓]  100    0    0  100 /  100     0.0s int double
[✗]  100    0  100    0 /  100     0.0s int double
[✓]    1    0    0    1 /  100     0.0s list rev concat
[✗]    1    1    0    0 /  100     0.0s pos fail
[✗]    1    1    0    0 /  100     0.0s neg fail

--- Info -----------------------------------------------------------------------

Negative test all ints are even failed as expected (2 shrink steps):

3

--- Failure --------------------------------------------------------------------

Test int double failed:

Negative test int double succeeded but was expected to fail

--- Info -----------------------------------------------------------------------

Negative test list rev concat failed as expected (17 shrink steps):

([0], [1])

=== Error ======================================================================

Test pos fail errored on (5 shrink steps):

0

exception Failure("pos-fail")

=== Error ======================================================================

Test neg fail errored on (5 shrink steps):

0

exception Failure("neg-fail")

================================================================================
failure (1 tests failed, 2 tests errored, ran 6 tests)
- : int = 1

I have gone for keeping changes to a minimum and thus leave the test-generation-shrink loop and Test.check_cell untouched. Negative tests are marked with a bool in their underlying cell - and it is instead the job of the runner to match this marker with TestResult.ts.

In the current design a test Error is considered a failure for both positive and negative tests (see pos-fail and neg-fail above).

Things missing:

~~internal tests of the new functionality~~
~~extend the alcotest runner~~
~~extend the ounit runner~~
~~add CHANGELOG entry~~

Input on the design and/or implementation are welcome.

jmid commented 2 years ago

After thinking a bit more, I reverted the step counter changes (and added tests).

For the above, I attach the output from ~verbose:true below:

negative tests print a Info block (only in verbose-mode) to show a minimal, shrunk input causing an expected failure (tests 1 and 4)
the numbers in columns error, fail, pass are the same for negative tests as for positive tests. The idea is here, that it is easiest to understand for a user, e.g.,
- in test 1, only 1 input was generated which failed the tested property, and hence caused the negative PBT to pass :heavy_check_mark:
- in test 3, all 100 generated inputs passed the tested property, and hence caused the negative PBT to fail ❌
exception-throwing is considered an error for Test.make_neg in test 6 (like for positive tests in test 5) ❌ to avoid unexpected errors to go undetected in negative tests (expected exceptions can be pattern-matched in a try-with)

random seed: 376624633
generated error fail pass / total     time test name
[✓]    1    0    1    0 /  100     0.0s all ints are even
[✓]  100    0    0  100 /  100     0.0s int double
[✗]  100    0    0  100 /  100     0.0s int double
[✓]    1    0    1    0 /  100     0.0s list rev concat
[✗]    1    1    0    0 /  100     0.0s pos fail
[✗]    1    1    0    0 /  100     0.0s neg fail

--- Info -----------------------------------------------------------------------

Negative test all ints are even failed as expected (0 shrink steps):

7

--- Failure --------------------------------------------------------------------

Test int double failed:

Negative test int double succeeded but was expected to fail

--- Info -----------------------------------------------------------------------

Negative test list rev concat failed as expected (18 shrink steps):

([0], [1])

=== Error ======================================================================

Test pos fail errored on (4 shrink steps):

0

exception Failure("pos-fail")

=== Error ======================================================================

Test neg fail errored on (4 shrink steps):

0

exception Failure("neg-fail")

================================================================================
failure (1 tests failed, 2 tests errored, ran 6 tests)
- : int = 1

jmid commented 2 years ago

Based on QCheck2.Test.print_expected_failure added in 8b1f246, commit 28d8891 updates QCheck_ounit to support Test.make_neg.

jmid commented 2 years ago

I've completed initial Test.make_neg support in QCheck_alcotest in f932c28 and updated the CHANGELOG, so this first version should be complete.

(I also discovered that I wrongly call it Test.neg_test in the title and in commit messages rather than Test.make_neg... :man_facepalming: :grimacing:)

jmid commented 2 years ago

I've now

added documentation of Test.make_neg (and polished the existing docs a bit)
corrected the commit messages to use the right name binding Test.make_neg and force pushed the correction

Green light to merge @c-cube @vch9?

vch9 commented 2 years ago

I will have a look tomorrow morning, this seems to be a good idea!

vch9 commented 2 years ago

Looks goo to merge modulo my very minor remarks!

vch9 commented 2 years ago

I can not close the threads (I don't know if it's a permission issue or a github thing though), but, I think they were all treated.

jmid commented 2 years ago

Thanks @vch9 - I'll squash a bit and then merge.

alopezz commented 1 year ago

Was it intended for Test.check_exn to ignore the "polarity" of the test as per this feature?

I was just trying out the library and after seeing this:

Test.make to build a test,

Test.make_neg to build a negative test that is expected not to satisfy the tested property,

Test.check_exn to run a single test with a simple runner.

I started to try with some example of my own and I saw that Test.check_exn was doing the same for make and make_neg when using the same property.

E.g. I would have expected the following to raise an exception but it doesn't, which is confusing.

let test = Test.make_neg ~name:"int double" small_int (fun i -> i + i = i * 2) in Test.check_exn test

From a quick at the look it sounds like it might be possible to factor out the individual implementations for ounit and alcotest by moving those somewhere under check_cell_exn since they both rely on it, which might provide a more consistent behavior. If that sounds good I wouldn't mind trying to work on the change myself.

jmid commented 1 year ago

First off, I agree that this is confusing - and the lack of documentation is certainly not helping here. In QCheck.Test the relevant functions are implemented by calling the equivalent QCheck2.Test ones, which are slightly better documented. For example, https://c-cube.github.io/qcheck/0.20/qcheck-core/QCheck2/Test/index.html#val-check_cell

check_cell ~long ~rand test generates up to count random values of type 'a using Gen.t and the random state st. The predicate law is called on them and if it returns false or raises an exception then we have a counter-example for the law.

When I implemented the first make_neg version, it just seemed natural to build on Test.check_exn as a primitive. It furthermore had the advantage of not having to change the existing public signatures. I'm unsure what would be a reasonable interface for a unit-returning check_exn function in the case of success or failure of negative tests to communicate both "negative test did not fail as expected" and "negative test failed as expected and this is the counterexample". Another aspect is making such a change in a greater context. As I see it, there are really 3 low-level drivers check_cell, check_cell_exn and check_exn. Here it is surprising, but at least consistent that all three ignore the test polarity... :sweat_smile:

The simple answer is probably to document the current behaviour better.

alopezz commented 1 year ago

@jmid Thanks for your reply!

I made a small PR (#271) to better illustrate what I meant. I'm not sure that check_exn would necessarily need to provide a counterexample in the "negative test failed as expected" case. Let me know what you think!

c-cube / qcheck

Add `Test.make_neg` #244