Closed jmid closed 2 years ago
After thinking a bit more, I reverted the step counter changes (and added tests).
For the above, I attach the output from ~verbose:true below:
- An Info block (only in verbose-mode) shows a minimal, shrunk input causing an expected failure (tests 1 and 4)
- error, fail, pass are the same for negative tests as for positive tests. The idea here is that this is easiest for a user to understand, e.g.,
- An error also counts as a failure for Test.make_neg in test 6 (like for positive tests in test 5) ❌, so that unexpected errors do not go undetected in negative tests (expected exceptions can be pattern-matched in a try-with)

random seed: 376624633
generated error fail pass / total time test name
[✓] 1 0 1 0 / 100 0.0s all ints are even
[✓] 100 0 0 100 / 100 0.0s int double
[✗] 100 0 0 100 / 100 0.0s int double
[✓] 1 0 1 0 / 100 0.0s list rev concat
[✗] 1 1 0 0 / 100 0.0s pos fail
[✗] 1 1 0 0 / 100 0.0s neg fail
--- Info -----------------------------------------------------------------------
Negative test all ints are even failed as expected (0 shrink steps):
7
--- Failure --------------------------------------------------------------------
Test int double failed:
Negative test int double succeeded but was expected to fail
--- Info -----------------------------------------------------------------------
Negative test list rev concat failed as expected (18 shrink steps):
([0], [1])
=== Error ======================================================================
Test pos fail errored on (4 shrink steps):
0
exception Failure("pos-fail")
=== Error ======================================================================
Test neg fail errored on (4 shrink steps):
0
exception Failure("neg-fail")
================================================================================
failure (1 tests failed, 2 tests errored, ran 6 tests)
- : int = 1
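The remark above that expected exceptions can be pattern-matched in a try-with could look roughly like the following. This is only a sketch using the standard library: `checked_sqrt` and `prop_sqrt_defined` are hypothetical names invented for illustration, not part of QCheck or of this PR.

```ocaml
(* Hypothetical function under test: raises Failure on negative input. *)
let checked_sqrt (x : int) : int =
  if x < 0 then failwith "negative input"
  else int_of_float (sqrt (float_of_int x))

(* The *expected* exception is turned into an ordinary property failure
   (false). Any other exception still escapes, so a runner would report
   it as an error rather than as an expected failure. *)
let prop_sqrt_defined (x : int) : bool =
  try
    let r = checked_sqrt x in
    r * r <= x
  with Failure _ -> false
```

A property written this way returns false on negative inputs instead of raising, so it would presumably be a candidate for Test.make_neg with a generator that can produce negative integers.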
Based on QCheck2.Test.print_expected_failure added in 8b1f246, commit 28d8891 updates QCheck_ounit to support Test.make_neg.
I've completed initial Test.make_neg support in QCheck_alcotest in f932c28 and updated the CHANGELOG, so this first version should be complete.
(I also discovered that I wrongly call it Test.neg_test in the title and in commit messages rather than Test.make_neg ... :man_facepalming: :grimacing:)
I've now
- documented Test.make_neg (and polished the existing docs a bit)
- corrected the name to Test.make_neg and force pushed the correction

Green light to merge @c-cube @vch9?
I will have a look tomorrow morning, this seems to be a good idea!
Looks good to merge modulo my very minor remarks!
I cannot close the threads (I don't know if it's a permission issue or a GitHub thing though), but I think they were all addressed.
Thanks @vch9 - I'll squash a bit and then merge.
Was it intended for Test.check_exn to ignore the "polarity" of the test as per this feature?
I was just trying out the library and after seeing this:
- Test.make to build a test,
- Test.make_neg to build a negative test that is expected not to satisfy the tested property,
- Test.check_exn to run a single test with a simple runner.
I started trying some examples of my own and saw that Test.check_exn was doing the same for make and make_neg when using the same property.
E.g. I would have expected the following to raise an exception but it doesn't, which is confusing.
let test = Test.make_neg ~name:"int double" small_int (fun i -> i + i = i * 2) in Test.check_exn test
From a quick look, it sounds like it might be possible to factor out the individual implementations for ounit and alcotest by moving them somewhere under check_cell_exn, since they both rely on it, which might provide more consistent behavior. If that sounds good I wouldn't mind trying to work on the change myself.
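To make the expectation concrete, here is a minimal standard-library sketch of a check_exn-style function that does take polarity into account. It is a toy model, not QCheck's implementation: `Test_failed`, `find_counterexample`, the explicit `~positive` flag, and the fixed input list (standing in for random generation) are all assumptions made for illustration.

```ocaml
(* Hypothetical exception; QCheck's own exceptions are not modelled here. *)
exception Test_failed of string

(* Toy stand-in for the generate-and-check loop: returns the first input
   on which the law does not hold, if any. *)
let find_counterexample (inputs : 'a list) (law : 'a -> bool) : 'a option =
  List.find_opt (fun x -> not (law x)) inputs

(* Polarity-aware check: a positive test must find no counterexample,
   a negative test must find one. *)
let check_exn ~(positive : bool) ~(name : string) inputs law : unit =
  match positive, find_counterexample inputs law with
  | true, None | false, Some _ -> ()
  | true, Some _ ->
    raise (Test_failed ("Test " ^ name ^ " failed"))
  | false, None ->
    raise (Test_failed
             ("Negative test " ^ name ^ " succeeded but was expected to fail"))

(* The property from the example above: it always holds. *)
let double (i : int) : bool = i + i = i * 2
```

Under this model, `check_exn ~positive:false ~name:"int double" [0; 1; 2] double` raises, which is the behavior the example above expected, while the positive variant returns unit.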
First off, I agree that this is confusing - and the lack of documentation is certainly not helping here.
In QCheck.Test the relevant functions are implemented by calling the equivalent QCheck2.Test ones, which are slightly better documented. For example, https://c-cube.github.io/qcheck/0.20/qcheck-core/QCheck2/Test/index.html#val-check_cell
check_cell ~long ~rand test generates up to count random values of type 'a using Gen.t and the random state st. The predicate law is called on them and if it returns false or raises an exception then we have a counter-example for the law.
When I implemented the first make_neg version, it just seemed natural to build on Test.check_exn as a primitive.
It furthermore had the advantage of not having to change the existing public signatures.
I'm unsure what would be a reasonable interface for a unit-returning check_exn function in the case of success or failure of negative tests to communicate both "negative test did not fail as expected" and "negative test failed as expected and this is the counterexample".
Another aspect is making such a change in a greater context. As I see it, there are really 3 low-level drivers: check_cell, check_cell_exn and check_exn. Here it is surprising, but at least consistent, that all three ignore the test polarity... :sweat_smile:
The simple answer is probably to document the current behaviour better.
@jmid Thanks for your reply!
I made a small PR (#271) to better illustrate what I meant. I'm not sure that check_exn would necessarily need to provide a counterexample in the "negative test failed as expected" case. Let me know what you think!
This PR adds a "primitive" Test.make_neg which lets one express negative property-based tests, that is, tests that are supposed to fail. This can be useful.

The commercial Erlang QuickCheck has a similar operation, fails: http://quviq.com/documentation/eqc/eqc.html#fails-1

Here's an example:

and corresponding output:
I have gone for keeping changes to a minimum and thus leave the test-generation-shrink loop and Test.check_cell untouched. Negative tests are marked with a bool in their underlying cell - and it is instead the job of the runner to match this marker with TestResult.t values.

In the current design a test Error is considered a failure for both positive and negative tests (see pos-fail and neg-fail above).

Things missing:
- internal tests of the new functionality
- extend the alcotest runner
- extend the ounit runner
- add CHANGELOG entry

Input on the design and/or implementation is welcome.
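The design described above (a polarity-agnostic check loop, a bool marker in the cell, and a runner that combines the two) can be sketched with a small toy model. None of the types below are QCheck's real definitions: `raw_result`, `cell`, `verdict`, `interpret`, and the fixed `inputs` list replacing random generation and shrinking are assumptions made purely for illustration.

```ocaml
(* Toy model for illustration only; these are NOT QCheck's real types. *)

(* Raw outcome of the polarity-agnostic check loop. *)
type 'a raw_result =
  | Success                 (* the law held on every input *)
  | Failed of 'a            (* counterexample: the law returned false *)
  | Error_on of 'a * exn    (* the law raised on this input *)

type 'a cell = {
  name : string;
  positive : bool;          (* false marks a negative test *)
  inputs : 'a list;         (* stand-in for a random generator *)
  law : 'a -> bool;
}

(* The loop never looks at [positive]; it only reports what happened. *)
let check_cell (c : 'a cell) : 'a raw_result =
  let rec go = function
    | [] -> Success
    | x :: rest ->
      (match c.law x with
       | true -> go rest
       | false -> Failed x
       | exception e -> Error_on (x, e))
  in
  go c.inputs

type verdict = Passed | Failing | Errored

(* The runner matches the polarity marker against the raw result.
   An error is never accepted, for positive and negative tests alike. *)
let interpret (c : 'a cell) (r : 'a raw_result) : verdict =
  match c.positive, r with
  | true, Success | false, Failed _ -> Passed
  | true, Failed _ | false, Success -> Failing
  | _, Error_on _ -> Errored

let even_neg =
  { name = "all ints are even"; positive = false;
    inputs = [0; 2; 7]; law = (fun i -> i mod 2 = 0) }

let neg_fail =
  { name = "neg fail"; positive = false;
    inputs = [0]; law = (fun _ -> failwith "neg-fail") }
```

In this model `interpret even_neg (check_cell even_neg)` yields `Passed` because the negative test found a counterexample, while `neg_fail` yields `Errored`, mirroring the rule that an error counts against a test regardless of polarity.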