Playing the Applicative Game with Claims and Proofs

TheSeamau5 / elm-check

Property Based Testing in Elm

70 stars 20 forks source link

Playing the Applicative Game with Claims and Proofs #3

Closed TheSeamau5 closed 9 years ago

TheSeamau5 commented 9 years ago

So, I open this issue to not clobber https://github.com/TheSeamau5/elm-check/issues/2.

I've explored a bit the idea of using "claims" and "proofs" as the way of exposing a friendly DSL-y interface for the user a-la https://github.com/TheSeamau5/elm-check/pull/1 by @sgraf812

The idea is as follows. You make "claims" about your system like the fact that reversing a list twice should yield the original list.

In code, this would look something like this:

claim_reverse_involution =
  claim
    "Reversing a list twice yields the original list"
  `that`
    List.reverse >> List.reverse
  `equals`
    identity

Now, this is just a claim about any list. Unlike normally with elm-check, this doesn't get tied up to a particular type, which conceptually seperates the claim from the actual trying to prove or find a counter example.

So, suppose we believe that this "claim" should be true, we can tell elm-check to "prove" it for some given input space.

proof_reverse_involution =
  prove
    claim_reverse_involution
  `for`
    list int

Now, what we get is a proof, which is equivalent to the old Property type where this is just Int -> Seed -> TestResult

If instead you wish to "disprove" the claim, you can just use disprove instead of prove and that'll just flip the expected Bool value

counter_reverse_involution = 
  disprove
    claim_reverse_involution
  `for`
    list int

And, now, when you actually want to run these things, you can call check, which as a word makes sense in this context because you want to "check" proofs and "check" for counter examples.

check proof_reverse_induction

and this will give you the actual TestResult (modulo seeds and number of runs).

Any opinions, @evancz ? Does this terminology seem to make more sense in this context ?

sgraf812 commented 9 years ago

Looks great! Some thoughts:

I don't like that disprove flips the bool, after all it tries to disprove the same claim as before via a counter example, not to prove the negation (which logically is another thing).

Given that you can't really prove, only disprove with testing, semantically disprove makes more sense. I really like the DSL to map to theorem proving, but I'm sure if this is particularly enlightening to newcomers.

Also, splitting up into a claim (the actual property) and a proof (the claim fed with random values) might overcomplicate things and I have the feeling that the random generator and the claim should go together. Think of a claim that only covers certain input types, e.g. a list with strictly ascending elements. You either have to encode that in your claim (via an implication) or run on a custom generator for ascending lists. Encoding that in the claim means that most of the samples are true by default (most lists aren't strictly ascending) which means you cover a lot less from the search space you want to, which leaves us with tying the generator to the claim, which I don't think is a bad thing, since it kind of models an implication. Another example: Consider the claim

claim "15*15 is 225" `that` (\n -> n /= 15 || n*n == 225) `equals` (\n -> True)

You would have to run that on all ints, which would mostly not cover the case you want (n == 15).

A last thing I hit just now: I'm unsure if the split into a left hand side and a right hand side is sensible. I liked the old way where just stating a claim which evaluates to a Bool quite well. The split just overcomplicates.

TLDR I think the old way of stating boolean claims was the way to go. The fact that generators are tied to claims allows to narrow the input space of the claim like an implication. Also keep in mind, that quickcheck style testing is mostly about disproving a claim via a counter example, not the other way round. Other than that, I'm excited where this is going :)

TheSeamau5 commented 9 years ago

Thanks for the input. I like the idea of just trying to "disprove" a claim. It makes it more in line with the scientific method. For usability though, I think you want to do both ways in the boolean sense of the result. So, maybe that can be done by having both equals and notEquals. That'll allow us to simply deal in "disproving" rather than also "proving" which in reality is not tractable.

About the splitting up of the claim and the generator, I don't know if I agree that seperating them is a bad idea. I do see the point of the generators being tied to the actual claim (like you want to know something true of negative numbers so you don't want to start generating positive ones cuz they'll fail your tests needlessly).

I think that to solve that, I'll need to add in the idea of a "precondition" which is something you see in other QuickCheck ports.

Say you want to test that dividing by a number and then remultiplying by it yields the same result. This is true except if that number is zero. So, you'd add in a precondition to tell the generator and the shrinker to filter out zeros from the generation and shrinking process.

claim_division_multiplication_inverse = 
  claim
    "Dividing some number by a number then multiplying by that number yields the original number"
  `that`
    (\n d -> n / d * d)
  `equals`
    identity
  `provided`
    (\n d -> d /= 0)

counter_division_multiplication_inverse =
  disprove
    claim_division_multiplication_inverse
  `for`
    float . float

In this case, provided is telling elm-check that whenever it generates a 0 or shrinks a float to a 0 to just discard it.

Now, you can just add loads of these generators afterwards. In this case, floats are the only thing you can generate, but you can imagine that with other claims you want to test what happens with list of ints, and lists of strings and list of lists and, etc....

Finally, about the beginner stuff... Do you have any ideas or advice on how to make things more approachable? I think that this already improves a lot on the whole Property thing. Cuz, in most versions of QuickCheck, the words used there make you feel like what it's doing is prove that a property holds whereas in reality it's trying to disprove it. Like, the point of the thing is that when QuickCheck finds a bug you should feel glad like you've succeeded in finding the bug.

Also, as was pointed out here, words like Arbitrary and Property are confusing and they don't really tell you what's going on.

evancz commented 9 years ago

I'm not a huge fan of all the infix stuff going on. I guess this is relatively common in testing stuff, but I imagine the types of all these infixes are pretty hard to figure out and that you just have to see examples and copy them and hope nothing goes wrong.

I didn't think about this before, but if you give something that is (a -> Bool) I guess you can only say "this challenge has been disproved with input a" whereas in the version here you can say "this challenge has been disproved with input a because x does not equal y". Is that true?

If so, is that a big improvement in error quality?

TheSeamau5 commented 9 years ago

Well, for me, the biggest improvement is that I can generate assertions with this. The sort of assertions you get with elm-test. If I did this:

\n d -> n / d * d == n

I can generate n and d but I can't tell what I'm expecting to get.

But, now that I think of it, it is a big improvement in the error quality cuz you can get explicitly an expected vs actual.

Like I said in the mailing list, if push comes to shove, you can use elm-check to generate assertions which can then feed into your testing or continuous integration environment.

As for the infix stuff, the idea is that this should strictly be optional. Given what I see from surveying the testing world, testing frameworks tend to like these DSLs. I suspect it's because it's tedious to write tests and tests are super useful for documentation. Like, you can read unit tests and tell to a certain extent what your function does. Does that make any sense or is it still a bad idea? I'm lucid enough to recognize that the fact that something is common doesn't make it good but not lucid enough to actually recognize if it is a good idea or a bad idea.

sgraf812 commented 9 years ago

Regarding extracting the actual and expected values: You could just use a custom equals operator which records both values and name it =@= or something. I'm afraid so that you would probably also need to reimplement && et al.

This may get out of hand, but it's done like that in sbv. Basically there is a symbolic data type for each primitive data type like Bool or Int which enables the 'introspective' properties of the SMT solver.

I'm also concerned that the types involved won't make it easy for beginners to fix type errors. I'm not sure can be done about that except responding to SO questions and providing examples.

TheSeamau5 commented 9 years ago

Yeah, I don't think I'd go so far as to change boolean operators. I am willing though to rename equals to is cuz that can make your example look like:

claim "15*15 is 225" 
  `that` 
    (\n -> n /= 15 || n*n == 225) 
  `is` 
    always True

You use always then a bool when you want to have multiple conditional tests seperated by ors and ands. No need to change any boolean operators or whatever.

And then you can pass in an Arbitrary Float or whatever I end up calling it:

disprove my_claim `forall` float

(I'm also testing out forall instead of just for)

So, a bigger example synthesizing these ideas and bring back the generators with the claims as you suggested would look like:

claim_reverse_append =
  claim
    "Appending then reversing lists is equivalent to reversing then appending"
  `that`
    (\l1 l2 -> List.reverse (l1 ++ l2))
  `is`
    (\l1 l2 -> List.reverse l1 ++ List.reverse l2)
  `forall`
    list int . list int

sgraf812 commented 9 years ago

I don't like how the description is "15*15 is 225" and the code reads different, but YMMV, so I can live with that. Just so that we are on the same page:

claim_reverse_append =
  claim
    "Appending then reversing lists is equivalent to reversing then appending"
  `that` 
    (\l1 l2 -> List.reverse (l1 ++ l2) .== List.reverse l2 ++ List.reverse l1)
  `for`
    list int . list int

Ain't so bad IMO.

I'm definitely against forall since it's quite possible this is considered for a keyword later on in language evolution.

TheSeamau5 commented 9 years ago

So, I just got a bit of a reality check when I worked on the DSL. Turns out that it needs slight tweaks to be able to work with the type system. Basically, if you have multi-arity functions, you need to work with tuples explicitly for it to work, which is not that big of a deal, to be honest.

This is how the example would look:

claim_reverse_append =   
  claim 
    "Appending then reversing lists is equivalent to reversing then appending"
  `that`
    (\(l1, l2) -> List.reverse (l1 ++ l2))
  `is`
    (\(l1, l2) -> List.reverse l1 ++ List.reverse l2)
  `for`
    tuple (list int, list int)

Good news is that I can even offer this:

claim_length_list_positive = 
  claim
    "Length of a list is always positive"
  `true`
    (\list -> List.length list > 0)
  `for`
    list int

So, you can claim things to be true or false pretty easily. And, claim is the same exact claim function as defined in Check.elm. There's no fancy builder business. It's just, straight up interspersing the code with words that just pass on the next thing to the previous thing. This means the DSL is pretty transparent and is just there to help readability.

TheSeamau5 commented 9 years ago

Added in 0.15