Fuzzing update functions?

rtfeldman commented 7 years ago

QuickCheck co-creator John Hughes gave a sweet talk about using QuickCheck to test programs. What he describes is essentially "generate a random list of messages, fold them over update, and run an assertion on the output of update each step of the way."

This lets you test invariants about your model which must hold no matter what messages you receive.

It turns out this is pretty easy to write, so I did it on a branch. (A PR would definitely be premature at this point, but if anyone wants to try this out, they can replace the case-expression in the implementation with Test.Runner.getFailure.)

There are some open questions here:

Is this a good idea?
If so, is this the right API for it?
If so, is this worth adding to elm-community/elm-test as opposed to a third-party library?
If so, should we give it custom Message handling so runners can give nicer output?

Thoughts?

mgold commented 7 years ago

It's an interesting idea.

I know some people have had issues importing their main module for testing. It sounds like you'd have to do that to use this test, or otherwise move update to another module which sounds weird.
I think you'd want to take in Fuzzer msg instead of a List msg so that all the seed distribution and whatnot is handled just like any other fuzz test. The library code should (1) verify the initial model passes the test and then (2) generate nonempty lists of messages to apply.
Should we assume that people's update functions produce a command to discard? If we assume they do not, we can support people who use the simple program runner, and those who do should pass update >> Tuple.first. As counterpoint, anyone serious enough to use this is going to be using a command.
If it's possible to do everything we want to do using the public API, we should start this as a separate package, mention it in the README Strategies section, and then see how much it takes.
Another possible use case is, I'm generating most but not all of my messages, and I expect a certain field never to change because the message that changes it is never sent.

rtfeldman commented 7 years ago

Is this a good idea?

I think so, yes! I tried it out and it immediately revealed that I wanted SetHue to enforce that the minimum model.hue value was 0.

Is this the right API for it?

I'm not sure. Here's what it is right now:

testUpdate :
    (msg -> model -> ( model, cmd ))
    -> model
    -> (model -> Expectation)
    -> List msg
    -> Expectation

I could see dropping the assumption that it returns ( model, cmd ) and instead just expect it to return model, like this:

testUpdate :
    (msg -> model -> model)
    -> model
    -> (model -> Expectation)
    -> List msg
    -> Expectation

This would mean instead of calling it like so:

        fuzz (list msgFuzzer) "hue, ripple, and noise are never negative" <|
            testUpdate Page.update Page.initialModel <|
                Expect.all
                    [ .hue >> Expect.atLeast 0
                    , .ripple >> Expect.atLeast 0
                    , .noise >> Expect.atLeast 0
                    ]

...you'd call it with update >> Tuple.first like this:

        fuzz (list msgFuzzer) "hue, ripple, and noise are never negative" <|
            testUpdate (Page.update >> Tuple.first) Page.initialModel <|
                Expect.all
                    [ .hue >> Expect.atLeast 0
                    , .ripple >> Expect.atLeast 0
                    , .noise >> Expect.atLeast 0
                    ]

The disadvantage is that you have the extra call to Tuple.first all over the place...although that's less true if you do it with an anonymous function and a pattern match:

        fuzz (list msgFuzzer) "hue, ripple, and noise are never negative" <|
            testUpdate Page.update Page.initialModel <|
                \( model, _ ) ->
                    Expect.all
                        [ .hue >> Expect.atLeast 0
                        , .ripple >> Expect.atLeast 0
                        , .noise >> Expect.atLeast 0
                        ]
                        model

Advantages to this way:

The type annotation is simpler
It's more straightforward to use for the less common update APIs, where you need to return more than just ( model, Cmd msg ), for example:

        fuzz (list msgFuzzer) "hue, ripple, and noise are never negative" <|
            testUpdate (Page.update >> Tuple.first) Page.initialModel <|
                \( model, _, someOtherValue ) ->
                    Expect.all
                        [ .hue >> Expect.atLeast 0
                        , .ripple >> Expect.atLeast 0
                        , .noise >> Expect.atLeast 0
                        ]
                        model

is this worth adding to elm-community/elm-test as opposed to a third-party library?

Assuming it's a good idea at all, then yes. It doesn't have any dependencies outside of elm-community/elm-test anyway, and it's applicable to basically every Elm program.

Should we give it custom Message handling so runners can give nicer output?

Long-term, probably - since you want to give a good bit of detail on the output, and diffing would be super helpful on nontrivial models - but doesn't seem like a release blocker.

rtfeldman commented 7 years ago

I know some people have had issues importing their main module for testing. It sounds like you'd have to do that to use this test, or otherwise move update to another module which sounds weird.

I'm curious to know more about that. Do you know what their specific problems were?

I think you'd want to take in Fuzzer msg instead of a List msg so that all the seed distribution and whatnot is handled just like any other fuzz test. The library code should (1) verify the initial model passes the test and then (2) generate nonempty lists of messages to apply.

Yep, great idea! Totally agree.

Should we assume that people's update functions produce a command to discard? If we assume they do not, we can support people who use the simple program runner, and those who do should pass update >> Tuple.first. As counterpoint, anyone serious enough to use this is going to be using a command.

Yeah, I have similar thoughts. The counter-counterpoint is that in certain rare cases, you want to return more than model and command, and it would be undesirable to make that harder.

If it's possible to do everything we want to do using the public API, we should start this as a separate package, mention it in the README Strategies section, and then see how much it takes.

I like it! Seems like a good approach.

Another possible use case is, I'm generating most but not all of my messages, and I expect a certain field never to change because the message that changes it is never sent.

Ah, so like "only these messages should ever change this field" and no other messages should touch it?

rtfeldman commented 7 years ago

I made a repo and pushed an (unpublished) draft to it: https://github.com/rtfeldman/test-update/blob/master/src/Test/Update.elm

I did one thing differently:

The library code should (1) verify the initial model passes the test and then (2) generate nonempty lists of messages to apply.

I stuck with the list msgFuzzer approach because of Given. This way, you get consistent output, e.g. Given [] if the base case failed, and Given [FooMsg] if it failed after processing FooMsg. To me that seems more valuable than the performance optimization.

Janiczek commented 7 years ago

For what it's worth, I tried this too a few months ago (usage) and will have a talk (draft) on this topic in ElmEurope.

Is this a good idea?

Yes, I very much think so :) TEA is suited for this kind of thing, and this would give you focused fuzz tests that don't go all over the place but focus on the interesting scenarios.

If so, is this the right API for it?

Your take on it seems very straightforward, I like it. (I wrote mine as sort of exploratory coding, before I even had my thoughts sorted for the talk, so I'd like to rewrite mine to be a bit cleaner.) I wonder whether the more focused functions (invariantTest, msgTest, msgTestWithPrecondigion) wouldn't fit in your API too (and whether there are more scenarios like these three worth specializing for)

If so, is this worth adding to elm-community/elm-test as opposed to a third-party library?

I think this is on par with *-extra packages, I wouldn't add this to elm-test.

Another possible use case is, I'm generating most but not all of my messages, and I expect a certain field never to change because the message that changes it is never sent.

This can be achieved with a Msg fuzzer not containing this particular Msg, right?

Janiczek commented 7 years ago

What is my open question for this topic, is: Erlang's QuickCheck has tons of hooks (precondition, postcondition) for both during command generation and command execution.

Do we need something like this, or is it only needed because of Erlang's dynamic nature (and the possibility of testing foreign code etc.)?

(See figure 2.4 on page 28 here http://www.diva-portal.org/smash/get/diva2:343744/FULLTEXT01.pdf )

mgold commented 7 years ago

This can be achieved with a Msg fuzzer not containing this particular Msg, right?

Exactly.

Do we need [hooks]?

I don't think so. We're not running any commands, and update should be able to handle any value of type Msg in any order and be sensible.

This way, you get consistent output, e.g. Given [] if the base case failed

i.e. if the provided model fails. Makes sense.

Janiczek commented 7 years ago

@mgold I tried implementing the orthogonality test (only certain Msg touches certain model part) and it seems hard to do "Msg fuzzer not containing this particular Msg" - the user would have to supply the final fuzzer. There is no easy way to remove a Msg fuzzer from a list of fuzzers, apart from having a predicate Msg -> Bool.

In the end, I went the way of letting the Msg fuzzer untouched and having a predicate to ensure the Msg we test is the Msg we want to test (generating Msgs until the predicate returns True, and using that Msg for the test).

zkessin commented 7 years ago

You might want to have a way so that the thing making the next step can depend on what came before.

If you have an event that returns a get time command the time reponse should be able to come after that. Take a look at this erlang code which implements something similar https://github.com/triqng/triq/blob/master/src/triq_statem.erl

It has 5 functions...

-type state() :: term().
-type command() :: {call, atom(), atom(), [_]}.
-callback(initial_state() ->
                  state()).
-callback(command(state()) ->
                command()).
-callback(precondition(state(),command()) ->
                 boolean()).
-callback(postcondition(state(),command(), term()) ->
                 boolean()).
-callback(next_state(state(),term(), command()) ->
                 term()).

Or in Elm Terms

   initState : State
   command: State -> Fuzzer Msg
   precondition: Msg -> State -> Expectation 
   postcondition: Msg -> State > Expection
   next_state: Msg -> State -> State

As you can see the command makes a fuzzer that makes one message but does so in a way that depends on the current state.

zkessin commented 7 years ago

@johnhughes would you like to comment on this?

Janiczek commented 7 years ago

@zkessin It would probably not be State -> Fuzzer Msg but List Msg -> Fuzzer Msg, because in Elm the tests don't have their own state machine that models the app (ie. you don't have shouldTimeMsgCome or askedForTime), instead the tests run the app directly.

I can't see how the Msg fuzzers would look like then. Could you maybe try and describe a test, in prose?

Tangent: Erlang's QuickCheck also does have a way of testing values across Msgs (ie. queue::put(val1) (symbolic) and then val2 = queue::get() and expect val2 == val1, but that seems hard to do with Elm. Or at least I got lost quickly in that symbolic world, far from even sketching an API :)

zkessin commented 7 years ago

I think we need to have some sort of model state to compare things to. The problem with something like List Msg -> Fuzzer Msg is that sometimes you want the next message to depend on the prior one.

For example, if you are building a key value store then you might have 2 msgs Get key and Put key value. But if your test does not know what values have been stored it can't try to get them. I will try to find some erlang examples that I can share later tonight or tomorrow

zkessin commented 7 years ago

I should also say that this type of operation might be useful for things other that update functions, again testing something like Dict this way could be very useful

Janiczek commented 7 years ago

Ah, I understand. So this would open the possibility to test more of the "state space":

tests

ie. if we wanted to test long-term interactions of Dict with our API, we would have to create a dummy update and Msgs to go with that (and even then that may not be enough, because of the set k v, get k == v not being expressible in that framework)

Janiczek commented 7 years ago

@zkessin I've come to your conclusion while trying to test elm-todomvc:

Fuzzers of type Model -> Fuzzer Msg [1] would be useful for example in the Delete Int Msg - if you just generated random IDs, most of the time you'd try to delete entities that aren't there. But with Model -> Fuzzer Msg you can generate just such IDs that are in the model right now.

[1] I'm speculating whether Model -> Maybe (Fuzzer Msg) would be better (to not force users of the library to have NoOp Msg, eg. in the Delete Int case when there's no entities in the model)

rtfeldman commented 7 years ago

@Janiczek I got this to type-check (without shrinking) on https://github.com/elm-community/elm-test/pull/110 - want to try it out and see if it's useful in your case?

Janiczek commented 7 years ago

@rtfeldman Maybe I can use that! Will try later though. Yesterday I've tried to implement the Model -> Fuzzer Msg type of fuzzers, and when it gets to shrinking, the JS runtime runs out of memory. So few Fuzz.andThens in succession clearly won't be the right thing to do here :)

avh4 commented 7 years ago

Not specifically related to testing update functions, but we had a discussion about the state-machine testing with QuickCheck that John Hughes talks about, and the result of our experiments is here: https://github.com/avh4/elm-newcheck

Note that in Erlang quickcheck, instead of having the current state produce a list of possible actions, it appears to produce a list of actions up front, and then each has a precondition that is evaluated to determine whether it should be applied or skipped, all of which seems to greatly reduce the complexity required of the shrinker (and in elm-test, would avoid the problem of the performance of Fuzz.andThen).

Janiczek commented 7 years ago

Do you all think it would be good to have a "summary" blogpost / page / ... comparing these approaches on various codebases - SPAs, libraries (data structures etc.)? ie. {rtfeldman/test-update, janiczek/elm-architecture-test, jwoudenberg/test-update, avh4/elm-newcheck} x {elm-spa-example, elm-todomvc, circular-buffer, vending-machine} or something along those lines.

EDIT: motivation. This part of the ecosystem is very much in flux, and there are pros/cons for all of these libraries, but right now we only suspect what the advantages might be, and I feel we'll only gain experience by writing tests with these. So, let's try these libraries on different projects and evaluate.

EDIT 2: I have started something (no code yet): https://janiczek.github.io/update-testing-apis/

elm-community / elm-test

Fuzzing update functions? #154