WIP: Generalize shrinking

edsko commented 5 years ago

See #262 for discussion.

@stevana This is a WIP PR. This implements the generalization we previously discussed, but only for the sequential case. Submitting it as a PR now so you can take a look to see what you think. I've not yet tried the test suite (since that will require implementing the parallel case also), but I have tested it in my real project and it works like a charm. As you can see in the PR, the update to the code is actually not that big either, which is good.

I think tomorrow I will try to finish my own test case now that this is working and available before trying to update the parallel case. I've not used parallel testing very much yet so might be a little slower.

stevana commented 5 years ago

Yep, looks good!

(If you can think of a way to include some tests of shrinking working along the way that would be really useful. The bug you found before was a regression actually... We used to have some tests for shrinking, but those tests were annoying to maintain so I deleted them during a refactor...)

Regarding parallel testing: that's actually the cool part, :-). Feel free to ask, if anything isn't clear!

edsko commented 5 years ago

@stevana Ok, just pushed another commit that deals with the parallel case. Please take a careful look. I think the modifications in the end are similar as for the sequential case, and I think apart from the shrinking itself, it should not change any of the current behaviour. The test suite passes except that for the doc test I'm getting

# quickcheck-state-machine-test -p Doctest
Tests
  Doctest Z module: FAIL
    Exception: quickcheck-state-machine-test: can't find a package database at /home/edsko/wt/clients/iohk/ouroboros-network/dist-newstyle/packagedb/ghc-8.2.2

which seems completely unrelated to my patch?

A comment on the implementation: the sequential case split the shrinking of a list of commands from the shrinking of the commands themselves (which needs the model). For the parallel case we need to do the same, except that we now have more structure: a prefix, then a list of pairs of list of commands. To keep this somewhat tractible, I introduced some machinery in the .Util module to shrink a value and record whether something changed. This then makes the rest of the implementation fairly straight-forward.

Don't merge this yet, I will try to also write a property or two about shrinking. However, I'd be curious to hear your review of my changes and whether you think this is OK.

edsko commented 5 years ago

I may have messed something up... the shrinking test that I'm writing is stress testing references, and it's failing somewhere. Don't know yet if my test is broken or the code. Will push the test as part of this commit also, of course.

edsko commented 5 years ago

@stevana Ok, I see my mistake. I don't call shrinkAndValidate on the prefix. Will fix. Have quite a nice property now that shrinking must satisfy, I think you'll like it :) Fix coming.

edsko commented 5 years ago

Uh, no, sorry, that comment is wrong, and makes no sense. But shrinking is going wrong and I do have a simple counterexample. I'll keep digging, sorry for misleading.

edsko commented 5 years ago

@stevana Ok, I think I know what's going on, but I'm out of time for today. I will continue tomorrow. What I think is happening however is that the "post processing" in shrinkAndValidateParallel is assuming that the prefix doesn't change anymore, and hence the prefix of the left and right must be the same. This is not true anymore in the new setup, so I need to generalize this a bit. WIll try to fix this tomorrow.

Meanwhile, I pushed the test I mentioned to you. I'm quite pleased with it; see the commit message at https://github.com/advancedtelematic/quickcheck-state-machine/pull/263/commits/dfe5985a944c41090444025483fb173a91dc7338 for a detailed description.

edsko commented 5 years ago

(Incidentally, the model I set up in that test follows the structure I will advocate in my blog post also :)

stevana commented 5 years ago

Only had a quick look so far, but it looks neat! :-)

By the way, one more property of shrinking that might be worth testing is that: any result of shrinking should be possible to generate. (This is kind of what Hedgehog-style shrinking gets for free).

edsko commented 5 years ago

By the way, one more property of shrinking that might be worth testing is that: any result of shrinking should be possible to generate. (This is kind of what Hedgehog-style shrinking gets for free).

This is not true in general for quickcheck-state-machine. It is entirely possible to write combinations of generators and shrinkers such that the shrinker can shrink to something that the generator will never generate. In fact, my "real" project in which I'm applying all this stuff is an example of that.

What I was thinking though is that it would be good to test that the model that the shrinker is given is the same one that you'd get if you then walk over the shrunk test case step by step, updating the model as you go. I think I have a way to do this but I need to ponder it a bit.

stevana commented 5 years ago

This is not true in general for quickcheck-state-machine. It is entirely possible to write combinations of generators and shrinkers such that the shrinker can shrink to something that the generator will never generate. In fact, my "real" project in which I'm applying all this stuff is an example of that.

Possible yes, but is it ever desirable? Intuitively this feels like a mismatch between generators and pre-conditions to me...

What I was thinking though is that it would be good to test that the model that the shrinker is given is the same one that you'd get if you then walk over the shrunk test case step by step, updating the model as you go. I think I have a way to do this but I need to ponder it a bit.

Yeah, would be nice to test as well.

edsko commented 5 years ago

Possible yes, but is it ever desirable? Intuitively this feels like a mismatch between generators and pre-conditions to me...

Well, you can let me know what you think after I write my blog post :)

But even for the simple model that I'm now using to test shrinking, it's not true: commands are given identifiers, which are generated as increasing integers. But after shrinking those integers won't be consecutive anymore.

So at the very least it depends on the model, so we could only test this for a given model. Having said that, for the model that I'm working with to test shrinking, we could test this "up to command IDs", but I think it would be prohibitively expensive: you'd have to say, "given this shrunk set of commands, find a seed such that..". I don't think it would be feasible? :thinking:

edsko commented 5 years ago

It's an interesting comment though. I feel it's actually an advantage that we can make this distinction (that this property is not true), and if that is not the case in hedgehog, maybe this is one plus point for the QC approach. But I need to think more about it :)

stevana commented 5 years ago

Well, you can let me know what you think after I write my blog post :)

Yeah :-)

So at the very least it depends on the model, so we could only test this for a given model.

Perhaps this could be a meta-property that the library provides the user to sanity check their generators and pre-conditions?

it would be prohibitively expensive: you'd have to say, "given this shrunk set of commands, find a seed such that..". I don't think it would be feasible?

Not with the current way of generating, but it would be easier with the Markov chain approach:

https://github.com/advancedtelematic/quickcheck-state-machine/pull/242/files#diff-cb1f5b1b1e7e0a159c0e9565f33ffb75R232

Start with the first command name, make sure it's reachable from the initial state of the chain, etc. Not sure if one can effectively check the arguments of the commands though...

Sorry, this got a bit off topic.

edsko commented 5 years ago

It's an interesting discussion for sure! Perhaps we can consider this to be separate from this specific PR? I think I know how to fix shrinking, and I'll extend my test also to check the model, and then I think we might be in good shape. Do you think it warrants another release?

stevana commented 5 years ago

Perhaps we can consider this to be separate from this specific PR?

Yes, I'll open a separate issue.

Do you think it warrants another release?

Sure!

edsko commented 5 years ago

@stevana Quick question for you: advanceModel returns a (model Symbolic, Counter) pair. Do we have the property that the (model Symbolic, Map Var Var, Counter) that we get from validation is the same model and counter?

stevana commented 5 years ago

I've not checked this, but I would hope so, as both advanceModel and validation use transition and mock...

edsko commented 5 years ago

I've not checked this, but I would hope so, as both advanceModel and validation use transition and mock...

Yeah, ok. I thought so too. Ok, in my new shrinker this means we can skip one call to advanceModel. I'm inching closer to having it working, but not quite there yet :)

edsko commented 5 years ago

Oh, wait, my test is failing because it actually has an race condition, and quickcheck-state-machine found it! Amazing :grinning:

edsko commented 5 years ago

@stevana Ok, I think the shrinker is now correct. I've squashed my commits and signed off. A few remarks:

The original validateCommands lived in State; in my original patch I changed this to Reader; since I now want to return the value also (but not in State, because it may return many values), the environment is now a simple argument.
I hope the rest of the code is self explanatory; we have to be careful in how we do things, especially in the parallel case, so I introduced some custom datatypes and lots of comments to explain some finer points. Let me know if anything is unclear.
I changed my "did we mess up references" test so that instead of producing a new program, we produce a graph (command X referred to the Nth output of command Y). This was necessary because I didn't realize that parallel shrinking can reorder commands (this may happen when moving commands back from the parallel suffixes to the sequential prefix).
I also test that the model that the shrinker is presented with is the right one. I do this by having the shrinker record the model as part of each command during shrinking, and then verify that when we walk over all commands in order, we get precisely the models that the shrinker recorded.

HOWEVER, I am getting a test failure in my "did we mess up the references test", but it seems that this is actually a bug in the generator of the parallel commands, not in the shrinker. So it seems this test is bringing out a latent bug perhaps? But I need to look a bit deeper.

edsko commented 5 years ago

@stevana I am away for the next two days, but @mrBliss will take over this branch. I am pretty confident the shrinker is now correct, but I just discussed the error with @mrBliss and we both think it seems that there is a bug in generateParallelCommands. He is going to take a look at that (the property in my PR here can help to find it), and hopefully push a fix here.

stevana commented 5 years ago

Alright, cool!

mrBliss commented 5 years ago

I have spent some time trying to understand the generation of parallel commands. I think now that the bug is not there, but actually in the test, due to a misunderstanding of how these parallel commands are generated.

@stevana Maybe squash these commits before merging.

stevana commented 5 years ago

I think now that the bug is not there, but actually in the test [...]

Did you manage to fix the bug also? Is this PR ready for review now?

mrBliss commented 5 years ago

Did you manage to fix the bug also? Is this PR ready for review now?

Yes, I fixed the test (see b3218b7). It is ready for review now.

stevana commented 5 years ago

Cool!

There still seem to be some simple problems with building as witnessed by Travis... I'll try to comment on the rest of the code during the day.

mrBliss commented 5 years ago

Interesting, one CI build is failing because one of the new tests fails, which means there is a problem with parallel shrinking (or possibly with the test itself). I will investigate.

edsko commented 5 years ago

Alright, I'm back :sunglasses: @mrBliss and I discussed what still needs fixing, and I'll try to get this to mergeable state. @stevana will ping you when it's ready :)

edsko commented 5 years ago

(EDIT: Fixed the validation example)

@stevana Ok, tests are passing (I ran with --quickcheck-tests 1000, I'll run it in the background with 10000 tonight), and I think the code is correct now. Here's an overview of the final changes:

`Command` now records the response

I've changed Command to

-- | Previously symbolically executed command
--
-- Invariant: the variables must be the variables in the response.
data Command cmd resp = Command !(cmd Symbolic) !(resp Symbolic) ![Var]

Before we were only recording those variables, and not the response itself. However, every time we create a Command, we have that response; we just throw it away. Keeping it has an important advantage: when we apply a previously generated Command to a model, we don't need the mock function anymore. This means that advanceModel changes to

advanceModel :: StateMachine model cmd m resp
             -> model Symbolic      -- ^ The model.
             -> Commands cmd resp   -- ^ The commands.
             -> model Symbolic
advanceModel StateMachine { transition } model0 =
  go model0 . unCommands
  where
    go model []                              = model
    go model (Command cmd resp _vars : cmds) =
        go (transition model cmd resp) cmds

where we don't need the counter anymore. Not only does this make the code a little cleaner, it also removes some head-scratching: in the old advanceModel, we were ignoring the vars associated with the command, creating a new response, and then continuing with cmds... but what if those vars were different from the ones we generated before, and there are commands in cmds that referred to those old variables? If we are careful that we apply this generation in precisely the same way we should be okay, but this is rather brittle. By recording the response this whole problem goes away.

The validator for parallel commands interleaves left and right parts of the suffixes

The old validator was applying the sequential validator to

prefix <> mconcat allLeftSuffixes
prefix <> mconcat allRightSuffixes

basically. But this is not correct. A parallel program such as

[ create reference 0 || create reference 1 ]
[ use reference 1    || ..                 ]

is perfectly correct: at the start of the second parallel suffix, we can assume all commands (both left and right) have been executed. Initially I thought that this was not a valid program, which is why I thought above that the generator for parallel commands was broken, but that was a misunderstanding on my part. My parallel shrinker however made this same assumption as the old parallel shrinker did, which is the bug that @mrBliss identified. This also means that the old validator (before this PR) was throwing away too many programs: basically it required that left and right suffixes were completely independent.

The right thing to do is validate in this manner:

modelAfterPrefix <- validate initModel prefix

modelAfterLeft1  <- validate modelAfterPrefix left1
_                <- validate modelAfterPrefix right1
let modelAfterSuffix1 = advance modelAfterLeft1 right1

modelAfterLeft2  <- validate modelAfterSuffix1 left2
_                <- validate modelAfterSuffix1 right2
let modelAfterSuffix2 = advance modelAfterLeft2 right2

..

which is what this PR implements.

This has the additional benefit that we don't need to have the final parallelSafeMany check: since we are precise in the models that we use during parallel validation, this final check becomes superfluous.

(Bonus points: we don't need the dependency on the split package anymore :)

Other consequences

@mrBliss already updated the shrinking properties to reflect this better understanding of how parallel commands work, but the "do we get the correct model" test also needed to be updated, which I've done.

Finally, the change to Command has some consequences for user-land code also; previously it was difficult to define

execCmd :: Model Symbolic -> QSM.Command Cmd Resp -> Model Symbolic

but now it's easy: since the command carries its response, this is just a call to transition.

edsko commented 5 years ago

It took a little over 2 hours but 10,000 tests passed successfully.

edsko commented 5 years ago

@stevana I think this is ready :)

edsko commented 5 years ago

Awesome! :+1: :grin:

advancedtelematic / quickcheck-state-machine