emil-e / rapidcheck

QuickCheck clone for C++ with the goal of being simple to use with as little boilerplate as possible.
BSD 2-Clause "Simplified" License
980 stars 173 forks source link

Parallel testing #47

Open furuholm opened 8 years ago

furuholm commented 8 years ago

Parallel testing is supported by QuickCheck and PropEr and I think it will prove very useful for RapidCheck as well.

I have created a proof of concept[1] and a gist that shows an example of usage[2]. This proof of concept follows the design of the QuickCheck [3] and PropEr[4] counterparts where possible. Note that this is a prototype and nothing else!

At least one API change is required for this feature to be implemented and that is that the behavior of the run function has to be split into a execution step and a verification step. This can be done by adding a new function to the Command class or by letting the run function return a callable or a std::function that can be used for verification at a later point. Returning a function might be the preffered way since the run function otherwise typically will have to change the state of its Command object (This is what the proof of concept does. I have even made a field mutable in order to minimize the changes necessary in the prototype).

Shrinking works given that enough retries are made. The count of retries has to be configurable in a complete solution.

[1] https://github.com/furuholm/rapidcheck/tree/parallel_testing [2] https://gist.github.com/furuholm/bc98d94178c96b0f341a [3] Finding Race Conditions in Erlang with QuickCheck and PULSE - http://www.cse.chalmers.se/~nicsma/papers/finding-race-conditions.pdf [4] PropEr documentation of proper_statem module - http://proper.softlab.ntua.gr/doc/proper_statem.html

emil-e commented 8 years ago

I haven't looked closely yet but I have one question at least. The splitting of run/verify is to more easily provoke race conditions? As it is now, the run function would contain logic for verification as well which would make the execution of the actual logic more "dense".

furuholm commented 8 years ago

It's to delay verification until after the execution. All possible interleavings have to be tested which means that run is invoked once wheras verify (typically) is invoked many times...

On Sun, Jul 26, 2015 at 5:32 PM, emil-e notifications@github.com wrote:

I haven't looked closely yet but I have one question at least. The splitting of run/verify is to more easily provoke race conditions? As it is now, the run function would contain logic for verification as well which would make the execution of the actual logic more "dense".

Reply to this email directly or view it on GitHub: https://github.com/emil-e/rapidcheck/issues/47#issuecomment-125008494

emil-e commented 8 years ago

I've thought some more about this and I've understood the problem a bit better. I stumbled upon something that will likely be an issue:

Originally, commands had to always assert preconditions to ensure that they are not run for inappropriate states when shrinking. However, this was a bit unintuitive since the current state is passed to the generation function. This makes it is easy to assume that if you just return a generator that generates valid commands for that state, you'll be good even if you don't assert preconditions. If you understand the inner workings, you'd know that this command may be used for other states during shrinking but from the users perspective, it looks like he gave us a perfectly valid command but when it's used later, it's not valid anymore.

To prevent this, I ensured that commands were always regenerated with the same seed when the state before it changed. If the command was still valid, the exact same command would likely be generated but the behavior in general seemed more intuitive to me.

However, this causes a sorts of problems for parallel testing where there isn't just one state preceding a particular command, there are many possible ones. This breaks the guarantee that the state that was used to generate is the state that will be used to perform. It also means that there isn't really a way to determine if a reordering of commands is valid since the assertion of preconditions is likely missing for many commands if the user simply follows the established rules.

furuholm commented 8 years ago

I think what you are saying is that even if there exists a valid serial command sequence, there may be interleavings produced by a parallel sequence that will violate the preconditions. Right? This is solved by ensuring that all possible interleavings are valid sequences (it's a todo, but fairly simple to add). If this is not the case one could either discard the sequence or fallback to executing the serial command sequence (this is what PropEr does, i'm not sure about QuickCheck but I think that they have the same behavior).

Does that address your concerns?

emil-e commented 8 years ago

No, I am aware of this and my concern is one specific to how RapidCheck is implemented.

Consider the following command:

struct Remove : public state::Command<BagModel, Bag> {
  std::string item;

  Remove(const BagModel &s) {
    item = *gen::elementOf(s.items);
  }

  void apply(BagModel &s) const override {
    s.items.remove(item);
  }

  void run(const BagModel &s, Bag &sut) const override {
    sut.removeItem(item);
    RC_ASSERT(!sut.hasItem(item));
  }
};

We assume we are using rc::state::gen::execOneOf which means that the state passed to the constructor of the command is the pre-state.

In the original implementation, the above example would be ill-formed. item is guaranteed to be one of the elements of s.items but since the command may be applied to a state other than the one given to the constructor when shrinking, we would get unexpected behavior since it is assumed here that item simply exists among the items of both the model and the SUT. What is missing is of course an assertion that the item exists among the items.

However, this is a bit unintuitive if you're not familiar with the details of how this all works. We generated an item of the model, right? That item should exist in the model when the command is applied. To the user, it looks like everything is fine.

So I changed the implementation so that it is always guaranteed that the state passed to the constructor is always the state that the command is applied to removing the need to assert preconditions in apply if you simply generate valid stuff in the constructor, as in the example above.

This is fine in the sequential case but causes problems in the parallel case since we can no longer trust apply to either assert or succeed, it may actually just fail.

Now, this problem could of course be solved by simply saying "hey, in parallel tests, you must make sure you always assert ALL preconditions". The disadvantage here is that it makes parallel tests different from sequential tests by making the rules for them different.

This may or may not be a problem. It could turn out to be a non-issue since commands for a parallel test would likely work a bit differently from commands for a sequential test. But I don't have a clear picture in my head how to even generate parallel sequences yet so it's something that I would like a bit more clarity on.

In general, I'm worried about having to make tradeoffs in the simplicity of sequential testing for the benefit of parallel testing. The splitting of run into run and verify is a reasonable solution but it complicates the sequential case where such a thing is not necessary. Being able to intermix assertions and operations is very useful, at least in a language like C++. This is not a problem in Erlang QuickCheck since it verifies the return value of performing the operation, it doesn't allow for arbitrary assertions in the way that RapidCheck does if I understand correctly.

furuholm commented 8 years ago

I see your point. In the end this will boil down to what tradeoffs that you are willing to make in order to add support for parallel testing.

A couple of thoughts on how to minimize the impact.

Regarding the split of the run function: maybe this could be made a little bit better by letting the user make assertions in the run function in the serial case. The same assertion could generate an error in a parallel test case, alerting the user that the assertions has to be moved into the verify function. As a side note: the optimal solution for the parallel case would be to let the user return a function from run function that does the verification (as I mentioned in the first post) since this function could close over the data that the run function produces. It would not be good to force all run functions to return such a function however.

Regarding the issue with the constructor taking the pre-state model as argument: this seems like a useful feature in the serial case, but as you say this is simply not possible in the parallel case. As I see it providing a class with this constructor would have to generate a failure in the parallel case (runtime or compile time).

furuholm commented 8 years ago

After discussing this in person we decided on creating separate sub classes for commands that parallalizable and for those that are not. I have however experimented with another solution. I have added a second overload of run in the Command class

std::function<void(const ModelT&)> run(Sut &sut) const

This is a run function that only accepts the system under test and instead returns a function that can be used to verify the command at a later point. The default implementation of the original run function invokes the new run function like so

template <typename Model, typename Sut>
void Command<Model, Sut>::run(const Model &s0, Sut &sut) const {
  run(sut)(s0);
}

This way you can chose to implement either run function for sequential tests.

Back to the concerns that you had:

The code is available in my fork of RapidCheck.

furuholm commented 8 years ago

There are details left to be hammered out but I think this solution is more elegant than having separate command types and it doesn't complicate the sequential case too much. What do you think?

bibhas commented 7 years ago

Is there any update on this feature request? Would be a great feature to have.

furuholm commented 7 years ago

There is a pull request with support for parallell testing. Unfortunately there have been major refactorings of the code base after the version that the parallell testing PR was forked from which means that getting this PR into shape would probably be quite a lot of work. I'm not sure if and when this will happen.

Meanwhile: feel free to try the forked version. https://github.com/furuholm/rapidcheck/tree/parallel_testing The docs have been updated so if you know RapidCheck I think you will be able to get going using the docs. If you have any questions or issues, let me know and I'll see if I can help you.

Disclaimer: I'm not the author of RapidCheck