advancedtelematic / quickcheck-state-machine

Test monadic programs using state machine based models
Other
204 stars 25 forks source link

Improve garbage-collection/cleaning-up #335

Open kderme opened 5 years ago

kderme commented 5 years ago

I think this is a very common issue many users face: Since q-s-m is used to test monadic code, it's possible that there is some interference between consecutive tests, or betweeen shrinking trials and proper cleaning up of resources is hard.

I faced this issue mainly on my rqlite tests https://github.com/advancedtelematic/quickcheck-state-machine/tree/rqlite, where I spawn new unix processes which listen on ports. The solution I found there was to use the final Model returned from runCommands and do the cleaning up using this. For the parallel case, I extended the runParallelCommands, so that it also returns the final Environment and do the cleanup. It works, but if the re is an unexpected exception the cleanup won't happen. That's why I have disabled shrinking for now.

Another idea, is to get a cleanup function from the user and install handlers whenever suitable. Not sure how feasible this is and how big of a breaking change.

mrBliss commented 5 years ago

I have wanted this too (while working on https://github.com/input-output-hk/ouroboros-network/).

@edsko and I have come up with the following workaround: add a new command, e.g., Reset, that does the clean-up for the model in transition and the clean-up for the real implementation in semantics. Don't generate this command (so that the shrinker can never get its hands on it), but just append it to the list of parallel commands (something like: forAllParallelCommands sm $ \cmds -> monadicIO $ runParallelCommandsNTimes 100 sm (snoc cmds Reset)).

stevana commented 5 years ago

I've used pkill in some tests, it's ugly but got the job done...

In older versions of the library, setup and cleanup functions used to be part of the StateMachine record. I removed them in order to try to simplify things, but it could very well be that cleanup needs to be part of that interface...

kderme commented 4 years ago

I think asking for a user function to do the cleanup is a good idea. During execution we should be careful to evaluate everything with proper exception handles (similar to what QuickCheck does http://hackage.haskell.org/package/QuickCheck-2.13.1/docs/src/Test.QuickCheck.Exception.html#tryEvaluate) and in case of exceptions we should do the cleanup before propagating. This is possible in all executeXX commands, since they live in IO, but not possible for all shrinkXX and generateXX commands. That's not an issue though, since there should be no running user recourses while shrinking or generating, User interrupts and async exceptions should also trigger the cleanup process before being propagated.

I think the type of the user function should be something like Environment -> IO (). An alternative idea, having Model Conrete -> IO () is not that good, since in parallel execution the current Model is unreliable (but the environment is still reliable). What I'm mostly concerned is whether the Environment has enough information for the user to do the proper cleanup. In my example of rqlite it is enough, but not sure if it's enough all cases.

stevana commented 4 years ago

not that good, since in parallel execution the current Model is unreliable (but the environment is still reliable)

Why is the model unreliable? Can we not make it:

cleanup :: { prefix :: Model Concrete, suffixes :: [Model Concrete] } -> IO ()

?

I'm a bit hesitant to expose Environment to the user. It feels like an internal type...

kderme commented 4 years ago

In the parallel case executeCommands is called with the initModel as intial state. Inside executeCommands there is no precondition check and the final Model (result of transition) is usually just an unevaluated exception or something completely wrong.

(reason, (env', _, _, _)) <- (runStateT (executeCommands sm hchan (Pid i) check cmds) (env, initModel, newCounter, initModel))

I replaced this with

reason, (env', _, _, cmodel)) <- (runStateT (executeCommands sm... -same-
liftIO $ evaluate cmodel

and there were many failures. The issue is that there is no way we can know the real Model after a suffix is executed, so whatever result executeCommands returns is unreliable. What we know however is the Environment, or the references that were created (the random order does not affect this).

stevana commented 4 years ago

In the parallel case executeCommands is called with the initModel as intial state.

Yes, but that would be easy to change: pass in the model that's the result of executing the prefix. The problem to me seems to be that after executing one batch of suffixes, we cannot combine the resulting models and continue executing the next batch with the combined model. Unlike environments which have a monoid structure.

Or is there some other problem you're seeing?

kderme commented 4 years ago

I think it's the same problem. What you mention

pass in the model that's the result of executing the prefix

can only work for the first suffix. Next suffixes can't get a correct inital Model, because as you mention we can;t combine Models.

stevana commented 4 years ago

So another option would be to require Monoid (model r). Also a bit awkward...

kderme commented 4 years ago

Yes I don't think this will work either. Assume in MemoryReference, one threads runs Increment ref and the other Write ref 2. In the one Model the reference would become 1, while in the other 2 and there is no proper way to combine them.

Another option is to somehow have the user keep an environment which is Monoidal and make sure that this transitions correctly even in parallel tests. Something like it was done here https://github.com/input-output-hk/ouroboros-network/blob/master/ouroboros-consensus/test-util/Test/Util/RefEnv.hs#L35 and the model uses it in this way https://github.com/input-output-hk/ouroboros-network/blob/master/ouroboros-consensus/test-storage/Test/Ouroboros/Storage/ImmutableDB/StateMachine.hs#L301. Then this part of the Model transitions by a simple union.

stevana commented 4 years ago

In the one Model the reference would become 1, while in the other 2 and there is no proper way to combine them.

True, that's really ugly.

How about getting the references from the history? Our run functions return histories already... Or build a model from the histories and then pass it to the cleanup function?

kderme commented 4 years ago

Indeed, I think History would also work and is a good idea. Building the general Model seems hard though.

stevana commented 4 years ago

Building the general Model seems hard though.

Why so?

The history contains all commands and their responses, using the initial model and the transition functions should let us build a model similar to how linearise does it.

This model might not be consistent unless the history linearises, but this shouldn't matter for the cleanup (we only care that all references are there).

Likewise in case of an exception (and a missing response) the reference has probably not been created and hence doesn't need to be cleaned up. It's up to the user to clean up inside the semantics function in this case.

kderme commented 4 years ago

linearise stops if there is no postcondition that can hold, so any references created after that will not be visible on the Model. Do you think we can overcome this?

stevana commented 4 years ago

I'm not saying we should use linearise directly, only use it as inspiration for how we can traverse the history and rebuild a model.

stevana commented 4 years ago

interleavings can perhaps be reused directly though!

stevana commented 4 years ago

Any of the possible interleavings should be good enough to rebuild the model, I think.

kderme commented 4 years ago

Indeed, I wrote a function which does this and seems to work for my rqlite example. I was wondering though if it's too complicated for a recourse cleaning process. Do we have the guarantee that it will never fail, even after a test failure and for a random path of the interleavings tree?

Maybe the cleanup, user-defined function should be History -> IO () and provide the user with the utility function History -> Model Concrete (this function will also need the transition and the initModel to work), but not use it by default. So now it's the user's responsibility to have transition never fail if he wants to use this utility. Thoughts on this?

stevana commented 4 years ago

Maybe the cleanup, user-defined function should be History -> IO () and provide the user with the utility function History -> Model Concrete (this function will also need the transition and the initModel to work), but not use it by default. So now it's the user's responsibility to have transition never fail if he wants to use this utility. Thoughts on this?

From the users perspective it seems easier to me to write cleanup :: model Concrete -> IO () than cleanup :: History cmd resp -> IO () or History cmd resp -> model Concrete. Because in the model you have easy access to the references (which is what you want to clean up), while in the history you need to do more work to get access to the references (go through all responses).

Do we have the guarantee that it will never fail, even after a test failure and for a random path of the interleavings tree?

I'm not certain of this, but it seems like something we can try and monitor. It should be easy to notice if we start leaking resources. In the case of exceptions being thrown in the semantics, there's a change of resource leaking, e.g.:

resource <- acquireResource
error "This will cause a resource leak!"
return (Reference (Concrete resource))

I don't see how we can get around this problem though. I think in this case we need to rely on the user cleaning up in the semantics.

kderme commented 4 years ago

I agree that it's the users's responsibility to handle exceptions in the semantics. I have done exactly this with brackets at my rqlite tests https://github.com/advancedtelematic/quickcheck-state-machine/blob/rqlite/test/RQlite.hs#L359.

About the type of the cleanup function I don't have a strong opinion, I have tried both and they work the same https://github.com/advancedtelematic/quickcheck-state-machine/blob/rqlite/test/RQlite.hs#L443 and https://github.com/advancedtelematic/quickcheck-state-machine/blob/rqlite/test/RQlite.hs#L453 I agree that Model Concrete is probably more familiar to the user, but also my mkModel function make them equivalent in terms of difficulty, since one is derived by the other. https://github.com/advancedtelematic/quickcheck-state-machine/blob/rqlite/src/Test/StateMachine/Parallel.hs#L713 The good part about Model Concrete is that in the sequential case, we don't need to use the history, since the Model is already there.

kderme commented 4 years ago

Also, the cleanup works when I ^C the tests. Although maybe I need to always mask async exceptions when I run the cleanup..