Add `testingTime(..)` option to configure Lincheck tests

JetBrains / lincheck

Framework for testing concurrent data structures

Mozilla Public License 2.0

572 stars 33 forks source link

Add `testingTime(..)` option to configure Lincheck tests #137

Open ndkoval opened 1 year ago

ndkoval commented 1 year ago

There is almost no intuition on how to configure Lincheck tests. How to choose the number of threads and operations in each of them? How many different scenarios should be studied? Which number of invocations is sufficient to examine each of them? Users usually adjust these parameters in a way to achieve acceptable execution time. Lincheck should provide a way to configure only the total execution time, automatically tuning all the parameters most efficiently. This feature lowers the entry threshold and improves the test quality.

While this feature is non-trivial, we can start with some straightforward implementation, improving it later. The intuition is that any straightforward implementation should not be worse than the current approach, as nobody knows how to configure Lincheck tests properly.

btwilk commented 1 year ago

With a time limit, changes to the system under test can result in gradually degraded test quality. What about configuring a limit on number of context switches explored, since lincheck already explores interleavings in order of number of context switches? This would be more intuitive than the current invocations per iteration limit.

I've also thought about configuring lincheck tests to run multiple times with different tradeoffs between # of iterations and # of invocations per iteration. Lincheck could handle this internal if e.g. each iteration (pseudo-)randomly chooses how many invocations/context switches to explore so that there are a few deep iterations and many shallow iterations.

ndkoval commented 1 year ago

We always tune the configuration parameters in a way the test takes at most the desired time.

Regarding the number of context switches: (1) it relies on the current implementation, making it impossible to develop a better model checker, and (2) it may decrease the testing time dramatically when the testing data structure becomes more complex.

ndkoval commented 1 year ago

I would suggest deprecating StressOptions and ModelCheckingOptions and adding a single LincheckOptions instead. With LincheckOptions, Lincheck should automatically run stress and model checking tests, spending 25% of the specified time for stress testing and 75% -- for model checking (the ratio can be different but should stay reasonable).

Let's deprecate StressOptions and ModelCheckingOptions with forRemoval = true in Java and with ERROR in Kotlin.

btwilk commented 1 year ago

It seems important to me to at least keep the door open to some form of "fixed quality" test. When it comes to correctness of my system, I can't take the risk of a fixed time limit and so I would need to manually adjust the limit with every change to my system.

ndkoval commented 1 year ago

@btwilk How would you measure the test quality? How do you adjust the parameters now?

According to our and some users' experience, the scenarios, invocations, threads, and actorsPerThread parameters are adjusted in a way to "have at least three threads with three operations in each, reasonable (50-200) scenarios and spend the desired time in testing".

btwilk commented 1 year ago

I don't have a formal measure of quality but I do know that my tests catch bugs/races/deadlocks that I introduce manually, which gives me some confidence that similar issues will be caught in the future even if my system gradually grows.

ndkoval commented 1 year ago

@btwilk Can you please explain how you configure Lincheck tests on some examples?

btwilk commented 1 year ago

I generally follow the same rules of thumb: "have at least three threads with three operations in each, reasonable (50-200) scenarios". I don't have a good way of tuning invocations per iteration, but if I could I would want enough to cover 3 or 4 context switches. Otherwise, I just validate that the default is sufficient to discover bugs that I inject.