Open ndkoval opened 1 year ago
With a time limit, changes to the system under test can result in gradually degraded test quality. What about configuring a limit on number of context switches explored, since lincheck already explores interleavings in order of number of context switches? This would be more intuitive than the current invocations per iteration limit.
I've also thought about configuring lincheck tests to run multiple times with different tradeoffs between # of iterations and # of invocations per iteration. Lincheck could handle this internal if e.g. each iteration (pseudo-)randomly chooses how many invocations/context switches to explore so that there are a few deep iterations and many shallow iterations.
We always tune the configuration parameters in a way the test takes at most the desired time.
Regarding the number of context switches: (1) it relies on the current implementation, making it impossible to develop a better model checker, and (2) it may decrease the testing time dramatically when the testing data structure becomes more complex.
I would suggest deprecating StressOptions
and ModelCheckingOptions
and adding a single LincheckOptions
instead. With LincheckOptions
, Lincheck should automatically run stress and model checking tests, spending 25% of the specified time for stress testing and 75% -- for model checking (the ratio can be different but should stay reasonable).
Let's deprecate StressOptions
and ModelCheckingOptions
with forRemoval = true
in Java and with ERROR
in Kotlin.
It seems important to me to at least keep the door open to some form of "fixed quality" test. When it comes to correctness of my system, I can't take the risk of a fixed time limit and so I would need to manually adjust the limit with every change to my system.
@btwilk How would you measure the test quality? How do you adjust the parameters now?
According to our and some users' experience, the scenarios
, invocations
, threads
, and actorsPerThread
parameters are adjusted in a way to "have at least three threads with three operations in each, reasonable (50-200) scenarios and spend the desired time in testing".
I don't have a formal measure of quality but I do know that my tests catch bugs/races/deadlocks that I introduce manually, which gives me some confidence that similar issues will be caught in the future even if my system gradually grows.
@btwilk Can you please explain how you configure Lincheck tests on some examples?
I generally follow the same rules of thumb: "have at least three threads with three operations in each, reasonable (50-200) scenarios". I don't have a good way of tuning invocations per iteration, but if I could I would want enough to cover 3 or 4 context switches. Otherwise, I just validate that the default is sufficient to discover bugs that I inject.
There is almost no intuition on how to configure Lincheck tests. How to choose the number of threads and operations in each of them? How many different scenarios should be studied? Which number of invocations is sufficient to examine each of them? Users usually adjust these parameters in a way to achieve acceptable execution time. Lincheck should provide a way to configure only the total execution time, automatically tuning all the parameters most efficiently. This feature lowers the entry threshold and improves the test quality.
While this feature is non-trivial, we can start with some straightforward implementation, improving it later. The intuition is that any straightforward implementation should not be worse than the current approach, as nobody knows how to configure Lincheck tests properly.