TestCoroutineDispatcher always picks the same interleaving of concurrent tasks. This is good since it is deterministic. But it is bad because there is no way to explore other interleavings (run order) of the tasks.
If TestCoroutineDispatcher could explore different interleavings, more bugs could be found. The challenge is to add the exploration ability without losing determinism.
...adds a test TestConcurrentCounterTest.kt. The test sets up a coroutine that acts an integer register. Then it runs a couple "incrementer" coroutines to read-modify-write to that "register" (through channels). Of course this read-modify-write approach is classically wrong and leads to "lost updates" under certain interleavings. The purpose of the various @Test functions in that file, is to show:
that a nondeterministic test often finds the bug (see nondeterministicTestN())
a deterministic test using the built-in TestCoroutineDispatcher behavior (before the proposed changes) fails to find the bug—the test passes every time (see deterministicTestN())
a deterministic test using the simplest-possible change to TestCoroutineDispatcher that finds the bug—the test fails every time (see deterministicInaccurateDispatcherTestN())
That simplest-possible change to TestCoroutineDispatcher is to introduce random time "skew" into postDelayed() and post(). The randomness is driven by a pseudo-random number generator injected through the TestCoroutineScope constructor.
If you look at deterministicInaccurateDispatcherTestN() you'll see that this arrangement lets us explore different random seeds to give us various task interleavings. When we find an interleaving we want to replay we can "lock it down". You might imagine budgeting a certain amount of test resources for this exploration. When "troublesome" interleavings were identified we might record them as part of a regression test suite.
Related Work
This work was inspired by the work of the FoundationDB folks. See Will Wilson's 2014 Strange Loop talk Testing Distributed Systems w/ Deterministic Simulationhttps://www.youtube.com/watch?v=4fFDFbi3toc
The approach offered in this PR is intentionally minimalistic. It could be refactored so that different (interleaving) exploration mechanisms and policies could be tried.
TestCoroutineDispatcher
always picks the same interleaving of concurrent tasks. This is good since it is deterministic. But it is bad because there is no way to explore other interleavings (run order) of the tasks.If
TestCoroutineDispatcher
could explore different interleavings, more bugs could be found. The challenge is to add the exploration ability without losing determinism.This draft pull request:
PR 1629
...adds a test
TestConcurrentCounterTest.kt
. The test sets up a coroutine that acts an integer register. Then it runs a couple "incrementer" coroutines to read-modify-write to that "register" (through channels). Of course this read-modify-write approach is classically wrong and leads to "lost updates" under certain interleavings. The purpose of the various@Test
functions in that file, is to show:nondeterministicTestN()
)TestCoroutineDispatcher
behavior (before the proposed changes) fails to find the bug—the test passes every time (seedeterministicTestN()
)TestCoroutineDispatcher
that finds the bug—the test fails every time (seedeterministicInaccurateDispatcherTestN()
)That simplest-possible change to
TestCoroutineDispatcher
is to introduce random time "skew" intopostDelayed()
andpost()
. The randomness is driven by a pseudo-random number generator injected through theTestCoroutineScope
constructor.If you look at
deterministicInaccurateDispatcherTestN()
you'll see that this arrangement lets us explore different random seeds to give us various task interleavings. When we find an interleaving we want to replay we can "lock it down". You might imagine budgeting a certain amount of test resources for this exploration. When "troublesome" interleavings were identified we might record them as part of a regression test suite.Related Work
This work was inspired by the work of the FoundationDB folks. See Will Wilson's 2014 Strange Loop talk Testing Distributed Systems w/ Deterministic Simulation https://www.youtube.com/watch?v=4fFDFbi3toc
See the FoundationDB page on Simulation and Testing: https://apple.github.io/foundationdb/testing.html#
See Carl Lerche's October, 2019 blog post Making the Tokio scheduler 10x faster https://tokio.rs/blog/2019-10-scheduler/ Specifically the Fearless (unsafe) concurrency with Loom section https://tokio.rs/blog/2019-10-scheduler/#fearless-unsafe-concurrency-with-loom Loom is a "a permutation testing tool for concurrent code" for Rust programs.
Opportunities for Improvement
The approach offered in this PR is intentionally minimalistic. It could be refactored so that different (interleaving) exploration mechanisms and policies could be tried.
One such policy that might be interesting to explore is dynamic partial-order reduction described here https://users.soe.ucsc.edu/~cormac/papers/popl05.pdf and used in Lerche's Loom (mentioned above).