TestCoroutineDispatcher Always Picks The Same Task Interleaving

TestCoroutineDispatcher always picks the same interleaving of concurrent tasks. This is good since it is deterministic. But it is bad because there is no way to explore other interleavings (run order) of the tasks.

If TestCoroutineDispatcher could explore different interleavings, more bugs could be found. The challenge is to add the exploration ability without losing determinism.

This draft pull request:

PR 1629

...adds a test TestConcurrentCounterTest.kt. The test sets up a coroutine that acts an integer register. Then it runs a couple "incrementer" coroutines to read-modify-write to that "register" (through channels). Of course this read-modify-write approach is classically wrong and leads to "lost updates" under certain interleavings. The purpose of the various @Test functions in that file, is to show:

that a nondeterministic test often finds the bug (see nondeterministicTestN())
a deterministic test using the built-in TestCoroutineDispatcher behavior (before the proposed changes) fails to find the bug—the test passes every time (see deterministicTestN())
a deterministic test using the simplest-possible change to TestCoroutineDispatcher that finds the bug—the test fails every time (see deterministicInaccurateDispatcherTestN())

That simplest-possible change to TestCoroutineDispatcher is to introduce random time "skew" into postDelayed() and post(). The randomness is driven by a pseudo-random number generator injected through the TestCoroutineScope constructor.

If you look at deterministicInaccurateDispatcherTestN() you'll see that this arrangement lets us explore different random seeds to give us various task interleavings. When we find an interleaving we want to replay we can "lock it down". You might imagine budgeting a certain amount of test resources for this exploration. When "troublesome" interleavings were identified we might record them as part of a regression test suite.

Related Work

This work was inspired by the work of the FoundationDB folks. See Will Wilson's 2014 Strange Loop talk Testing Distributed Systems w/ Deterministic Simulation https://www.youtube.com/watch?v=4fFDFbi3toc

See the FoundationDB page on Simulation and Testing: https://apple.github.io/foundationdb/testing.html#

See Carl Lerche's October, 2019 blog post Making the Tokio scheduler 10x faster https://tokio.rs/blog/2019-10-scheduler/ Specifically the Fearless (unsafe) concurrency with Loom section https://tokio.rs/blog/2019-10-scheduler/#fearless-unsafe-concurrency-with-loom Loom is a "a permutation testing tool for concurrent code" for Rust programs.

Opportunities for Improvement

The approach offered in this PR is intentionally minimalistic. It could be refactored so that different (interleaving) exploration mechanisms and policies could be tried.

One such policy that might be interesting to explore is dynamic partial-order reduction described here https://users.soe.ucsc.edu/~cormac/papers/popl05.pdf and used in Lerche's Loom (mentioned above).

Kotlin / kotlinx.coroutines

TestCoroutineDispatcher Always Picks The Same Task Interleaving #1630

Related Work

Opportunities for Improvement