Kotlin / kotlinx.coroutines

Library support for Kotlin coroutines
Apache License 2.0
13.07k stars 1.85k forks source link

Discourage manual execution control in the test framework #3919

Open dkhalanskyjb opened 1 year ago

dkhalanskyjb commented 1 year ago

Problem statement, the short version

We receive a lot of questions about using the test module, the answer to which is to avoid using advanceUntilIdle/advanceTimeBy.

advanceTimeBy, advanceUntilIdle, and runCurrent can be replaced by delay, yield, or cross-coroutine communication in most tests, with better results and much clearer semantics. Because runTest uses virtual time, even delay(1.seconds) is instantaneous. Manual time control is error-prone, discouraged, has surprising behavior, and is increasingly incompatible with the rest of the framework.

We should think of ways to make these functions less prominent so that people don't use them unless they absolutely need to, providing alternatives that more directly and correctly express the intent behind their typical usage.

Request for comments: if you use advanceTimeBy, runCurrent, or advanceUntilIdle in your code and don't think they could be reasonably replaced with the suggested alternatives when we implement them, please provide examples of your use cases.

Alternatives to each manual execution control function

advanceTimeBy

This function is used when some specific event that's known to be scheduled into the future needs to happen before its results are validated.

Examples:

fun CoroutineScope.runAfter(n: Duration, blockAction: () -> Unit) {
  launch {
    delay(n)
    blockAction()
  }
}

// The code before: the simple scenario
@Test
fun testRunAfter1() = runTest {
  var entered = false
  runAfter(10.seconds) { entered = true }
  advanceTimeBy(11.seconds)
  assertTrue(entered)
}

// The code before: the complex scenario
@Test
fun testRunAfter2() = runTest {
  var entered = false
  runAfter(10.seconds) { entered = true }
  advanceTimeBy(10.seconds)
  assertFalse(entered)
  runCurrent()
  assertTrue(entered)
}

When you have advanceTimeBy(n) directly in your test (which is almost all of its usages), it can be replaced with delay(n-1.milliseconds).

So, these tests become

// The code after: the simple scenario
@Test
fun testRunAfter1() = runTest {
  var entered = false
  runAfter(10.seconds) { entered = true }
  delay(11.seconds) // here, there's no difference whether to subtract one millisecond
  assertTrue(entered)
}

// The code after: the complex scenario
@Test
fun testRunAfter2() = runTest {
  var entered = false
  runAfter(10.seconds) { entered = true }
  delay(10.seconds - 1.milliseconds)
  assertFalse(entered)
  delay(1.milliseconds) // can't run `runCurrent`, as it was not yet time for the block to execute
  assertTrue(entered)
}

advanceUntilIdle

This function has the behavior of running the tasks present in the test scheduler, and only there, while there are tasks outside of backgroundScope. Almost always, the upper bound on the virtual time it takes to execute all code in the test dispatcher is known. Typically, it's just 1.milliseconds. So, functionally, advanceUntilIdle can almost always be replaced by delay(20.hours).

However, in many cases, there are problems with this replacement:

We must first present a viable alternative to confidently claim that advanceUntilIdle is no longer needed.

In most cases we've seen, what people intend is actually for the spawned coroutines to finish, regardless of whether these coroutines execute on the test dispatcher. So, we could introduce a function like

/**
 * Suspends until all the coroutines spawned in this [CoroutineScope] are completed.
 */
suspend fun CoroutineScope.awaitAllChildren(): Unit {
    while (true) {
        val children = this.coroutineContext[Job]!!.children.toList()
        if (children.all { !isActive }) break
        children.forEach { it.join() }
    }
}

Replacing advanceUntilIdle with awaitAllChildren() wouldn't always work. For example, here, it would lead to a deadlock:

fun testFoo() = runTest {
  val channel = Channel<Unit>()
  launch { channel.receive() }
  advanceUntilIdle()
  channel.send(Unit)
}

However, in all the cases we've seen so far where this happens, advanceUntilIdle is not needed at all and was probably put there just in case. We'll need to investigate further.

runCurrent

As shown in the previous example, delay(1.milliseconds) usually does the trick.

Still, this replacement is problematic:

A proper replacement would probably be a function like

/**
 * Waits until all the tasks left are the ones scheduled for a later point of virtual time.
 */
suspend fun TestCoroutineScheduler.awaitIdle(): Unit

This is also needed to properly support using Espresso with the test framework https://github.com/Kotlin/kotlinx.coroutines/issues/242

Problem statement, the long version

The current understanding of how people (need to) test coroutines

Before the large test framework refactoring, there were effectively two APIs: a low-level one (TestCoroutineScope, runCurrent, advanceUntilIdle, advanceTimeBy) and a higher-level one, runBlockingTest, built on top of it.

There were three patterns for testing coroutines.

Library vs framework

  1. The first pattern looked like this:
// OLD API, no longer available
fun testFooBar() {
  val scope = TestCoroutineScope()
  try {
    scope.launch { foo() }
    advanceUntilIdle()
    scope.launch { bar() }
    advanceUntilIdle()
  } finally {
    cleanupTestCoroutines()
  }
}

It takes an "off from the side" look at coroutines: the testing facilities were treated as a library of functions to be mixed and matched, with behavior being requested from them. One would explicitly schedule work on the test facilities, explicitly ask for it to be executed, and explicitly ensure the correct behavior.

  1. Another approach was like this:
// OLD API, no longer available
fun testFooBar() {
  runBlockingTest {
    foo()
    bar()
  }
}

Here, testing is conducted from inside a coroutine. runBlockingTest itself called advanceUntilIdle and cleanupTestCoroutines, ensuring the correct behavior in the common case, but giving up some of the flexibility. runBlockingTest was a framework for coroutines to be tested in. runBlockingTest was itself implemented on top of the primitives provided by the library-like interface.

  1. There is also a mixed approach:
// OLD API, no longer available
fun testFooBar() {
  runBlockingTest {
    launch { foo() }
    advanceUntilIdle()
    launch { bar() }
    advanceUntilIdle()
  }
}

After the big refactoring, where every coroutine test must be wrapped in runTest, tests that used to be in the library form took this approach instead of translating to the pure framework-like one. What follows is an explanation of why both the pure library approach and the mixed approach are suboptimal for the common use cases and one should relegate the execution control to runTest.

The issues with manual execution control

At a glance, it may look like the library approach is clearly the better one: everything is explicit, with minimal magical behavior that needs to be understood, and one can engage with the scheduled coroutines as closely as needed, whereas runBlockingTest just does its thing somehow on its own in mysterious ways.

Unfortunately, careful study of hundreds of tests has shown that what people need is a framework for testing, not a library.

Tests not testing anything

People were misusing the testing library all the time. For example, consider this "test":

// OLD API, no longer available
@Test
fun testFoo() {
  val testScope = TestCoroutineScope()
  testScope.launch {
    val a = foo()
    assertEquals(x, a)
  }
}

This test runs foo until the first suspension and then stops doing anything at all. assertEquals will never get called.

The root of the issue is that it's just incorrect to create a TestCoroutineScope and not call cleanupTestCoroutines at some point. This test is more than useless: it gives a false sense of security, leaving the impression that foo is properly tested.

No interoperating with asynchronous resumptions
Problem statement

Manual time control only knows about tasks that the test dispatcher needs to execute. It doesn't know anything about tasks that happen on other dispatchers.

fun foo() = withContext(Dispatchers.IO) {
  // a network call
}

// OLD API, no longer available
// this test would most likely crash
@Test
fun testFoo() {
  runBlockingTest {
    launch {
      val a = foo()
      assertEquals(x, a)
    }
    advanceUntilIdle()
    launch {
      val b = foo()
      assertEquals(x, b)
    }
  }
}

advanceUntilIdle here will progress the first launched coroutine until the point when foo is entered. After that, the test dispatcher has no control over what happens, Dispatchers.IO has to take it from there.

So, by the time the second launch happens, a network call may or may have not been executed. Most likely, it wasn't. There are race conditions in this test, making it flaky and unpredictable.

Luckily, most tests were not written this way: most likely, the network call wouldn't have enough time to finish by the end of the test, and runBlockingTest would crash with an exception notifying that some coroutines spawned inside it didn't complete. This behavior was mysterious to many, as understanding it requires a solid model of the internals of coroutines, and so people were just generally discouraged from using non-test dispatchers inside tests.

Add to it the previous problem of people not even using runBlockingTest and forgetting to clean up coroutines, and you get a recipe for disaster: even if you didn't forget advanceUntilIdle at the end of the test, it still meant little. Some adventurous souls went out of their way to make tests pass, rewriting tests not to use runBlockingTest or cleanupTestCoroutines as they "caused the test to crash."

A partial fix

The library approach can be made to properly work with external dispatches like this:

// OLD API, no longer available
@Test
fun testFoo() {
  val scope = TestCoroutineScope()
  val job = scope.launch {
    // ____________ A ___________
    // this is the main test body
    val a = foo()
    assertEquals(x, a)
    val b = foo()
    assertEquals(x, b)
    // ____________ B ____________
  }
  try {
    // ____________ C ____________
    scope.advanceUntilIdle()
    while (job.isActive) {
      delay(100.milliseconds)
      scope.advanceUntilIdle()
    }
    // _____________ D ____________
  } finally {
    scope.cleanupTestCoroutines()
  }
}

This way, until the last line of the test body has finished executing (after which job.isActive becomes false), we run all the tasks scheduled in the test dispatcher, and if we're not done after that, also wait for 100.milliseconds for some other dispatchers to asynchronously pass more tasks to the test dispatcher.

However, note that the only part of the test relevant to us is between lines A and B. All the other things are simply boilerplate that virtually every test needs, but is very difficult to come up with.

Manual execution control inherently can't support asynchronous work in general

All of the above is a good indication that manual execution control is error-prone in general in the case of asynchronous resumptions: every advanceUntilIdle must be replaced with code between lines C and D in order to properly support them.

The problem is, the role of job is not always obvious. For example, consider this test:

fun CoroutineScope.bar() {
  launch(Dispatchers.IO) {
    // something
  }
}

// OLD API, no longer available
// this test would most likely crash
@Test
fun testFoo() {
  runBlockingTest {
    bar()
    advanceUntilIdle()
  }
}

If one attempted to replace advanceUntilIdle here with the C-D construct, there would be no one to fulfill the role of job. bar does call launch, but does it as an implementation detail, without exposing the resulting Job. Calling advanceUntilIdle here is simply useless, even though it doesn't seem this way at a glance: from the point of view of the test framework, the system is idle while non-test dispatchers are busy.

Replacing advanceUntilIdle with delay(2.seconds) would not fix the behavior in this case, but it would still clear up the conceptual model. Since, inside the test dispatcher, delays are instantaneous, from its point of view, waiting for Dispatchers.IO to finish doing its thing requires infinite time. The problem of "I don't know when asynchronous work will be completed" needs to be solved programmatically somehow: you can't test asynchronous work if you don't know when it finishes, but the false concept of system idleness significantly muddies the mental model behind the testing framework.

Interactions between manual execution control and real-time require thread blocking

Another issue stems from the fact that manual execution control functions are blocking and can't suspend.

For an example of where this causes problems, see https://github.com/Kotlin/kotlinx.coroutines/issues/3179. We have a reasonable request to disable virtual time in parts of tests. However, the current design of manual execution control prevents us from reaching a clear solution.

fun testFoo() = runTest {
  val job1 = launch(NoDelaySkipping) {
    delay(1.seconds)
    println(1)
  }
  launch {
    withContext(Dispatchers.IO) {
      Thread.sleep(500) // 0.5 seconds
    }
    println(0)
  }
  val job2 = launch {
    delay(2.seconds)
    println(2)
  }
  // _______ A _________
  job1.join()
  println(currentTime())
}

What virtual time should be printed? In what order should the numbers be printed?

There are seemingly two viable answers:

  1. It doesn't matter, as disabling virtual time is only needed in fairly specific circumstances, or
  2. The current time should be 1.seconds, the order should be 0 (in half a second), 1 (in a second since the start of the test), and then 2 (also in a second since the start of the test).

It's unlikely that someone would expect job1 to complete after job2 just because it's required to use the real-world time for job1, as the abstraction behind virtual time is that it is just like the actual time, but for tests, it's passing infinitely faster.

The next question is, what should happen if line A is replaced with advanceUntilIdle or advanceTimeBy(2.seconds). Neither of them is allowed to wait for one real-life second, as they are blocking functions. They would have to block the thread during waiting, which is not even expressible in the JS implementation of the test framework, but would be a strange behavior in any case. In any case, after half a second of waiting, their blocking would also need to be interrupted, as a new task arrived to the test scheduler, ready for execution.

All of this could be easily mitigated if advanceUntilIdle and advanceTimeBy were suspend functions: they would just suspend, freeing the thread to do other things until the real-time wait is over or a new task arrives. This is exactly the behavior that writing delay(2.seconds) instead of advanceTimeBy has out of the box!

There is no way to make time control functions play nicely in these scenarios. It seems like the only sensible way to implement them would be to throw UnsupporteOperationException if they encounter a task that requires waiting for a given amount of real time.

Most tests don't need this kind of complex API

Out of the hundreds of tests we've seen during the research, there was a single-digit number of them that did use manual execution control in a way that the framework style wasn't better suited for.

When implementing low-level libraries over the coroutines (for example, when writing your own schedulers), one may need to validate the behavior in cases of interleavings that are difficult to emulate with just runBlockingTest, mocking calls, and calling delay. The majority of tests were worse off not using runBlockingTest than otherwise, being more brittle.

Why did we keep manual execution control after the refactoring if it's so problematic?

There are two reasons:

dkhalanskyjb commented 1 year ago

Leave a thumbs-up under this comment if awaitAllChildren would replace advanceUntilIdle for you.

dkhalanskyjb commented 1 year ago

Leave a thumbs-up under this comment if awaitIdle would replace runCurrent for you.

joffrey-bion commented 1 year ago

However, in all the cases we've seen so far where this happens, advanceUntilIdle is not needed at all and was probably put there just in case. We'll need to investigate further.

About advanceUntilIdle, I use it to actually await for coroutines to be suspended and not be able to move forward without further external action. This wouldn't be the same as awaitAllChildren because the coroutines are not expected to actually complete.

For example, I am writing a web socket library, and the received frames are represented as flows. Depending on implementation details, it could be that if the caller doesn't collect the flow at the right time, some frames could be dropped (it used to be the case in the past). I want to test that behaviour, and determine that internal coroutines don't drop the frame in the case where no collector is there, even if given infinite time. Here is some pseudocode:

@Test
fun subscribe_doesntLoseMessagesIfFlowIsNotCollectedImmediately() = runTest {
    val wsSession = createSomeWsSession(...) // passes the current context so that internal coroutines use the test scheduler

    // some calls that lead to the generation of web socket frames somewhere in the pipe
    wsSession.triggerSomeWebSocketFrame()

    // no flow collection started here

    advanceUntilIdle() // makes sure internal coroutines cannot drop the frame just due to races

    ensureWeCanStillGetTheFrameNowByCollectingLate()
}

Here awaitAllChildren cannot replace this, and I don't want to just use a huge delay because it doesn't express the intent properly.

What would be the suggestion in this case?

dkhalanskyjb commented 1 year ago

Thank you for the use case! I get that you want to wait for the moment your system reaches a quiescence period to ensure that it doesn't enter an incorrect state even when no one interacts with it for too long, and during quiescence, not all tasks are finished, as some are suspended. There are still some technical details that are unclear, though, could you elaborate on them?

Maybe the questions below are a good checklist for gathering advanceUntilIdle's use cases in general.

Also, a question regarding your system specifically: I see a comment about races—wouldn't you be better off writing a multithreaded stress-test? Roughly, it could look like this:

@Test
fun subscribe_doesntLoseMessagesIfFlowIsNotCollectedImmediately() = runTest {
  repeat(100_000) {
    coroutineScope {
      val wsSession = createSomeWsSession(...) // passes the current context so that internal coroutines use the test scheduler
      launch(Dispatchers.Default) {
          ensureWeCanStillGetTheFrameNowByCollectingLate()
      }
      wsSession.triggerSomeWebSocketFrame()
    }
  }
}

There are some synchronization tricks you could do to make sure the interesting thread interleavings are more likely to happen, but let's keep it simple for now.

This way, you would actually race some code against some other code. With deterministic single-threaded testing, you only look at a single interleaving.

joffrey-bion commented 1 year ago

Thanks a lot for your answer.

I see a comment about races—wouldn't you be better off writing a multithreaded stress-test? This way, you would actually race some code against some other code. With deterministic single-threaded testing, you only look at a single interleaving.

Not really. The point of this specific test is to force the system to fail in a specific case (frames are emitted while no collectors are there to get them for a long time, and then new collectors should still get them). I want the test to be guaranteed to fail if someone changes the implementation to, say, a shared flow that would just drop frames while no one listens. But I can't just check whether the frame is here immediately, because there could be suspension points in between that are such that the frame is still here for a bit and then disappears. Therefore I can't rely on a specific number of suspension points (and thus some number of yield() in my test code), nor a specific time passed (thus some delay() in the test code), hence why I'm relying on advanceUntilIdle().

I hope this clarifies my case.

matejdro commented 1 year ago

I'm just wondering, what would be the main difference between runCurrent and awaitIdle? Would the difference be just in the naming? I thought runCurrent already "Waits until all the tasks left are the ones scheduled for a later point of virtual time.".

dkhalanskyjb commented 1 year ago

The main difference is that awaitIdle is a suspend function and doesn't block the thread. See #3917 for a recent example where this matters.

matejdro commented 1 year ago

Thanks for the explanation. Then yeah, awaitIdle would probably be the best solution for us.

Problem with other alternatives comes mostly down to time-sensitive code and nested coroutines. Let's say that we have code that sets state X immediately and then state Y after one second.

delay(20.hours) alternative is not very helpful when testing time-sensitive code. If we called delay(20.hours), that call would blow straight through X, preventing us from asserting it.

Then, ideally, we should be able to write it like this:

launchCoroutineThatTriggersXAndY()

assertThatWeAreInStateX()
delay(1.second)
assertThatWeAreInStateY()

However, since X and Y are triggered in the background coroutine, nothing will happen. We must tell the test dispatcher to execute other coroutines, before we can assert. In some cases, simple yield() helps:

launchCoroutineThatTriggersXAndY()

yield()

assertThatWeAreInStateX()
delay(1.second)
assertThatWeAreInStateY()

however, if launchCoroutineThatTriggersXAndY() has nested coroutines, simple yield would also not work. Most robust solution here seems to be to use runCurrent:

launchCoroutineThatTriggersXAndY()

runCurrent()

assertThatWeAreInStateX()
delay(1.second)
assertThatWeAreInStateY()

To us runBlocking (awaitIdle in the future) just seems like a win win over other alternatives. Are there any downsides to using runBlocking over delay that we are missing?

benkuly commented 1 year ago

As far as I understand it's implementation, awaitAllChildren would not work for us when testing view models. For example:

class ViewModel(someFlow: Flow<String>, coroutineContext: CoroutineContext){
    private val coroutineScope:CoroutineScope= coroutineScopeWithLifecycle(coroutineContext)
    val someState=someFlow.stateIn(coroutineScope, SharingStarted.WhileSubscribed(), null)

    fun onButtonClick(){
        coroutineScope.launch { 
            doStuff() 
        }
    }
}

When testing onButtonClick, awaitAllChildren would never complete, because of the stateIn-coroutine. As long as doStuff() does not switch the dispatcher, it works with advanceUntilIdle(). If doStuff() switches the dispatcher none of both methods work (as described in section No interoperating with asynchronous resumptions).

CLOVIS-AI commented 3 months ago

Reviewing our codebases, almost all uses of advanceTimeBy are to set the initial virtual time at the start of a test. Since these are always the very first action, they can be replaced by delay without change in behavior.

distinctdan commented 3 months ago

Thanks for this detailed write up, this was very helpful in getting my tests working. I tried advanceTimeBy, but it doesn't work for some of my use cases because of the auto-advancing. However, using delay does seem to be working as expected. I might recommend adding a section to the docs to explain using delay.

My use case is I have a DataLoader class that loads data from an API, then polls at an interval. It also has automatic retries if a call fails, using Fibonacci backoff. It internally launches retry jobs using delay, and exposes state through a StateFlow. So, the auto-advance behavior is a problem for the retries because it will infinitely advance through multiple loops. To test the fibonacci logic, we need to advance sequentially and assert on each retry attempt.

Personally, I think the API might be a little bit more clear if all tests could use advanceTimeBy instead of having to understand when to use delay. I think my preferred approach would be to add an option to turn off delay skipping, but instead of using real wall-clock time, it would suspend indefinitely until the scheduler was manually advanced past the delay. This might look like testScheduler.skipDelays = false, or maybe testScheduler.manualTimeControl = true.

distinctdan commented 3 months ago

Update: the delay approach doesn't work for at least one use case. This is going to make some parts of my code impossible to test. I think I really do need manual time control. Here's a simplified version of my code, the behavior I'm observing is when doRefreshAfterExpires's loop runs, it kicks off a load attempt, but my simulated delay in the load attempt gets skipped, so the state goes immediately from LOADING to SUCCESS, which makes it impossible for me to test the reloading state.

// Poll until the data is expired
suspend fun doRefreshAfterExpires(refreshAt: Instant) {
    while(true) {
        delay(expireCheckIntervalMs)

        if (refreshAt < clock.instant()) {
            mutex.withLock {
                doLoadAttemptLocked()
            }
            break
        }
    }
}

// Kicks off a new load job, must be run inside a lock call.
suspend fun doLoadAttemptLocked() {
    status = AsyncStatus.LOADING
    loadJob = scope.launch(defaultDispatcher) {
        val result = onLoad()

        mutex.withLock {
            if (result.isSuccess) {
                status = AsyncStatus.SUCCESS
                refreshAt = calculateRefreshAt(result.expiresAt)

                if (refreshAt != null) {
                    expireCheckJob = scope.launch(defaultDispatcher) {
                        doRefreshAfterExpires(refreshAt = newWillRefreshAt)
                    }
                }
            } else { /* Error handling */ }
            emitState()
        }
    }
    emitState()
}
CLOVIS-AI commented 3 months ago

@distinctdan: it kicks off a load attempt, but my simulated delay in the load attempt gets skipped, so the state goes immediately from LOADING to SUCCESS, which makes it impossible for me to test the reloading state.

What are you trying to test exactly? If your goal is to test that a single request is started, what you would do is:

fun yourTest() = runTest {
    launch { doRefreshAtExpire(…) }

    // Test that no request has been made yet

    delay(100)
    // The specified instant hasn't been reached yet

    delay(…) // let's say that after that, the instant has been reached
    // Test that a first request has been started

    delay(expireCheckInternaMs)
    // Test that a second request has been started

    // etc
}

The point is, you replicate the delays in the test function itself to measure what happens just before and after each operation. Delay-skipping keeps the order of events, so it will only skip after all events scheduled for that time are over.

distinctdan commented 3 months ago

So my test scenario is to make sure my DataLoader reloads the data when it expires:

My onLoad callback looks like

onLoad = {
    Timber.i("before load delay")
    delay(loadDelay)
    Timber.i("after load delay")
    DataLoaderResult.Success(data = 1, expiresAt = Instant.EPOCH.plusSeconds(5))
},

The problem is that delay isn't acting like I expect. The first time the onLoad function is called, it stops at the delay just like I would expect it to. However, the second time it loads, that delay is skipped, causing the loader to immediately jump to the SUCCESS state. Maybe this is expected behavior, but it doesn't seem intuitive to me, I would expect delay to always work the same way. I've found a hacky workaround which is to use channels to force it to wait until my test can assert on the LOADING state:

val loadResultCh = Channel<DataLoaderResult<Int>>(1)
val loader = DataLoader<Int>(
    onLoad = {
        loadResultCh.receive()
    },
}

// Load and wait for it to succeed
loader.load()
runCurrent()
Assert.assertEquals(loadState, loader.state.value)

loadResultCh.send(DataLoaderResult.Success(data = 1, expiresAt = expiresAt))
runCurrent()
Assert.assertEquals(successState, loader.state.value)
distinctdan commented 3 months ago

Update: I've figured this out: long story short, delay is working correctly. The problem I was seeing was caused by 2 problems on my side:

Thanks for the help taking a look at this, I'm going to plan on using UnconfinedTestDispatcher with delay for the rest of these tests.