Open dkhalanskyjb opened 1 year ago
Leave a thumbs-up under this comment if awaitAllChildren
would replace advanceUntilIdle
for you.
Leave a thumbs-up under this comment if awaitIdle
would replace runCurrent
for you.
However, in all the cases we've seen so far where this happens, advanceUntilIdle is not needed at all and was probably put there just in case. We'll need to investigate further.
About advanceUntilIdle
, I use it to actually await for coroutines to be suspended and not be able to move forward without further external action. This wouldn't be the same as awaitAllChildren
because the coroutines are not expected to actually complete.
For example, I am writing a web socket library, and the received frames are represented as flows. Depending on implementation details, it could be that if the caller doesn't collect the flow at the right time, some frames could be dropped (it used to be the case in the past). I want to test that behaviour, and determine that internal coroutines don't drop the frame in the case where no collector is there, even if given infinite time. Here is some pseudocode:
@Test
fun subscribe_doesntLoseMessagesIfFlowIsNotCollectedImmediately() = runTest {
val wsSession = createSomeWsSession(...) // passes the current context so that internal coroutines use the test scheduler
// some calls that lead to the generation of web socket frames somewhere in the pipe
wsSession.triggerSomeWebSocketFrame()
// no flow collection started here
advanceUntilIdle() // makes sure internal coroutines cannot drop the frame just due to races
ensureWeCanStillGetTheFrameNowByCollectingLate()
}
Here awaitAllChildren
cannot replace this, and I don't want to just use a huge delay because it doesn't express the intent properly.
What would be the suggestion in this case?
Thank you for the use case! I get that you want to wait for the moment your system reaches a quiescence period to ensure that it doesn't enter an incorrect state even when no one interacts with it for too long, and during quiescence, not all tasks are finished, as some are suspended. There are still some technical details that are unclear, though, could you elaborate on them?
Maybe the questions below are a good checklist for gathering advanceUntilIdle
's use cases in general.
delay
? If not, would runCurrent()
(or its proposed replacement, awaitIdle
) also do the trick?Dispatchers.IO
? If so, what would you want to happen to the work transferred there—should it also be awaited?Also, a question regarding your system specifically: I see a comment about races—wouldn't you be better off writing a multithreaded stress-test? Roughly, it could look like this:
@Test
fun subscribe_doesntLoseMessagesIfFlowIsNotCollectedImmediately() = runTest {
repeat(100_000) {
coroutineScope {
val wsSession = createSomeWsSession(...) // passes the current context so that internal coroutines use the test scheduler
launch(Dispatchers.Default) {
ensureWeCanStillGetTheFrameNowByCollectingLate()
}
wsSession.triggerSomeWebSocketFrame()
}
}
}
There are some synchronization tricks you could do to make sure the interesting thread interleavings are more likely to happen, but let's keep it simple for now.
This way, you would actually race some code against some other code. With deterministic single-threaded testing, you only look at a single interleaving.
Thanks a lot for your answer.
advanceUntilIdle
supports this, but ideally it's like the point above: I would love to be able to express a test that doesn't rely on such assumption, because if something changes in the implementation, I want the test to still be valid (the worst that could happen is that the test would just pass all the time and not actually test anything anymore).I see a comment about races—wouldn't you be better off writing a multithreaded stress-test? This way, you would actually race some code against some other code. With deterministic single-threaded testing, you only look at a single interleaving.
Not really. The point of this specific test is to force the system to fail in a specific case (frames are emitted while no collectors are there to get them for a long time, and then new collectors should still get them). I want the test to be guaranteed to fail if someone changes the implementation to, say, a shared flow that would just drop frames while no one listens. But I can't just check whether the frame is here immediately, because there could be suspension points in between that are such that the frame is still here for a bit and then disappears. Therefore I can't rely on a specific number of suspension points (and thus some number of yield()
in my test code), nor a specific time passed (thus some delay()
in the test code), hence why I'm relying on advanceUntilIdle()
.
I hope this clarifies my case.
I'm just wondering, what would be the main difference between runCurrent
and awaitIdle
? Would the difference be just in the naming? I thought runCurrent already "Waits until all the tasks left are the ones scheduled for a later point of virtual time.".
The main difference is that awaitIdle
is a suspend
function and doesn't block the thread. See #3917 for a recent example where this matters.
Thanks for the explanation. Then yeah, awaitIdle
would probably be the best solution for us.
Problem with other alternatives comes mostly down to time-sensitive code and nested coroutines. Let's say that we have code that sets state X immediately and then state Y after one second.
delay(20.hours)
alternative is not very helpful when testing time-sensitive code. If we called delay(20.hours)
, that call would blow straight through X, preventing us from asserting it.
Then, ideally, we should be able to write it like this:
launchCoroutineThatTriggersXAndY()
assertThatWeAreInStateX()
delay(1.second)
assertThatWeAreInStateY()
However, since X and Y are triggered in the background coroutine, nothing will happen. We must tell the test dispatcher to execute other coroutines, before we can assert. In some cases, simple yield()
helps:
launchCoroutineThatTriggersXAndY()
yield()
assertThatWeAreInStateX()
delay(1.second)
assertThatWeAreInStateY()
however, if launchCoroutineThatTriggersXAndY()
has nested coroutines, simple yield
would also not work. Most robust solution here seems to be to use runCurrent
:
launchCoroutineThatTriggersXAndY()
runCurrent()
assertThatWeAreInStateX()
delay(1.second)
assertThatWeAreInStateY()
yield
everywhere, even when there are no nested coroutines. This ensures that if code under test gets nested coroutines in the future, we do not need to update all tests later, changing yield
to runCurrent
, they would just work. runCurrent
over delay(20.hours)
to ensure that tests would remain working even if the code under tests becomes time-sensitive in the future.To us runBlocking
(awaitIdle
in the future) just seems like a win win over other alternatives. Are there any downsides to using runBlocking
over delay
that we are missing?
As far as I understand it's implementation, awaitAllChildren
would not work for us when testing view models. For example:
class ViewModel(someFlow: Flow<String>, coroutineContext: CoroutineContext){
private val coroutineScope:CoroutineScope= coroutineScopeWithLifecycle(coroutineContext)
val someState=someFlow.stateIn(coroutineScope, SharingStarted.WhileSubscribed(), null)
fun onButtonClick(){
coroutineScope.launch {
doStuff()
}
}
}
When testing onButtonClick
, awaitAllChildren
would never complete, because of the stateIn
-coroutine. As long as doStuff()
does not switch the dispatcher, it works with advanceUntilIdle()
. If doStuff()
switches the dispatcher none of both methods work (as described in section No interoperating with asynchronous resumptions
).
Reviewing our codebases, almost all uses of advanceTimeBy
are to set the initial virtual time at the start of a test. Since these are always the very first action, they can be replaced by delay
without change in behavior.
Thanks for this detailed write up, this was very helpful in getting my tests working. I tried advanceTimeBy
, but it doesn't work for some of my use cases because of the auto-advancing. However, using delay
does seem to be working as expected. I might recommend adding a section to the docs to explain using delay
.
My use case is I have a DataLoader
class that loads data from an API, then polls at an interval. It also has automatic retries if a call fails, using Fibonacci backoff. It internally launches retry jobs using delay
, and exposes state through a StateFlow
. So, the auto-advance behavior is a problem for the retries because it will infinitely advance through multiple loops. To test the fibonacci logic, we need to advance sequentially and assert on each retry attempt.
Personally, I think the API might be a little bit more clear if all tests could use advanceTimeBy
instead of having to understand when to use delay
. I think my preferred approach would be to add an option to turn off delay skipping, but instead of using real wall-clock time, it would suspend indefinitely until the scheduler was manually advanced past the delay. This might look like testScheduler.skipDelays = false
, or maybe testScheduler.manualTimeControl = true
.
Update: the delay
approach doesn't work for at least one use case. This is going to make some parts of my code impossible to test. I think I really do need manual time control. Here's a simplified version of my code, the behavior I'm observing is when doRefreshAfterExpires
's loop runs, it kicks off a load attempt, but my simulated delay in the load attempt gets skipped, so the state goes immediately from LOADING to SUCCESS, which makes it impossible for me to test the reloading state.
// Poll until the data is expired
suspend fun doRefreshAfterExpires(refreshAt: Instant) {
while(true) {
delay(expireCheckIntervalMs)
if (refreshAt < clock.instant()) {
mutex.withLock {
doLoadAttemptLocked()
}
break
}
}
}
// Kicks off a new load job, must be run inside a lock call.
suspend fun doLoadAttemptLocked() {
status = AsyncStatus.LOADING
loadJob = scope.launch(defaultDispatcher) {
val result = onLoad()
mutex.withLock {
if (result.isSuccess) {
status = AsyncStatus.SUCCESS
refreshAt = calculateRefreshAt(result.expiresAt)
if (refreshAt != null) {
expireCheckJob = scope.launch(defaultDispatcher) {
doRefreshAfterExpires(refreshAt = newWillRefreshAt)
}
}
} else { /* Error handling */ }
emitState()
}
}
emitState()
}
@distinctdan: it kicks off a load attempt, but my simulated delay in the load attempt gets skipped, so the state goes immediately from LOADING to SUCCESS, which makes it impossible for me to test the reloading state.
What are you trying to test exactly? If your goal is to test that a single request is started, what you would do is:
fun yourTest() = runTest {
launch { doRefreshAtExpire(…) }
// Test that no request has been made yet
delay(100)
// The specified instant hasn't been reached yet
delay(…) // let's say that after that, the instant has been reached
// Test that a first request has been started
delay(expireCheckInternaMs)
// Test that a second request has been started
// etc
}
The point is, you replicate the delays in the test function itself to measure what happens just before and after each operation. Delay-skipping keeps the order of events, so it will only skip after all events scheduled for that time are over.
So my test scenario is to make sure my DataLoader reloads the data when it expires:
delay(loadTime)
, assert state is SUCCESSdelay(expirationTime)
to simulate the data expiring, this should kick off a load attempt, assert state is LOADINGdelay(loadTime)
, assert state is SUCCESSMy onLoad
callback looks like
onLoad = {
Timber.i("before load delay")
delay(loadDelay)
Timber.i("after load delay")
DataLoaderResult.Success(data = 1, expiresAt = Instant.EPOCH.plusSeconds(5))
},
The problem is that delay
isn't acting like I expect. The first time the onLoad
function is called, it stops at the delay just like I would expect it to. However, the second time it loads, that delay is skipped, causing the loader to immediately jump to the SUCCESS state. Maybe this is expected behavior, but it doesn't seem intuitive to me, I would expect delay
to always work the same way. I've found a hacky workaround which is to use channels to force it to wait until my test can assert on the LOADING state:
val loadResultCh = Channel<DataLoaderResult<Int>>(1)
val loader = DataLoader<Int>(
onLoad = {
loadResultCh.receive()
},
}
// Load and wait for it to succeed
loader.load()
runCurrent()
Assert.assertEquals(loadState, loader.state.value)
loadResultCh.send(DataLoaderResult.Success(data = 1, expiresAt = expiresAt))
runCurrent()
Assert.assertEquals(successState, loader.state.value)
Update: I've figured this out: long story short, delay
is working correctly. The problem I was seeing was caused by 2 problems on my side:
UnconfinedTestDispatcher
solved the problems and greatly simplified the tests.delay
was smart enough to work with loops, so that if we have a polling loop, delaying for a long time will run multiple iterations of the loop. I had incorrectly assumed it would just run the final iteration. I was able to track this down by adding the testScheduler
's current time to my log statements, which revealed the problem.Thanks for the help taking a look at this, I'm going to plan on using UnconfinedTestDispatcher
with delay
for the rest of these tests.
Problem statement, the short version
We receive a lot of questions about using the test module, the answer to which is to avoid using
advanceUntilIdle
/advanceTimeBy
.advanceTimeBy
,advanceUntilIdle
, andrunCurrent
can be replaced bydelay
,yield
, or cross-coroutine communication in most tests, with better results and much clearer semantics. BecauserunTest
uses virtual time, evendelay(1.seconds)
is instantaneous. Manual time control is error-prone, discouraged, has surprising behavior, and is increasingly incompatible with the rest of the framework.We should think of ways to make these functions less prominent so that people don't use them unless they absolutely need to, providing alternatives that more directly and correctly express the intent behind their typical usage.
Request for comments: if you use
advanceTimeBy
,runCurrent
, oradvanceUntilIdle
in your code and don't think they could be reasonably replaced with the suggested alternatives when we implement them, please provide examples of your use cases.Alternatives to each manual execution control function
advanceTimeBy
This function is used when some specific event that's known to be scheduled into the future needs to happen before its results are validated.
Examples:
When you have
advanceTimeBy(n)
directly in your test (which is almost all of its usages), it can be replaced withdelay(n-1.milliseconds)
.-1
is needed becauseadvanceTimeBy
does not execute the tasks scheduled for the momentcurrentTime() + n
, whereasdelay(n)
will also execute the tasks that were already scheduled forcurrentTime() + n
beforedelay(n)
was called.So, these tests become
advanceUntilIdle
This function has the behavior of running the tasks present in the test scheduler, and only there, while there are tasks outside of
backgroundScope
. Almost always, the upper bound on the virtual time it takes to execute all code in the test dispatcher is known. Typically, it's just1.milliseconds
. So, functionally,advanceUntilIdle
can almost always be replaced bydelay(20.hours)
.However, in many cases, there are problems with this replacement:
20.hours
is a completely arbitrary constant.advanceUntilIdle
doesn't always do what it says on the box, as work on other dispatchers is not taken into account, so idleness is a fuzzy concept anyway. Though it's inarguable that whenadvanceUntilIdle
does work, it also shows the intent.backgroundScope
that happens, say, every100.milliseconds
,delay(20.hours)
will waste a lot of processing power on that work.We must first present a viable alternative to confidently claim that
advanceUntilIdle
is no longer needed.In most cases we've seen, what people intend is actually for the spawned coroutines to finish, regardless of whether these coroutines execute on the test dispatcher. So, we could introduce a function like
Replacing
advanceUntilIdle
withawaitAllChildren()
wouldn't always work. For example, here, it would lead to a deadlock:However, in all the cases we've seen so far where this happens,
advanceUntilIdle
is not needed at all and was probably put there just in case. We'll need to investigate further.runCurrent
As shown in the previous example,
delay(1.milliseconds)
usually does the trick.Still, this replacement is problematic:
runCurrent
, asrunCurrent
ensures no more tasks are scheduled for the current moment by running all of them.A proper replacement would probably be a function like
This is also needed to properly support using Espresso with the test framework https://github.com/Kotlin/kotlinx.coroutines/issues/242
Problem statement, the long version
The current understanding of how people (need to) test coroutines
Before the large test framework refactoring, there were effectively two APIs: a low-level one (
TestCoroutineScope
,runCurrent
,advanceUntilIdle
,advanceTimeBy
) and a higher-level one,runBlockingTest
, built on top of it.There were three patterns for testing coroutines.
Library vs framework
It takes an "off from the side" look at coroutines: the testing facilities were treated as a library of functions to be mixed and matched, with behavior being requested from them. One would explicitly schedule work on the test facilities, explicitly ask for it to be executed, and explicitly ensure the correct behavior.
Here, testing is conducted from inside a coroutine.
runBlockingTest
itself calledadvanceUntilIdle
andcleanupTestCoroutines
, ensuring the correct behavior in the common case, but giving up some of the flexibility.runBlockingTest
was a framework for coroutines to be tested in.runBlockingTest
was itself implemented on top of the primitives provided by the library-like interface.After the big refactoring, where every coroutine test must be wrapped in
runTest
, tests that used to be in the library form took this approach instead of translating to the pure framework-like one. What follows is an explanation of why both the pure library approach and the mixed approach are suboptimal for the common use cases and one should relegate the execution control torunTest
.The issues with manual execution control
At a glance, it may look like the library approach is clearly the better one: everything is explicit, with minimal magical behavior that needs to be understood, and one can engage with the scheduled coroutines as closely as needed, whereas
runBlockingTest
just does its thing somehow on its own in mysterious ways.Unfortunately, careful study of hundreds of tests has shown that what people need is a framework for testing, not a library.
Tests not testing anything
People were misusing the testing library all the time. For example, consider this "test":
This test runs
foo
until the first suspension and then stops doing anything at all.assertEquals
will never get called.The root of the issue is that it's just incorrect to create a
TestCoroutineScope
and not callcleanupTestCoroutines
at some point. This test is more than useless: it gives a false sense of security, leaving the impression thatfoo
is properly tested.No interoperating with asynchronous resumptions
Problem statement
Manual time control only knows about tasks that the test dispatcher needs to execute. It doesn't know anything about tasks that happen on other dispatchers.
advanceUntilIdle
here will progress the first launched coroutine until the point whenfoo
is entered. After that, the test dispatcher has no control over what happens,Dispatchers.IO
has to take it from there.So, by the time the second
launch
happens, a network call may or may have not been executed. Most likely, it wasn't. There are race conditions in this test, making it flaky and unpredictable.Luckily, most tests were not written this way: most likely, the network call wouldn't have enough time to finish by the end of the test, and
runBlockingTest
would crash with an exception notifying that some coroutines spawned inside it didn't complete. This behavior was mysterious to many, as understanding it requires a solid model of the internals of coroutines, and so people were just generally discouraged from using non-test dispatchers inside tests.Add to it the previous problem of people not even using
runBlockingTest
and forgetting to clean up coroutines, and you get a recipe for disaster: even if you didn't forgetadvanceUntilIdle
at the end of the test, it still meant little. Some adventurous souls went out of their way to make tests pass, rewriting tests not to userunBlockingTest
orcleanupTestCoroutines
as they "caused the test to crash."A partial fix
The library approach can be made to properly work with external dispatches like this:
This way, until the last line of the test body has finished executing (after which
job.isActive
becomesfalse
), we run all the tasks scheduled in the test dispatcher, and if we're not done after that, also wait for100.milliseconds
for some other dispatchers to asynchronously pass more tasks to the test dispatcher.However, note that the only part of the test relevant to us is between lines
A
andB
. All the other things are simply boilerplate that virtually every test needs, but is very difficult to come up with.Manual execution control inherently can't support asynchronous work in general
All of the above is a good indication that manual execution control is error-prone in general in the case of asynchronous resumptions: every
advanceUntilIdle
must be replaced with code between linesC
andD
in order to properly support them.The problem is, the role of
job
is not always obvious. For example, consider this test:If one attempted to replace
advanceUntilIdle
here with theC
-D
construct, there would be no one to fulfill the role ofjob
.bar
does calllaunch
, but does it as an implementation detail, without exposing the resultingJob
. CallingadvanceUntilIdle
here is simply useless, even though it doesn't seem this way at a glance: from the point of view of the test framework, the system is idle while non-test dispatchers are busy.Replacing
advanceUntilIdle
withdelay(2.seconds)
would not fix the behavior in this case, but it would still clear up the conceptual model. Since, inside the test dispatcher, delays are instantaneous, from its point of view, waiting forDispatchers.IO
to finish doing its thing requires infinite time. The problem of "I don't know when asynchronous work will be completed" needs to be solved programmatically somehow: you can't test asynchronous work if you don't know when it finishes, but the false concept of system idleness significantly muddies the mental model behind the testing framework.Interactions between manual execution control and real-time require thread blocking
Another issue stems from the fact that manual execution control functions are blocking and can't suspend.
For an example of where this causes problems, see https://github.com/Kotlin/kotlinx.coroutines/issues/3179. We have a reasonable request to disable virtual time in parts of tests. However, the current design of manual execution control prevents us from reaching a clear solution.
What virtual time should be printed? In what order should the numbers be printed?
There are seemingly two viable answers:
1.seconds
, the order should be0
(in half a second),1
(in a second since the start of the test), and then2
(also in a second since the start of the test).It's unlikely that someone would expect
job1
to complete afterjob2
just because it's required to use the real-world time forjob1
, as the abstraction behind virtual time is that it is just like the actual time, but for tests, it's passing infinitely faster.The next question is, what should happen if line
A
is replaced withadvanceUntilIdle
oradvanceTimeBy(2.seconds)
. Neither of them is allowed to wait for one real-life second, as they are blocking functions. They would have to block the thread during waiting, which is not even expressible in the JS implementation of the test framework, but would be a strange behavior in any case. In any case, after half a second of waiting, their blocking would also need to be interrupted, as a new task arrived to the test scheduler, ready for execution.All of this could be easily mitigated if
advanceUntilIdle
andadvanceTimeBy
weresuspend
functions: they would just suspend, freeing the thread to do other things until the real-time wait is over or a new task arrives. This is exactly the behavior that writingdelay(2.seconds)
instead ofadvanceTimeBy
has out of the box!There is no way to make time control functions play nicely in these scenarios. It seems like the only sensible way to implement them would be to throw
UnsupporteOperationException
if they encounter a task that requires waiting for a given amount of real time.Most tests don't need this kind of complex API
Out of the hundreds of tests we've seen during the research, there was a single-digit number of them that did use manual execution control in a way that the framework style wasn't better suited for.
When implementing low-level libraries over the coroutines (for example, when writing your own schedulers), one may need to validate the behavior in cases of interleavings that are difficult to emulate with just
runBlockingTest
, mocking calls, and callingdelay
. The majority of tests were worse off not usingrunBlockingTest
than otherwise, being more brittle.Why did we keep manual execution control after the refactoring if it's so problematic?
There are two reasons:
runCurrent
oradvanceTimeBy
from outside the test body, where they can't be replaced with adelay
.