
kotlinx.coroutines.test: Option to disable virtual time and delay skipping #3179

Open sergeych opened 2 years ago

sergeych commented 2 years ago

There is often a need to keep the regular delay() behavior with an actual delay.

It could be something like runTest(noVirtualTime = true) { /* ... */ }

joffrey-bion commented 2 years ago

Yep, I also think it's really important. Another use case is integration tests that need to interact with real-time systems that are unaware of virtual time and cannot be made aware in any way.

The purpose of kotlinx-coroutines-test changed when it became multiplatform. It is no longer only meant for delay skipping, it is meant to help run multiplatform tests using coroutines (especially regarding the JS stuff).

Note that there already was a discussion on this topic, but I think it's worth bringing back: https://github.com/Kotlin/kotlinx.coroutines/issues/3094

dkhalanskyjb commented 2 years ago

Sure, we recognize the problem; it's just that the API shape is currently unclear. runTest(noVirtualTime = true) { } is simply not flexible enough: it's possible that some parts of the test do need to avoid delay-skipping, but some do not. Adding it as a parameter to the test coroutine dispatcher would be a better option, like

withContext(StandardTestDispatcher(scheduler = testScheduler, skipDelays = false)) {
  // do whatever
}

But then, there's not much reason to use the test dispatchers at all.

We would prefer to take our time with this because there are other, more intrusive aspects of the test API that would influence the API of this functionality—like mocking the IO and Default dispatchers—and it doesn't look like the issue is urgent, as there is a workaround (from the linked thread):

withContext(Dispatchers.Default.limitedParallelism(1)) {
  // delays are not skipped here
}
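For instance, a complete JVM test built on this workaround might look like the following sketch (purely illustrative: the class and test names are made up, and depending on the library version an @OptIn(ExperimentalCoroutinesApi::class) may be required):

import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.delay
import kotlinx.coroutines.test.runTest
import kotlinx.coroutines.withContext
import kotlin.test.Test

class MixedTimeTest {
    @Test
    fun mixVirtualAndRealTime() = runTest {
        delay(10_000) // test scheduler: virtual time, skipped instantly
        withContext(Dispatchers.Default.limitedParallelism(1)) {
            delay(100) // real dispatcher: an actual ~100 ms wall-clock pause
        }
    }
}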

If there are usage patterns not covered by this, please share them.

EanLombardo commented 1 year ago

I think the issue with the current API is perhaps the naming, not the functionality.

"runTest" is such a generic name it implies that it is just "the way to do tests", StandardTestDispatcher implies this even stronger.

With such a generic name, I would think that the functionality should be generic. In some contexts skipping delays breaks correctness contracts (networking), and in others it can cause heavy workloads to overrun (polling services in a cross-system test). Performing a delay when it is not needed, however, at worst makes a test time out or hang, which can be hard to debug, sure, but those cases should be fairly rare. The name "runTest" doesn't really imply that skipping should happen by default.

I would expect runTest to do the same thing as runBlocking by default, but return a TestResult, with optional parameters like skipDelays.

With the current approach a team might very easily get used to just using runTest for all their tests; then one day, when they have to write an integration test that polls services, they end up DoSing their own service because they never realized delays would be skipped.

Neither approach addresses the fact that both behaviours might need to be mixed, though. It is possible that you may want a test that skips animation delays but does not skip polling delays. There is really no way to do this unless you wrap all of your polling delays in the mentioned workaround.

Perhaps calls to delay could be associated with some sort of tag.

delay(1000, ANIMATION)
delay(2000, CORRECTNESS)
delay(100, POLLING)

Then you could do:

runTest(skipDelays = setOf(DEFAULT, ANIMATION))
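To make the idea concrete, here is a purely illustrative sketch: neither a tag parameter on delay() nor a skipDelays parameter on runTest exists in kotlinx-coroutines-test, and DelayTag is made up. The snippet only shows how call sites could be classified; the actual skipping decision would have to live in the test scheduler.

enum class DelayTag { DEFAULT, ANIMATION, CORRECTNESS, POLLING }

// Hypothetical overload: in production the tag is ignored, but a tag-aware test
// scheduler could consult it to decide whether this particular delay is skipped.
suspend fun delay(timeMillis: Long, tag: DelayTag = DelayTag.DEFAULT) {
    kotlinx.coroutines.delay(timeMillis)
}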
joffrey-bion commented 1 year ago

"runTest" is such a generic name it implies that it is just "the way to do tests", StandardTestDispatcher implies this even stronger. With such a generic name I would think think that the functionality should be generic. In some contexts skipping delays breaks correctness contracts (networking),

Haha, there it is, coming back to bite us. It reminds me of our discussion with @dkhalanskyjb on semantic precision in https://github.com/Kotlin/kotlinx.coroutines/issues/3050#issuecomment-989618200.

A more semantically precise name would be runSuspendBlockSkippingDelays, which is the behavior that the people using the function are usually interested in.

And I agree with you @EanLombardo, runTest should have just been running tests on suspend functions, NOT skipping delays. And we would need another function or parameters to control delay skipping.

I think the issue is that historically this library was JVM-only, so the only point of using runBlockingTest over runBlocking was to benefit from delay-skipping. For some reason this idea stayed when making the lib multiplatform and naming things, even though in a multiplatform setup we needed runTest because of the lack of runBlocking (we had no option), so it really wasn't primarily about delay-skipping anymore, not the "behavior that the people using the function are usually interested in".

dkhalanskyjb commented 1 year ago

In some contexts skipping delays breaks correctness contracts (networking), and in others it can cause heavy workloads to overrun (polling services in a cross system test).

Could you elaborate? If you're doing networking, why are you not doing it in Dispatchers.IO? If you are doing it in Dispatchers.IO but choose to mock it with a delay-skipping dispatcher, you should mock the network calls as well, as otherwise, delay-skipping and single-threaded execution probably doesn't make sense. Similarly, in a cross-system test, why are machines communicating from coroutines with delay-skipping dispatchers?

runTest itself adds delay skipping only to the coroutine it creates, it doesn't magically cause all the delays in the whole program to be skipped. If you propagate delay-skipping everywhere, that's on you.

The name "runTest" doesn't really imply that skipping should happen by default.

runTest implies that the code inside will be run in a test environment, and that's it. What this entails is intentionally ambiguous. For example, with the next release, runTest will likely also attempt to catch the unreported exceptions flying through the system: https://github.com/Kotlin/kotlinx.coroutines/pull/3449. If the current runTest was called runTestWithDelaySkipping and runTest was just a multiplatform runBlocking, then, by this logic, we would have to add runTestWithCatchingExceptions and probably runTestWithDelaySkippingAndCatchingExceptions. Maybe there will be further enhancements to runTest if we notice that it can by default fix some typical hurdles that people encounter when writing tests; this is very hard to predict. One such hurdle that we tried to address with delay skipping was that, when running tests, everything is typically mocked, and it doesn't make sense for a test to actually wait for a "network response" that is really a predefined value. So, the name is chosen based on which one addresses the needs of the average test the best.

joffrey-bion commented 1 year ago

So, the name is chosen based on which one addresses the needs of the average test the best.

I guess that's where opinions diverge. Skipping delays doesn't seem to be as reasonable a default as you seem to imply. Lots of test authors are surprised by it.

One such hurdle that we tried to address with delay skipping was that, when running tests, everything is typically mocked and it doesn't make sense for a test to actually wait for a "network response" that is actually a predefined value

But would it wait, though? The predefined value would likely be returned instantly, and a wrapping withTimeout would work fine. I'm not sure what "hurdle" is solved here. On the other hand, having delay-skipping by default, as a test writer I now have to be aware of any code using withTimeout to wrap something that is using a regular dispatcher further down the line:

import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.test.runTest
import kotlinx.coroutines.withContext
import kotlinx.coroutines.withTimeout
import kotlin.test.Test

@OptIn(ExperimentalCoroutinesApi::class)
class CoroutineTest {

    @Test
    fun test() = runTest {
        regularSuspendFun() // fails instantly by default
    }

    private suspend fun regularSuspendFun() {
        withTimeout(1000) {
            someRealIO()
        }
    }

    private suspend fun someRealIO() = withContext(Dispatchers.IO) {
        Thread.sleep(100)
    }
}

All I'm saying is that this is surprising by default, to many people.

If the current runTest was called runTestWithDelaySkipping and runTest was just a multi-platform runBlocking, then, by this logic, we would have to add runTestWithCatchingExceptions and probably runTestWithDelaySkippingAndCatchingExceptions.

No, because catching exceptions is likely to be a facility that is welcome by default. My argument is that delay-skipping is not unanimously accepted as a desirable default, unlike other things you seem to have in the pipe for runTest.

dkhalanskyjb commented 1 year ago

Lots of test authors are surprised by it.

this is surprising by default, to many people.

Depends on the problem area and one's test infrastructure, I guess. In general, this issue is getting way less attention than the other ones: https://github.com/Kotlin/kotlinx.coroutines/labels/test

But would it wait, though?

In the case you show, no, there's no waiting. But it's also valuable to test that timeouts do trigger, like (pseudocode):

@Test
fun testLongResponsesTimingOut() = runTest {
  getServerResponse(alwaysSilentServer)
  assertUiStateIs<UnsuccessfulConnection>()
}

suspend fun getServerResponse(server: Server) {
  drawASpinner()
  withContext(Dispatchers.IO) {
    withTimeout(5.seconds) {
      server.requestResponse()
    }
  }
  updateUi(UnsuccessfulConnection())
}
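A compilable variant of that idea might look like this (a sketch only: the server and UI from the pseudocode are replaced by a call that suspends forever, and the class and test names are made up):

import kotlinx.coroutines.awaitCancellation
import kotlinx.coroutines.test.runTest
import kotlinx.coroutines.withTimeoutOrNull
import kotlin.test.Test
import kotlin.test.assertNull
import kotlin.time.Duration.Companion.seconds

class TimeoutTriggerTest {
    @Test
    fun longResponsesTimeOut() = runTest {
        // The "always silent server" is modelled as a call that never completes.
        val response = withTimeoutOrNull(5.seconds) { awaitCancellation() }
        // Virtual time skips the 5 seconds instantly, yet the timeout still fires.
        assertNull(response)
    }
}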

No, because catching exceptions is likely to be a facility that is welcome by default.

Oh, I am certain we're even going to break someone's tests that way and they will come to us and complain that they were fully aware of exceptions in GlobalScope being lost and wanted that behavior anyway. From their point of view, runTest must only run coroutines (and possibly skip delays so that the tests run faster), not fail the code that works fine in production [unless you're on Android or some other platform that catches lost exceptions on its own].

My argument is that delay-skipping is not unanimously accepted as a desirable default

I've heard a saying that if you asked a hundred people to vote on whether all those who voted would get a free dinner, not all would agree. With an audience as extensive as ours, doing anything unanimously is impossible.

So, other considerations come into play. One consideration is as follows: it is wrong for a test to perform delay skipping when it shouldn't, it is wrong for a test not to perform delay skipping when it should, and by choosing a default, we choose which mistakes are more likely to happen.

Now, judging by this very thread, it looks like the cases when delay skipping should not have happened but happened anyway are very obvious: you get an immediate timeout, or your network calls all fire in quick succession, or something like that. So, you run your test, observe the issue, write an irritated comment here, problem solved. And even when the problem was there, it was localized to a single test. Or if not, if that's some common initialization code, just fix that.

On the other hand, let's imagine that runTest doesn't skip delays by default. You write, say, delay(200.milliseconds) to mock a network call. Maybe there are ten thousand such calls throughout the test suite. So, the test suite as a whole now takes twenty seconds longer to execute than it otherwise would, which may well mean that tests take twice+ as much time to execute. In the best case, you may read some tutorials or books and discover that, hey, there are options for delay skipping! Then, you rewrite your thousands of tests, using the longer test header, add "use delay skipping" to your style guide, consistently forget to add delay skipping somewhere anyway, and are then left wondering whether the lack of delay skipping in that particular test that on its own takes five seconds was intentional.

joffrey-bion commented 1 year ago

In general, this issue is getting way less attention than the other ones

Fair enough.

With an audience as extensive as ours, doing anything unanimously is impossible.

My bad, wrong word. We obviously cannot satisfy everyone. My point is that we disagree on the "majority" in this case.

Oh, I am certain we're even going to break someone's tests that way and they will come to us and complain that they were fully aware of exceptions in GlobalScope being lost and wanted that behavior anyway

That's very fair as well. As I said, I'm not against the idea of enabling some test niceties by default. There is obviously a balance to strike, and the decision to enable those features by default is not all-or-nothing. We could enable some test features by default, and some could be opt-in. I don't know the details of the exception handling feature per se, so I'm not debating this one here, of course; it was merely an example. But I am discussing the number of use cases that do need delay-skipping versus the number that don't, which would inform the choice of default.

In the case you show, no, there's no waiting. But it's also valuable to test that timeouts do trigger

I'm 100% with you here. It's exactly the kind of test that would willingly decide that they want controlled time to generate timeouts. In this situation, you want controlled time because that's the goal of the test, so you enable whatever feature you need to control time, or use a special dispatcher for it, etc.

But that's one use case. The question is rather, for the unsuspecting regular test, who really needs delay-skipping? Which use-cases did you have in mind for delay-skipping in tests that are not about specifically testing timing?

One consideration is as follows: it is wrong for a test to perform delay skipping when it shouldn't, it is wrong for a test not to perform delay skipping when it should, and by choosing a default, we choose which mistakes are more likely to happen.

I totally agree with this approach, and I arrive at the opposite conclusions.

Now, judging by this very thread, it looks like the cases when delay skipping should not have happened but happened anyway are very obvious. So, you run your test, observe the issue, write an irritated comment here, problem solved. And even when the problem was there, it was localized to a single test. Or if not, if that's some common initialization code, just fix that.

That wasn't my experience. It took me a while to understand what was going on the first time I faced this (and I already had decent experience with coroutines). It was affecting most of my tests, and in particular all the tests that were calling a real service (spawned in a sidecar container for testing purposes). The proper fix, once the cause is understood, would be to write my own runTestWithoutDelaySkipping, which may or may not use runTest under the hood with a custom dispatcher (I think I struggled to fix the issue with a custom dispatcher on native-mt for my use case due to freezing issues). That's why the question about what should be in the library pops up: if I have to write that, maybe a lot of people do too.

the test suite as a whole now takes twenty seconds longer to execute than it otherwise would, which may well mean that tests take twice+ as much time to execute. In the best case, you may read some tutorials or books and discover that, hey, there are options for delay skipping

This seems way more obvious to me than your previous point. If you have a problem of speed, it's easy to understand the relation between your delays and the speed of the test, and you would immediately look into facilities to speed things up or mock time itself. On the other hand, if you have random timeouts and crashes when you didn't use anything special, it's harder to identify why you're having this behaviour if you weren't aware of delay-skipping in the first place.

You write, say, delay(200.milliseconds) to mock a network call. Maybe there are ten thousand such calls throughout the test suite

I have trouble understanding in which case someone would want to mock using delay() in a test, especially if we assume that delay-skipping is a desirable default. That's a very strange way to mock a network call. But more importantly, a mock is test code. So why would the user use delay(200) instead of just not slowing down anything, with no delay() call at all? If an actual suspension is needed, why not yield() or delay(1)? If we generally want to skip delays, then we can just not put delays in our general mock, can't we?

Summary

So I guess we agree that:

  1. sometimes we need delay-skipping, for instance for tests that check timings/timeouts. Other use cases are to be determined.
  2. sometimes we want regular delays and in particular normal withTimeout behaviour when calling code using standard dispatchers.
  3. none of these behaviours fit all tests

We disagree on the relative impact of skipping when undesired vs. not skipping when desired. IMO solving a slow test is obviously linked to time (and leads to researching things in this area), while identifying crash root causes is harder and might not lead to looking into delay-skipping.

dkhalanskyjb commented 1 year ago

So why would the user use delay(200) instead

When researching how people wrote tests with the old test framework, I saw code like this all the time. This is in order to check how parallel execution behaves. "While this happens, if it takes this long, what will the rest of the system do", stuff like that. So, yeah, timings.

IMO solving a slow test is obviously linked to time (and leads to researching things in this area)

If you know that delay skipping exists! How many people do you think would know this? There are some precedents for delay skipping, like Robolectric, but not all that many at all. Why would one even expect that delay skipping for tests exists? In production, when your code is slow, you don't research "how to make time-consuming parts take zero time"; you go ahead and do performance optimization. How does one even arrive at the option of writing tests some other way? In this case, someone who didn't know about the possibility of delay skipping could instead try to use CountDownLatch or some other synchronization mechanism to ensure the expected concurrent execution happens instantaneously, instead of writing straightforward and readable code.

joffrey-bion commented 1 year ago

If you know that delay skipping exists! How many people do you think would know this?

My point didn't require the assumption that people already know they exist. My point is that a Google search will lead to that pretty quickly.

when your code is slow, you don't research "how to make time-consuming parts take zero time", you go ahead and do performance optimization

Maybe you're right here. I am not sure though. For instance, when you are calling a real service you might research how to mock it if it's too slow. When you run tests with "sleeps"/"delays" in them, you might wonder if there is a better way, and that leads to this kind of question which is exactly what I'm expecting in my assumptions.

On the other hand, if we assume people don't know about delay skipping, it may take quite some time before realizing why their withTimeout(2.min) instantly crashes even though the real test service responds in a handful of milliseconds.

Admittedly, now that people have had the problem, searching for "coroutine withTimeout fails instantly in test" leads to this SO question (2nd result for me) which is one of the questions I'm thinking about when saying people are surprised by it. So, maybe if we wait a bit more and more people have trouble with that, it will become visible enough to not be a concern anymore 😄

dkhalanskyjb commented 1 year ago

So, maybe if we wait a bit more and more people have trouble with that, it will become visible enough to not be a concern anymore

Well, there is a concern with the solution being hacky. Without searching for this issue, one may not be able to think of Dispatchers.Default.limitedParallelism(1). Having a robust well-documented solution that is evident from the documentation would be certainly better.

I think a good API shape for this would be to have a coroutine context element named something like NoDelaySkipping. This way, delay skipping can be disabled locally:

runTest {
  withContext(NoDelaySkipping) {
    initialize()
  }
  checkAssumptions()
}

or for the whole test:

runTest(NoDelaySkipping) {
  initialize()
  checkAssumptions()
}

Also, it would be easy to pass to some deeply-nested entities when doing mocking.
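Until something like that exists, a rough stand-in can be built from the limitedParallelism workaround mentioned earlier (a sketch only: withRealDelays is a made-up helper, not a library API, and initialize()/checkAssumptions() are the placeholders from the snippets above; limitedParallelism may need an opt-in depending on the version):

import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

// Hops to a real single-threaded dispatcher, whose delay() uses wall-clock time
// instead of the test scheduler's virtual time.
suspend fun <T> withRealDelays(block: suspend CoroutineScope.() -> T): T =
    withContext(Dispatchers.Default.limitedParallelism(1), block)

// Usage inside a test:
// runTest {
//   withRealDelays { initialize() }  // real delays here
//   checkAssumptions()               // virtual time here
// }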

joffrey-bion commented 1 year ago

That would indeed look nice!

EanLombardo commented 1 year ago

Now, judging by this very thread, it looks like the cases when delay skipping should not have happened but happened anyway are very obvious. So, you run your test, observe the issue, write an irritated comment here, problem solved. And even when the problem was there, it was localized to a single test. Or if not, if that's some common initialization code, just fix that.

It being obvious is not necessarily the problem. There are some cases where delay correctness is important outside the scope of the test itself.

For example, if I am writing an end-to-end integration test that checks production, whose job it is to schedule some async work in system A, then poll system B every 30 seconds until I get the expected result, or notify someone that production is broken if that doesn't happen.

It would be nice if I could write this test using existing coroutine-based client libraries and run it as part of a normal testing system. So I pull in the client library code, write it as a JUnit test, wrap the test body in runTest, execute the test, and start DoSing production unknowingly because my 30-second delay is actually instant.
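Concretely, such a test might look like the sketch below (scheduleWorkInSystemA and pollSystemB are hypothetical placeholders for the client libraries, stubbed out here; runBlocking is used precisely because it never skips delays):

import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlin.test.Test
import kotlin.time.Duration.Companion.seconds

// Hypothetical stand-ins for the coroutine-based client libraries described above.
suspend fun scheduleWorkInSystemA() {}
suspend fun pollSystemB(): Boolean = true

class EndToEndPollingTest {
    @Test
    fun workScheduledInAIsVisibleInB(): Unit = runBlocking {
        scheduleWorkInSystemA()
        while (!pollSystemB()) {
            delay(30.seconds) // a real pause; wrapped in runTest it would be instant
        }
    }
}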

The scale of testing matters: not all tests are unit tests, not all tests simply break the build, and some of them make sure the world isn't broken. Highly generic testing utilities, like assertFailsWith, can be used for any level of test. runTest can be dangerous for certain tests, but that is not at all clear from its name. That is the part I have a problem with.

dkhalanskyjb commented 1 year ago

So I pull in the client library code, write it as a Junit test, wrap the test body in runTest, execute the test, and start DOSing production unknowingly because my 30second delay is actually instant.

Did you actually witness projects where this could happen, or is this just a theoretical danger? In my experience, if it's possible to DDoS a production system from an incorrectly written test, the system as a whole is dangerously brittle. All it takes is one programmer not being careful enough and forgetting a delay, or thinking that delay(1) means one second, or something. The process should protect against this. And while protection against DDoS by an intentional adversary is hard, such silly mistakes can be prevented by something as simple as rate limiting (https://www.nginx.com/blog/rate-limiting-nginx/); there is no reason not to use that.

joffrey-bion commented 10 months ago

Randomly ended up looking at this issue again, and sorry for spam, but could someone with sufficient permissions fix the 2 typos in the issue title? It's been bugging me since the beginning 😅

contlix -> kotlinx
cirtual -> virtual

So: kotlinx.coroutines.test: Option to disable virtual time and delay skipping

mikedawson commented 9 months ago

I understand that there could be some cases where a virtual clock might be useful to make tests run faster. Unfortunately, this breaks a whole lot of normal coroutine use cases out of the box.

I think this is not a good default behavior. I understand that using delay within tests is not good, but there are plenty of times in real code when timeouts are needed. runTest should be useful for multiplatform testing, but I think for now I will roll my own.

JakeWharton commented 9 months ago

It does not break Turbine. We have tests to ensure virtual time skipping works correctly.

mightyguava commented 9 months ago

While delay skipping is working in Turbine, auto-advancing time causes some weird quirks that make delays very hard to test with Turbine. Or maybe I'm doing it wrong. Just filed a bug: https://github.com/cashapp/turbine/issues/268

dkhalanskyjb commented 9 months ago

Unfortunately, this breaks a whole lot of normal coroutine use cases out of the box

Supposedly, these use cases happen on Dispatchers.IO. Don't mock Dispatchers.IO with a test dispatcher in the code that doesn't need to use virtual time, and delays won't get skipped.
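For example (a small JVM sketch; the class and test names are made up), a delay that runs on the real Dispatchers.IO is not skipped even inside runTest:

import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.delay
import kotlinx.coroutines.test.runTest
import kotlinx.coroutines.withContext
import kotlin.test.Test

class IoDelaysTest {
    @Test
    fun delaysOnIoAreReal() = runTest {
        delay(1_000) // test dispatcher: virtual time, skipped
        withContext(Dispatchers.IO) {
            delay(50) // real Dispatchers.IO: an actual wall-clock delay, not skipped
        }
    }
}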