Closed spectranaut closed 3 years ago
The global timeout is a "stop tests in case they're stuck" kind of timeout.
I suppose we could interpret that as "stop tests in case they're stuck, but don't worry for the next 30 seconds since the test will fail if it's stuck anyway".
The workers will have to send a message to the main process whenever a per-test timeout starts, with the timeout duration. And then in the main process we'd have to make sure we don't stop tests because they're stuck until that duration has passed.
This should work with multiple t.timeout()
calls, of course.
I don't think it's quite a one-liner, but this approach is consistent with what the global timeout is there for, so let's do it.
How is described behavior different from removing the global timeout and setting a default timeout on every test?
Basically,
t.timeout(10000)
The net effect is the same IMHO.
When tests run concurrently, within and across worker processes, their duration may vary wildly. The global timeout (misnamed as it may be) is only concerned with making sure something is happening in your test suite.
Individual timeouts can be useful in certain cases, but far from all.
Disclaimer: I have now found my peace with the nested timeout and can live with the timeout change.
However, it seems my initial concerns are being experienced/reported by others. Hopefully, the example below provides an insight.
The output below is meant to be slightly hyperbolic, but convey the confusion I sensed with the timeout change on first blush.
$ npx ava dist/sdk/test/*.js
â ´ Server listening at http://127.0.0.1:54369
✖ Timed out while running tests
3 tests were pending in dist/sdk/test/01_core-wait-for-registry.js
◌ 01_core-wait-for-registry › inProgress
◌ 01_core-wait-for-registry › isErrored
◌ 01_core-wait-for-registry › isAlive
18 tests were pending in dist/sdk/test/01_core.js
◌ 01_core › waitFor server-state tests
◌ 01_core › getAllProviderMetadata !refresh
◌ 01_core › getAllProviderMetadata
◌ 01_core › getProviderMetadata aws
◌ 01_core › getRelativePath 0.0.2-rc1
◌ 01_core › getRelativePath abra
◌ 01_core › getRelativePath git-url#0.0.2-rc1
◌ 01_core › getRelativePath git-url#hash
◌ 01_core › getRelativePath file#hash
◌ 01_core › getRelativePath non-existant version
◌ 01_core › getRelativePath non-existant hash
◌ 01_core › getProviderMetadata aws w/ refresh
◌ 01_core › getProviderMetadata bad provider
◌ 01_core › getProviderMetadata github@0.0.1
◌ 01_core › getProviderMetadata aws@1.23.0 (non-existant version)
◌ 01_core › publish invalid type
◌ 01_core › shutdown server - work from cache after
◌ 01_core › getProviderMetadata github from cache
This output is completely useless in diagnosing any problem that might exist. I know there is a timeout, I just have no idea where to look for the cause.
The only way to debug this:
npx ava dist/sdk/test/*.js -T 30000
npx ava dist/sdk/test/*.js -T 120000
and so on, with the hope that it wasn't an intermittent failure.
As a thought experiment, with per test timeouts, the output could look like:
3 tests were pending in dist/sdk/test/01_core-wait-for-registry.js
◌ 01_core-wait-for-registry › inProgress
◌ 01_core-wait-for-registry › isErrored
◌ 01_core-wait-for-registry › isAlive
18 tests were pending in dist/sdk/test/01_core.js
✖ 01_core › waitFor server-state tests - Timed out after 10 seconds
◌ 01_core › getAllProviderMetadata !refresh
◌ 01_core › getAllProviderMetadata
◌ 01_core › getProviderMetadata aws
◌ 01_core › getRelativePath 0.0.2-rc1
◌ 01_core › getRelativePath abra
◌ 01_core › getRelativePath git-url#0.0.2-rc1
◌ 01_core › getRelativePath git-url#hash
◌ 01_core › getRelativePath file#hash
◌ 01_core › getRelativePath non-existant version
◌ 01_core › getRelativePath non-existant hash
◌ 01_core › getProviderMetadata aws w/ refresh
◌ 01_core › getProviderMetadata bad provider
◌ 01_core › getProviderMetadata github@0.0.1
◌ 01_core › getProviderMetadata aws@1.23.0 (non-existant version)
◌ 01_core › publish invalid type
◌ 01_core › shutdown server - work from cache after
◌ 01_core › getProviderMetadata github from cache
In most cases, that simple fault isolation in the middle,
✖ 01_core › waitFor server-state tests - Timed out after 10 seconds
will result in a substantial quality of life improvement.
I appreciate that schedulers managing workers is a serious and complex problem, but this nested timer business makes it harder to grok from the test writer point of view.
Hopefully this helps. I'll go back to holding my peace....
@sramam are those tests using serial
? Are they async
? It's quite possible that the list here does not actually take into account which tests started, just those which haven't completed yet.
yes - you caught that one quick!
They are serial and async.
Each test is async, because it allows multiple suites to run in parallel. Each test suite is serial, since they manipulate state on disk in a coordinated fashion - this allows me to use one test.before() to setup, overall, reducing run time by a very large factor.
yes - you caught that one quick!
They are serial and async.
Each test is async, because it allows multiple suites to run in parallel. Each test suite is serial, since they manipulate state on disk in a coordinated fashion - this allows me to use one test.before() to setup, overall, reducing run time by a very large factor.
I've opened https://github.com/avajs/ava/issues/2421 to track. I'm going to mark our conversation here as "off-topic", so it doesn't distract from what this issue is hoping to address.
I just ran into this, and it took longer than I'd like to admit to realize this was the issue. We have two tests that are much slower (~2m), while the other tests are below the default 10s. I wanted to only mark to "slow" tests to use a higher timeout, instead of needing to set t.timeout()
for every test.
Would be awesome to have this issue solved! 😄
@csvn would you be interested in taking this on?
Any news on this? @novemberborn @csvn ?
I'd be willing to draft a PR if nobody is on it yet. But honestly I still don't grok the reason behind the current implementation. Like the docs say, "Timeouts in AVA behave differently than in other test frameworks." – and I fail to understand why that's desirable.
To me:
t.timeout(20 * 1000)
on the test is highly unexpected.globalTimeout
without failing is highly unexpected.As a user of the framework, not interested in implementation details, I would expect (similar to @sramam I think):
timeout = globalTimeout = 10s
t.timeout
@novemberborn could you maybe try to shed some more light on the design decisions behind the timeout mechanisms? Sorry for being a bit slow here …
I have read now that the globalTimeout
option is to "stop tests in case they're stuck", or "handle stalled tests". But why does that not work with individual timeouts?
Individual timeouts can be useful in certain cases, but far from all.
Could you please elaborate on this one?
@stefanpl the solution outlined in https://github.com/avajs/ava/issues/2384#issuecomment-579678760 addresses your concerns. PR still welcome.
I don't think hard timeouts are necessary for most use cases. Test performance is easily influenced by the load on the executing machine.
apowerful1 has contributed $40.00 to this issue on Rysolv.
The total bounty is now $40.00. Solve this issue on Rysolv to earn this bounty.
Hey there, I came in after seeing the issue on Rysolv.
Seeing nobody else has announced of them picking this up, guess I'll give it a try.
As per my understanding after some local testing: Presently, t.timeout(ms)
allows us to set up a smaller per-test timeout within the global timeout limit and not beyond it, we need it to allow longer than the global timeout.
I'm going through the code as I write this. And #2384 comment makes sense looking at api.js
and test.js
.
@OhYash great! Let me know if I can help.
Hey @novemberborn, gave it some more time today. I could use some help actually.
So, we have this restartTimer debounce() function that has the single delay to it, which emits state change upon the timeout and adds pending workers to timedOutWorkerFiles.
Also, that it Maps the files and has workers running then.
What I'm facing problem with is that the workers are mapped to individual files. There will be multiple tests in each file, and now I can't seem to connect the workers to the Test/Runnable objects so as to get the individual timeout from.
I think you'd have to add a callback or event emitter to Test
.
Runner
creates them, so can set it up:
Runner
itself is an emitter, so you can forward the events to the main process / thread:
(I don't think this would be considered a state change, but also it's already overloaded so I guess it's fine.)
You can now connect it up to the API code, where you have access to the timeouts:
https://github.com/avajs/ava/blob/b8f5685ed5b828de884d0d72c988efe837058692/lib/api.js#L234
Hey @novemberborn, sorry for being so silent. Wasn't my intention to ghost away. Just got carried away in some stuff.
So regarding this, I've managed to get the individual timeout into api.js
, but I'm struggling to update the wait
value in debounce()
.
Having to resort to creating a fresh debounce() function when a t.timeout()
is called doesn't seem correct.
Any way I can update the debounce wait?
Besides that I think I also see an issue, kinda unrelated. Tests can run asynchronously, but timers are synchronous.
For example: Forget about the t.timeout() ; In this file one async test takes 6 secs to complete and the other takes 15. Now our test runner starts the first timer and runs both the tests so the second one has a head start before it's timer even starts. |
T | Event |
---|---|---|
0 | Timer starts, both tests start | |
6 | First test finished, timer stops. Second timer starts (10s) | |
15 | Second test finished, second timer was at 9s at the moment. Both tests have passed successfully. |
gist: https://gist.github.com/OhYash/7f34ac6701a699d5e09537b4e6a054e9
@OhYash could you share a draft PR?
@novemberborn, Linked https://github.com/avajs/ava/pull/2758
What you're trying to do
I appreciate that having a global timeout in order to catch unexpected behavior in a test and not consume too much CI resources. However, some tests take a long time intentionally, and I would like to be able to use
t.timeout()
to explicitly override the global timeouts for just one test. Right now, you can only override the global timeouts if thent.timeout()
is shorter than the global timeout. It would be nice if it overrides whent.timeout()
is longer as well.I think I am not the only one, as I found this question referring to the same problem.
Why you can't use AVA for this
There is no way to have a global timeout that is something like 10s and a timeout for a particular test that is longer, like 30s.
And maybe how you think AVA could handle this
I'm not sure how the timeouts are coded, it might be a one-liner :)