avajs / ava


Retry flakey tests #934

Closed. Awk34 closed this issue 6 years ago.

Awk34 commented 8 years ago

I've searched around for a bit and couldn't find anything about retrying flakey tests with AVA.

I've been working on trying AVA for the generator-angular-fullstack Yeoman generator to see if it can speed up our tests. Right now in CircleCI our tests take about 15 minutes. This mainly includes running a few different configurations of the generator, and then some test command after that on the scaffolded code, namely linting, client tests, server tests, and E2E tests. The E2E tests can be quite flakey. Right now, we use Mocha, which has the ability to add this.retry(2) to a test to retry it x times if it fails. Is there any plan to add something like this to AVA? If not, it would definitely be helpful to note some way of achieving this in the readme.
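(For reference, a minimal sketch of Mocha's mechanism; the actual hook is this.retries(n), and the suite and test bodies here are illustrative:)

describe('scaffolded app E2E', function () {
  this.retries(2); // rerun each failing test in this suite up to 2 more times

  it('serves the client', function () {
    // flaky E2E assertions here
  });
});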


jamestalmage commented 8 years ago

I think retry is a pretty interesting idea. Lots of projects have a flaky test or two (AVA included).

Is there ever really a need for retry(3)? Could we just always use 2 and not offer a config option?

test.retry(t => {})
Awk34 commented 8 years ago

I think the number of attempts needs to be configurable. For the generator, I have all the E2E tests retry twice, but the tests for the Sequelize setup are especially flakey, so I've set those to retry thrice.

edit: Could test.retry(2)(t => {}) be done? (with test.retry(t => {}) defaulting to 2)

jamestalmage commented 8 years ago

Could test.retry(2)(t => {}) be done? (with test.retry(t => {}) defaulting to 2)

Yes. I'm not necessarily opposed to it. Just trying to suggest as many alternatives as possible for discussion.

I think the number of attempts needs to be configurable.

OK. What if we just allowed 2 or 3:

test.flakey(t => {}) // retry twice
test.veryFlakey(t => {}) // retry thrice

We could extend the flakey keyword to assertions:

test(t => {
  t.flakey.is(foo, bar); // test will be retried up to one more time if a `flakey` assertion fails
  t.veryFlakey.is(foo, bar); // test will be retried up to two more times if a `veryFlakey` assertion fails
  t.is(foo, bar); // if a non-flakey assertion fails - no retries
});

test.flakey(t => {
  // any assertion failure causes a retry
});

test.veryFlakey(t => {
  // any assertion failure causes up to two retries
});

For the flakey assertions, I'm thinking of some of AVA's CLI tests on Windows where we sometimes don't capture the full stdout. Those are slow tests. If the exit code is wrong we should just fail fast - because it's some other problem we don't want masked by retries. If the entire message didn't get printed (because Windows) - well then there is nothing we can do about that - retry and see what happens.
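(To illustrate, a hypothetical CLI test along those lines, using the proposed flakey modifier; runCli and expected are stand-ins:)

test('prints the full report', async t => {
  const {code, stdout} = await runCli(); // hypothetical helper
  t.is(code, 0);                 // wrong exit code is some other problem: fail fast, no retry
  t.flakey.is(stdout, expected); // truncated stdout on Windows: allow a retry
});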


We could provide some more detailed summaries about flakey assertions as well:

5 passed
  2 flakey assertions had to be retried
  3 flakey assertions had to be retried twice

5 passed
2 failed
  1 flakey assertion failed even after being retried
  1 flakey assertion failed even after being retried twice
Awk34 commented 8 years ago
test.flakey(t => {}) // retry twice
test.veryFlakey(t => {}) // retry thrice

Seems reasonable, but I'm sure you'll get someone who asks for the ability to retry more sometime down the road.

We could provide some more detailed summaries about flakey assertions as well: [...]

I like that a lot. I don't think Mocha has the option of showing whether a flakey test was retried.

jamestalmage commented 8 years ago

but I'm sure you'll get someone who asks for the ability to retry more sometime down the road.

Isn't a test that regularly fails three times in a row basically useless? If the repeated failure is because you are running on an overcrowded CI VM, or because of network issues, aren't you just as likely to fail four times in a row at that point?

Perhaps we could make the number of retries configurable based on context:

"ava": {
  "retries": {
    "local": [0, 0],       // no retries on devs local machine
    "ci": [1, 2]           // flakey = 1 retries, veryFlakey = 2 retries on CI
  }
}

Overly complicated version:

"ava": {
  "retries": {
    "local": [0, 0],       // no retries on devs local machine
    "win32": [0, 1],       // veryFlakey retries on Windows machines - even on dev machines
    "ci": [1, 2],          // flakey = 1 retries, veryFlakey = 2 retries on CI
    "appveyor": [2, 3]     // extra retries for AppVeyor
  }
}
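(A sketch of how the current context's tuple might be resolved; the detection order and names are assumed:)

function resolveRetries(retries) {
  if (process.env.APPVEYOR && retries.appveyor) return retries.appveyor;
  if (process.env.CI && retries.ci) return retries.ci;
  if (process.platform === 'win32' && retries.win32) return retries.win32;
  return retries.local || [0, 0]; // tuple is [flakey, veryFlakey]
}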
Awk34 commented 8 years ago

Isn't a test that regularly fails three times in a row basically useless? If the repeated failure is because you are running on an overcrowded CI VM, or because of network issues, aren't you just as likely to fail four times in a row at that point?

I completely agree; just saying, I imagine people would want it anyway.

Perhaps we could make the number of retries configurable based on context: [...]

Seems reasonable. Maybe even just something like this:

"ava": {
  "flakey": 2,
  "veryFlakey": 3
}
jamestalmage commented 8 years ago

For AVA, our "flakey" tests are basically only flakey on AppVeyor. They run reliably both locally and on Travis. My concern would be allowing flakiness to creep into our Travis tests unknown to us.

Awk34 commented 8 years ago

Would there be an easy way to set some flakey config variable (in AVA config or a CLI flag) from AppVeyor instead of introducing some platform-specific code into your AVA config?

jamestalmage commented 8 years ago

We could do it:

  1. Via CLI:

     ava --retries=1,2

  2. Via environment variables:

     set AVA_RETRIES="1,2"

  3. Via config:

     "ava": {
       "retries": [1, 2]
     }
Awk34 commented 8 years ago

Yeah, those seem a lot better.

jamestalmage commented 8 years ago

I think that would be the order of precedence (cli flag, environment, config).

Are environment variables even worth it? Or should we just do config and cli?

My thought was that environment variables would allow you to reuse your npm test script without modification, instead of creating a special npm run test-appveyor that was the same but for a few config settings. Especially useful if you have a build step in your pretest script.
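(A sketch of that precedence; option and variable names follow the examples above:)

function resolveRetryConfig(cliFlag, env, packageConfig) {
  const raw = cliFlag || env.AVA_RETRIES || packageConfig.retries;
  if (!raw) return [0, 0]; // no retries by default
  return Array.isArray(raw) ? raw : String(raw).split(',').map(Number);
}

// resolveRetryConfig('1,2', process.env, {}) returns [1, 2]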

Awk34 commented 8 years ago

If I tell my CI to run npm test and I have my script as "test": "ava somefile.js", I could easily run npm test --retries=1,2 as well, right?

I think if anyone ever comes up with the need for env var configuration then it can be discussed, but until then, nah. There's no other use of env vars like this in this project (from what I can tell).

rozzzly commented 8 years ago

@jamestalmage

Isn't a test that regularly fails three times in a row basically useless?

A fantastic point. But, IMHO there really isn't a good reason not to make it configurable... As others have pointed out, there are certainly people out there who have setups which would truly merit the option to configure this.

CLI flags / package.json fields would be a simple fix to the issue, but what about someone who'd like to have the number of retries differ from test to test? Might I suggest:

To get the default number of retries, flakey can be accessed as a property on the chain, like so:

t.flakey.is(foo, bar);  // retry assertion twice
test.flakey(t => {
    // ...
}) // retry test twice

But then it could also be a function that accepts an argument, like a retry count:

t.flakey(7).is(foo, bar); // retry assertion 7 times

As far as setting a custom retry count for a test, the equivalent syntax might be:

test.flakey(7)(t => {
    // ...
}); // retry test 7 times

But I think the (nTries: number)(test: (t: ContextualTestContext) => void) signature is kind of ugly when directly in front of test(). Another option might be:

test.flakey(7, 'testTitle', t => {
   // ...
});

That, however, is a departure from the test() syntax; it might be a little confusing. But logically, it would work (a sketch follows this list):

  1. if arguments.length === 3 then arguments[0] is (a number for) the retry count, arguments[1] is (a string for) the title and arguments[2] is the test method (each would be type checked of course)
  2. if arguments.length === 2 then arguments[0] is (a number for) the retry count, arguments[1] is the test method (all typechecked of course)
    • the test title would then, if possible, be inferred from the function's name as per usual.
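(A sketch of that overload resolution; the actual test registration is elided:)

function flakey(...args) {
  let retries = 2; // the default
  let title;
  let fn;
  if (args.length === 3) {
    [retries, title, fn] = args;          // (number, string, function)
  } else if (typeof args[0] === 'number') {
    [retries, fn] = args;                 // (number, function)
  } else {
    [fn] = args;                          // (function) only
  }
  title = title || fn.name || undefined;  // infer the title from the function name as per usual
  // ...register the test with retries, title and fn
}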
jamestalmage commented 8 years ago

I don't think you would ever want to retry assertions, just tests.

In my mind the flakey assertion is there to differentiate which assertions are allowed to fail. As a convenience, if you use a flakey assertion, we mark the test flakey. I don't see the point in allowing t.flakey(7).is(...).

As for configuring a specific number of retries with per-test granularity, I just don't see it. If we give you a global configuration for what flakey and veryFlakey mean, do you really think there is some scenario where you'll need more control than that?

rozzzly commented 8 years ago

A few more thoughts:

Now that's definitely outside the scope of AVA; something you'd want an external tool to manage. Such a tool could rerun failed tests when some user-defined conditions are met. That could actually be really useful for testing sessions on servers where, over time, the state gets more and more muddled, or for scenarios where the connection/resource access might be intermittent.

These were some questions I asked myself when reading this issue. I'm not sure what use they might be to the AVA team, but I've always found it's better to consider a broad range of possibilities before committing to anything. TBH, I can't think of many cases where I, personally, would be using flakey all that much, but perhaps something I said might be desired by another, or at least give you all a different perspective.

jamestalmage commented 8 years ago

I assume retried tests would run in the same process in which they failed the first time around.

Yes. If you are writing tests that pollute state and are dependent on each other, then we would need to rerun the whole test file again anyway. We aim to encourage atomic tests.

I'd imagine beforeEach()/afterEach() would rerun when a flakey test fails

Agreed.

rozzzly commented 8 years ago

@jamestalmage As far as retrying assertions, you were the first person to mention such a thing :unamused: I'm just following your line of thought my friend:

t.flakey.is(foo, bar); https://github.com/avajs/ava/issues/934#issuecomment-228156646

And you're right, it is pretty ridiculous; I suspect I'm probably not going to use flakey more than once or twice in my current projects. I merely suggested a syntax/design pattern/what-have-you which would enable users to tweak it as much as one might want, in a way that is somewhat consistent with the rest of the API.

AVA is pretty damn opinionated, and that's what has made it so successful! I certainly don't begrudge the team this; in fact, I greatly admire such adamancy!! So if you'd rather keep the API minimalistic, I totally respect that. Just throwing out ideas here man. :thought_balloon: As I described in my second comment, such a thing would probably be better suited as its own tool. Now I'm imagining ava-daemon, where you could track/trigger/retry AVA tests programmatically. I can think of quite a few cases in which it might be desirable to trigger a test when some condition was met. Might it not be much easier than mocking some complex state? (Detecting a condition vs. recreating one.) Not that I'm suggesting doing away with the core philosophy of AVA tests, just a way to retest something under certain conditions to cover more edge cases.


Ah good; as it should be! Atomic tests all the way man, better to have the test blow up :boom: than to write ill-conceived tests.

jamestalmage commented 8 years ago

As far as retrying assertions, You were the first person to mention such a thing 😒

I am still proposing t.flakey.is(foo, bar);

test(t => {
  // If this assertion fails, test fails, no retries. It's basically a fail-fast assertion
  t.is(foo, bar);      

  // If this fails, entire **test** will be retried (not just the assertion).      
  t.flakey.is(bar, baz); 
});

If you do test.flakey(t => {}), then there is no fail-fast assertion. Test will be retried on any assertion failure.

novemberborn commented 8 years ago

This sounds good. @jamestalmage would you mind summarizing the design decisions? Which questions still need to be answered?

Also, 🚲 🏠 wise, is it flaky or flakey? 😉

jamestalmage commented 8 years ago

I gravitated towards flaky, but saw others were spelling it differently, so I just assumed I was wrong. Turns out both spellings are correct. Not sure what to do about that. Do we alias them to mean the same thing? I'm normally against aliases, but it seems harsh to penalize either correct spelling. If we do alias, we should add a linter rule that allows people to prefer one or the other (you likely wouldn't want mixed use in the same project). Of the two, I would probably prefer flaky - it's how I naturally wanted to spell it, and it's shorter.

Awk34 commented 8 years ago

Aliasing two correct spellings seems reasonable to me, but if there's a more used spelling, that's the one that should be presented in docs.

jamestalmage commented 8 years ago

would you mind summarizing the design decisions?

  • Add two methods to the test chain:
    • test.flaky(t => {}) - If the test fails, it will retry one more time.
    • test.veryFlaky(t => {}) - If the test fails, it will retry two more times.
  • Add two configuration options that allow you to adjust how many times each gets retried:

    {
      "ava": {
        "flaky": 1,       // retry once
        "veryFlaky": 2    // retry twice
      }
    }

IMO, if they only configure flaky, veryFlaky should default to flaky + 1.
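(That defaulting rule, sketched; the config shape follows the example above:)

const flaky = config.flaky != null ? config.flaky : 1;
const veryFlaky = config.veryFlaky != null ? config.veryFlaky : flaky + 1;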


Which questions still need to be answered?

  • Is it flakey or flaky? (Both are correct, flaky seems to be more prevalent).
  • Do we alias the less prevalent spelling? Either way, only the more prevalent should appear in the docs. If we do alias, we should create a linter rule to allow users to enforce a particular spelling.
  • Do we allow for veryFlaky assertions? If so, how do we handle a scenario where the first test run has a flaky failure, the second has only a veryFlaky failure? I think you just tap out at veryFlaky max retries.
novemberborn commented 8 years ago

Is it flakey or flaky? (Both are correct, flaky seems to be more prevalent).

I too prefer flaky because it's shorter. Also: Scotch whisky 😉

Do we alias the less prevalent spelling? Either way, only the more prevalent should appear in the docs. If we do alias, we should create a linter rule to allow users to enforce a particular spelling.

We could fail with a helpful error message if flakey is used.

Do we allow for veryFlaky assertions? If so, how do we handle a scenario where the first test run has a flaky failure, the second has only a veryFlaky failure? I think you just tap out at veryFlaky max retries.

I don't see the point. If it's going to be flaky it's going to be flaky. Configuring the number of retries is fine, but distinguishing levels of flakiness seems overkill. Let's start with just flaky.

jamestalmage commented 8 years ago

I don't see the point. If it's going to be flaky it's going to be flaky. Configuring the number of retries is fine, but distinguishing levels of flakiness seems overkill. Let's start with just flaky.

Are you suggesting removing veryFlaky on assertions (t.flaky.is(...), etc.), or eliminating it from the test API as well: test.flaky(t => {...})?

See https://github.com/avajs/ava/issues/934#issuecomment-228119958.

sindresorhus commented 8 years ago

I think we should go with flaky, but no aliasing. Instead just throw a helpful error message: "Did you mean flaky?" if they write flakey.

I also don't see the point of veryFlaky. Let's keep it simple.

I don't buy the argument for flaky assertion. IMHO if you have flaky assertions mixed with other working assertions, you should just move them into a separate test. I would also consider having flaky tests an anti-pattern. It should only be temporary until you have time to make it non-flaky.

Regarding a global --flaky config.

Seems reasonable, but I'm sure you'll get someone who asks for the ability to retry more sometime down the road.

We tend to not add things for imaginary use-cases. Can't we just wait until some actual use-cases come in? I think 2 retries is a very good default, and I can't see why anyone would actually need to change that.

I much prefer the "keep it simple until someone complains" methodology.

novemberborn commented 8 years ago

Are you suggesting removing veryFlaky on assertions (t.flaky.is(...), etc.), or eliminating it from the test API as well: test.flaky(t => {...})?

Both.

I don't buy the argument for flaky assertion. IMHO if you have flaky assertions mixed with other working assertions, you should just move them into a separate test.

You could write a flaky assertion as a way to sanity check whether your test should be able to pass. E.g. a method that sometimes returns null, using t.flaky.true(result !== null) would be the sanity check, after which other assertion failures are actual test failures that don't require the test to be rerun.

jamestalmage commented 8 years ago

You could write a flaky assertion as a way to sanity check whether your test should be able to pass

Yes, that is what I was thinking. It makes it easier to be explicit about what is allowed to be flaky. In that case though, you probably should write two or more tests. A normal test that tests only the non-flaky part, and a flaky test for the assertions that are.

jamestalmage commented 8 years ago

I think 2 retries is a very good default, and I can't see why anyone would actually need to change that.

@Awk34 mentioned he has a mocha test that needs to be rerun three times in CI to prevent failing the build. veryFlaky was my attempt at a compromise that didn't end up with an API like:

test.retry(3)(t => {
  // ...
});

We could eliminate veryFlaky, but then I think allowing --flaky config becomes more important.

I think allowing configuration is valuable if you've got specific platforms where a test is flaky (looking at you AppVeyor!!). I would probably set up AVA's own tests something like this:

{
  "ava": {
    "flaky": 0
  },
  "scripts": {
    "test": "ava",
    "test-appveyor": "ava --flaky=1"
  }
}

This would enforce that we only allow tests to be flaky on AppVeyor.

Still, there are easy workarounds for this as well:

import test from 'ava';

// only allow flaky on AppVeyor
const flaky = process.env.APPVEYOR ? test.flaky : test;

flaky(t => {
  // ...
});
novemberborn commented 8 years ago

You could write a flaky assertion as a way to sanity check whether your test should be able to pass

Yes, that is what I was thinking. It makes it easier to be explicit about what is allowed to be flaky. In that case though, you probably should write two or more tests. A normal test that tests only the non-flaky part, and a flaky test for the assertions that are.

My point was that you needed to test the result of a flaky function. There wouldn't be any non-flaky part to test without first verifying that the function wasn't flaky to begin with.

Taken to an extreme you could argue you don't need test.flaky() if you can write flaky assertions.

sindresorhus commented 8 years ago

@Awk34 mentioned he has a mocha test that needs to be rerun three times in CI to prevent failing the build.

More than 2 retries feels just too flaky. If you really need that, you should rethink your tests. I see flaky as giving the computer a couple more chances to do the right thing. If it can't manage that after two additional tries, I don't think it ever can reliably. We shouldn't base our decisions on one person's needs. Let's wait for more use-cases.

I think allowing configuration is valuable if you've got specific platforms where a test is flaky (looking at you AppVeyor!!)

IMHO a test is either non-flaky on all systems, or flaky. If a test is flaky on AppVeyor, it's flaky in nature and should be rethought, or be marked as flaky locally too.

Awk34 commented 8 years ago

@sindresorhus I think it's actually 2 tries for my E2E, and 3 tries with the Sequelize tests (not retries). Anyhow, the Sequelize tests for my project are too flakey. I don't really use SQL too often, so I've been lazy about looking into whether I can fix those XD.

Also as a side note, the reason this story came up is because I'm looking into using AVA for my Yeoman generator, another project you have a hand in. If you're interested: https://github.com/angular-fullstack/generator-angular-fullstack/tree/perf/gen-tests :D

novemberborn commented 7 years ago

I think we should:

I'm not in favor of test.flaky. Users should be encouraged to determine exactly what is making their test flaky.

vadimdemedes commented 7 years ago

As for me, I'm opposed to t.flaky assertions in favor of test.flaky, for the exact same reasons given by @novemberborn and @sindresorhus:

Users should be encouraged to determine exactly what is making their test flaky.

I don't buy the argument for flaky assertion. IMHO if you have flaky assertions mixed with other working assertions, you should just move them into a separate test. I would also consider having flaky tests an anti-pattern. It should only be temporary until you have time to make it non-flaky.

novemberborn commented 7 years ago

@vdemedes I'm in favor of t.flaky actually 😉

vadimdemedes commented 7 years ago

@novemberborn Yeah, but your reason fits my favorness of test.flaky as well 😆

novemberborn commented 7 years ago

I don't buy the argument for flaky assertion. IMHO if you have flaky assertions mixed with other working assertions, you should just move them into a separate test.

That might not always be possible.

I would also consider having flaky tests an anti-pattern. It should only be temporary until you have time to make it non-flaky.

Of course.

taylor1791 commented 7 years ago

It is unclear to me if there is a consensus on how to implement this feature. If there is a consensus, I would like to take a stab at implementing this next week.

vadimdemedes commented 7 years ago

I just realized something. We could skip adding test.flaky in favor of using "retry" modules, like https://github.com/zeit/async-retry:

test('retry me', retry(async () => {
  // retry the test if it throws
}));

I like this solution, because that way we avoid adding more stuff to the API and we let users handle their unique use cases.

What do you think?
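(One way this might be wired up with async-retry's documented signature; fetchResult is a stand-in for the flaky operation, and the assertion is kept outside the retried block to sidestep the t.fail() concern raised below:)

import test from 'ava';
import retry from 'async-retry';

test('retry me', async t => {
  const result = await retry(async () => {
    // a thrown error here triggers another attempt
    return fetchResult(); // hypothetical flaky call
  }, {retries: 2});
  t.truthy(result);
});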

jfmengels commented 7 years ago

I like the idea, @vadimdemedes. But I'm not sure that would always work out nicely. I'm thinking of t.fail(), which will fail the test anyway.

let failed = false;
test('retry me', retry(async t => {
  if (!failed) {
    failed = true;
    t.fail();
  }
}));

This should retry the test and it should pass the second time, but IIRC calling t.fail() will forcefully fail the test whatever happens next.

That said, maybe the retry module (maybe some specific ava-retry module then?) could pass a modified t object whose t.fail() method won't forcefully fail the test?

novemberborn commented 7 years ago

Decorating t seems brittle. I'd rather we do this properly.

test.flaky is probably easier to implement, so we could start there. I still like t.flaky as a way to force you to really think about what is flaky about your tests, but perhaps I'm overthinking it.

bbiagas commented 7 years ago

I am just a random person on the internet, but if I can throw in my two cents I think a configurable number of retries would be great. The opinions against it have merit, but I also think there's not a clear definition of what "flaky" means. A flaky test is (in theory) one that passes almost all of the time but occasionally doesn't, for reasons which are not obvious. If a failure turns out to be consistent or obvious for whatever reason then the test isn't flaky; it's just a failing (or poorly written) test.

Actual flaky tests are often symptoms of deep or intermittent issues. The "flake" is telling you something useful you should be inspecting. And sometimes, looking into these tests means you may want, or need, to rerun them many, many times to try to recreate the circumstances which caused them to fail.

I have a real world example that happened at work recently; I am using Ava to test an Electron app with Spectron (webdriverio). I found that a particular test would fail every now and then for reasons we couldn't explain. It turned out the test was catching a serious (for us) but rare issue our users had reported to us only once or twice before, and a bug in the application code was the culprit. But we needed to run that test hundreds of times and build a better histogram before we could make that connection, determine the magnitude of the problem, and isolate the cause. A convenient way to keep retrying would have been great.

I might go one step further and say it would actually be handy to retry ANY test a configurable number of times. What better way to ensure an actual flaky test has been fixed than to be able to easily run it lots of times instead of hoping you catch it in CI again. Or test your tests a little more and maybe catch it sooner. This is probably not practical, but I would use it sometimes.

I would also suggest that the term used is "retry" or something similar, and not "flaky". There is some baggage among test professionals that comes with the word "flaky", the least of which is that it's not immediately clear what it means.

Just a thought. We're really liking Ava at work so thank you all for your continued hard work and dedication.

novemberborn commented 7 years ago

Hi @bbiagas thanks for your comment.

I think the scenario you ran into is probably best solved by reproducing the flaky test in a small script that you can run in a loop.

What I'm advocating for here is a way to deal with tests that are flaky due to circumstances out of their control, and ideally not bugs. It's helpful if they're retried once or twice, because you can't always fix those circumstances, but not so often that you end up ignoring bugs. I suppose it's the difference between "ugh CI failed again because so-and-so timed out, let's just restart it" and "ugh this test fails a lot, I should look into that". It's a tough balance to strike though.

eirslett commented 6 years ago

Was there any progress on this issue? I'm trying to run Selenium tests with AVA, and the lack of retry support is painful. (Browser integration tests are the most unstable tests...) I'm thinking Mocha is maybe a better test runner than AVA for Selenium, since Mocha has retry support?

From the thread, it looks like everybody agrees that retry support would be good, but there are a lot of ideas about how it should be implemented so it covers every use case under the sun... In the spirit of "don't let perfect be the enemy of good", it would be useful to just have a bare-bones retry implementation similar to Mocha's. Or provide the opportunity for end users to implement their own custom retry logic, like TestNG provides a RetryAnalyzer interface to implement?

t.retry(3) <-- simple version

t.retry(data => data.failureCount < parseInt(process.env.RETRY_COUNT)) <-- tailored

sholladay commented 6 years ago

@eirslett right now, Intern is the best test framework for WebDriver tests. They have put a ton of effort into making them more reliable with Leadfoot and DigDug (both ship with Intern). I have sponsored some features over there, through SitePen, for a big project that needed extensive WebDriver tests. And I, too, wanted retry because we had flaky tests. Turns out wallpapering over them isn't such a good idea. 😃

If retries were to be added to AVA, I definitely think t.flaky() is better by a long shot. It localizes the effect. And more importantly, it would better equip AVA to help you deal with flakiness.

t.flaky() could be much more interesting than a simple retry.

In short, I don't think t.flaky() should have the goal of getting your test to pass. Quite the contrary, I think it should help you figure out when and how it fails. In fact, perhaps t.flaky() should intentionally run your operation dozens or hundreds of times until it fails and then skip cleanup, or log if it never fails.
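(In that spirit, a hypothetical helper that hammers an operation until it fails, so the failing state can be inspected:)

async function probeFlakiness(operation, maxRuns = 100) {
  for (let attempt = 1; attempt <= maxRuns; attempt++) {
    try {
      await operation();
    } catch (error) {
      return {failed: true, attempts: attempt, error}; // stop here; skip cleanup so the state survives
    }
  }
  return {failed: false, attempts: maxRuns}; // it never failed; worth logging
}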

novemberborn commented 6 years ago

Very interesting suggestions @sholladay!

At this stage we're looking for somebody willing to champion this feature. It's not a priority for the core team.

novemberborn commented 6 years ago

https://github.com/avajs/ava/issues/1692 contains a proposal for tests to try some assertions, and if necessary discard the results. This is pretty close to the retry behavior talked about here.

Intriguingly, the retry logic can be built on top of this proposal, in a third-party module. I'm closing this issue in favor of that approach.

If you want to help out please drop by #1692. If you're writing such a third-party module let us know and we can give feedback and promote it.

Thanks all for the discussion in this thread. There are plenty of ideas for somebody to tackle.