kibanamachine opened this issue 2 years ago
Pinging @elastic/security-asset-management (Team:Asset Management)
Looks like this test was added 13 days ago by @tomsonpl in https://github.com/elastic/kibana/pull/131224
New failure: CI Build - main
Ouch, I hadn't seen the message from 3 weeks ago. I'm on it and will check what is happening. Thanks for pinging me, @spalger.
This is strange: the tests pass locally. Do you know any other way to test this besides running node scripts/functional_tests --bail --config x-pack/test/api_integration/config.ts? Thanks!
Yeah, the test isn't failing consistently; it's flaky, which likely means there is some sort of timing issue in how the test works. This can be tricky to find without walking through the specific test step by step and making sure there are no race conditions.
Based on the failure logs from Buildkite, it seems that /api/fleet/package_policies sometimes responds without an item. The API might actually have responded with an error status code, but there isn't a status code assertion at https://github.com/elastic/kibana/blob/bf6cf59908d35b3cb98146a2a88fe1c1610110d6/x-pack/test/api_integration/apis/osquery/packs.ts#L84-L103.
What I would recommend is creating a PR that adds status code assertions to that and similar API calls here, putting a .only() on the suite, and then running the api_integration config in the flaky test runner 100 or so times. This might help explain the error that's occurring and how to keep it from failing tests in the future.
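As a minimal sketch of the kind of status code assertion suggested above (not the code that was actually merged), the helper below checks the status first, then checks that the body contains an item, and puts the whole body into the error message so the CI log shows what the API really returned. ApiResponse and assertCreatedItem are hypothetical names, and the default success status of 200 is an assumption about the Fleet endpoint. In the real suite, supertest's .expect(200) on the request chain serves a similar purpose for the status part.

```typescript
// Hypothetical minimal shape of a response from an HTTP test client.
// In the actual suite this would come from supertest.
interface ApiResponse {
  status: number;
  body: unknown;
}

// Hypothetical helper: asserts the status code first, then that the body
// contains an `item`, and includes the full body in the error message so the
// CI log shows what the API actually returned.
// The default of 200 is an assumption about the endpoint's success code.
function assertCreatedItem(response: ApiResponse, expectedStatus = 200): unknown {
  if (response.status !== expectedStatus) {
    throw new Error(
      `Expected status ${expectedStatus} but got ${response.status}: ` +
        JSON.stringify(response.body),
    );
  }
  const body = response.body as { item?: unknown } | null;
  if (body === null || typeof body !== "object" || body.item === undefined) {
    throw new Error(`Response is missing "item": ${JSON.stringify(response.body)}`);
  }
  return body.item;
}

// Usage: a successful response yields the item; an error response throws
// with the status code and body in the message.
const created = assertCreatedItem({ status: 200, body: { item: { id: "policy-1" } } });
console.log(created);
```

Checking the status before the body matters here: if the API returned an error, the failure message names the status code instead of only reporting a missing item.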
That's very helpful, thanks! I will take care of this tomorrow morning!
Hey, this is still very hard to reproduce. Do you know if that failure happens often, or was it just 2 times in the past month? https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/770 I tried this twice and got passing results both times ;/
Looks like it's happening about 4 times a week across tracked branches and PRs, so it's definitely flaky enough that the next time it fails on a tracked branch it will be skipped.
Hunting for failures with the flaky test runner can be difficult for tests that don't fail very often. I think you're probably better off adding the status code assertions I described, plus all the debug info you can think of; then, the next time it fails in a PR or on a tracked branch, you can look into what happened. You can find the PRs that have this failure by checking this dashboard: https://ops.kibana.dev/s/ci/app/dashboards#/view/8b7279b5-a72d-4a03-a480-fcc970f16305?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-7d%2Fd,to:now))
Just search for the name of the test to see the visualization I took a screenshot of, and to find out which jobs have failures you can inspect.
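The point about hunting for rare failures with the flaky test runner can be made concrete with a little arithmetic. If a test fails independently with probability p on each run, the chance of seeing at least one failure in n runs is 1 - (1 - p)^n. The sketch below assumes runs are independent and uses a hypothetical p = 0.01 purely for illustration; that figure is not derived from the CI data in this thread.

```typescript
// Probability of observing at least one failure in n independent runs of a
// test that fails with probability p on each run: 1 - (1 - p)^n.
function probabilityOfAtLeastOneFailure(p: number, n: number): number {
  if (p < 0 || p > 1 || !Number.isInteger(n) || n < 0) {
    throw new RangeError("p must be in [0, 1] and n a non-negative integer");
  }
  return 1 - Math.pow(1 - p, n);
}

// Smallest n with 1 - (1 - p)^n >= confidence, i.e.
// n >= ln(1 - confidence) / ln(1 - p) (both logarithms are negative).
function runsNeeded(p: number, confidence: number): number {
  if (p <= 0 || p >= 1 || confidence <= 0 || confidence >= 1) {
    throw new RangeError("p and confidence must be strictly between 0 and 1");
  }
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - p));
}

// Hypothetical per-run failure rate of 1%, for illustration only.
const p = 0.01;
console.log(probabilityOfAtLeastOneFailure(p, 100).toFixed(3)); // 0.634
console.log(runsNeeded(p, 0.95)); // 299
```

Under that assumed rate, 100 runs catch the failure only about 63% of the time, and roughly 299 runs are needed for 95% confidence, which is why adding assertions and waiting for the next real failure can be the more practical route.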
Alright, thank you @spalger :)
Thank you, @spalger. I merged the PR https://github.com/elastic/kibana/pull/134881 with an assertion and a console.log, just in case. I will continue to investigate. Sorry if this caused you any inconvenience 👍
New failure: CI Build - main
New failure: CI Build - 8.4
New failure: CI Build - main
New failure: CI Build - main
Skipped.
main: 7cae4515534
A test failed on a tracked branch
First failure: CI Build - 8.3