MadameSheema opened 11 months ago
Pinging @elastic/security-solution (Team: SecuritySolution)
Thanks for putting this ticket together, @MadameSheema! I'm glad we're taking steps towards making further improvements to our flaky test process. I have a couple of questions/comments.
@peluja1012 thanks a lot for reviewing the ticket thoroughly! Here you can find my insights:
Giving Security Solution the ability to skip tests without waiting for CI is a positive change, but I wonder if we could take it one step further. This change would only save us the time it takes for CI to pass when a PR is open to skip a test. We could still potentially be causing problems for other teams during the time period between when the test is reported as a failed-test and when we become aware of the problem. In order to mitigate this, could we automatically skip a failed-test reported in the on-merge pipeline?
You are completely right, we could still potentially be causing problems. The proposed solution in the main ticket is a quick win since it is easy to implement and would save us time. I don't know how complex automatically skipping tests could be; that is something we need to research. Another thing we need to investigate, or at least take into consideration, is the pros and cons. For instance, sometimes failures are due to problems not related to us, or to network issues that happen just once; skipping the test right away would reduce our test coverage. But I really like the idea of automating this process, and we can research whether we can make it possible without compromising our coverage :)
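To make the idea more concrete, here is a minimal sketch (only an illustration, not an actual implementation; the spec path and test title below are hypothetical) of what automatically adding `.skip` to a failing Cypress test could look like:

```ts
// Hypothetical sketch: mark a failing Cypress test as skipped in its spec file.
// Nothing like this exists in Kibana today; it only illustrates the idea of a
// bot reacting to a failed-test report by opening a skip PR automatically.
import { readFileSync, writeFileSync } from 'fs';

function skipTest(specPath: string, testTitle: string): void {
  const source = readFileSync(specPath, 'utf8');
  // Escape regex metacharacters in the title before building the pattern.
  const escaped = testTitle.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Match `it('<title>'` so we can rewrite it to `it.skip('<title>'`.
  const pattern = new RegExp(`\\bit\\((['"\`]${escaped}['"\`])`);
  if (!pattern.test(source)) {
    throw new Error(`Test "${testTitle}" not found in ${specPath}`);
  }
  writeFileSync(specPath, source.replace(pattern, 'it.skip($1'));
}

// Hypothetical usage: both the spec path and the test title are made up.
skipTest(
  'x-pack/test/security_solution_cypress/cypress/e2e/example.cy.ts',
  'adds an exception from the alerts table'
);
```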
Which team would we need to work with in order to automatically ping teams on Slack when a test is marked as `failed-test`?
I've researched this and it seems to be simpler than we thought. I'll ping you on Slack and provide you with the details.
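The details aren't spelled out in this thread, but for reference, a minimal sketch of what such a ping could look like, assuming a Slack incoming-webhook integration (the webhook URL, issue URL, and message format below are placeholders, not the mechanism the team actually chose):

```ts
// Sketch: notify a Slack channel when a test is labelled failed-test.
// SLACK_WEBHOOK_URL is a placeholder for an incoming-webhook URL.
const webhookUrl = process.env.SLACK_WEBHOOK_URL!;

async function notifyFailedTest(issueUrl: string, testName: string): Promise<void> {
  const res = await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      text: `:warning: New failed-test reported: ${testName}\n${issueUrl}`,
    }),
  });
  if (!res.ok) {
    throw new Error(`Slack webhook returned ${res.status}`);
  }
}

// Hypothetical usage: the issue number and test name are made up.
notifyFailedTest(
  'https://github.com/elastic/kibana/issues/123456',
  'Entity Analytics - enables risk score'
).catch(console.error);
```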
What is "burn functionality"?
It is functionality that Patryk integrated into Cypress that checks which tests have been modified or introduced and executes them several times in a row. That would help us get rid of the Flaky test suite runner and make the flakiness check part of the PR process. This is a great idea that we need to re-implement, but there is some work to do first. For instance, we need to make sure that our tests can be executed several times on the same instance without producing false failures, and that the process is not going to impact teams outside Security Solution.
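A rough sketch of the burn idea, under the assumption that changed specs are detected with a git diff against `main` and each one is re-run a fixed number of times (the diff range, file filter, repetition count, and run command below are assumptions, not Patryk's original implementation):

```ts
// Sketch of "burn": find Cypress specs touched by the current branch and run
// each one several times in a row, failing fast on the first flaky failure.
import { execSync } from 'child_process';

const BURN_COUNT = 5; // hypothetical number of repetitions

const changedSpecs = execSync('git diff --name-only main...HEAD', { encoding: 'utf8' })
  .split('\n')
  .filter((file) => file.endsWith('.cy.ts'));

for (const spec of changedSpecs) {
  for (let run = 1; run <= BURN_COUNT; run++) {
    console.log(`Burn run ${run}/${BURN_COUNT} for ${spec}`);
    // execSync throws on a non-zero exit code, aborting the burn: the test is flaky.
    execSync(`yarn cypress run --spec '${spec}'`, { stdio: 'inherit' });
  }
}
```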
Awesome writeup, I'm particularly looking forward to "Provid[ing] an extended guide of automation best practices focused on flakiness" coming to fruition.
Couple of clarifying questions:
PRs should not be merged without having rebased new changes in main first.
Is "rebasing" strictly required here? Could we say more broadly:
PRs should not be merged without having rebased on main or merging/pulling from main first.
PRs that introduce new tests or changes to tests should not be merged without having first exercised the tests in the Flaky test suite runner
Is there a way to execute a single test file utilizing the flaky test runner, or do team members only have the option to run tests at the "config file" (test group) level? I've gotten some team feedback that it would be nice to have more granularity within the flaky test runner, by being able to run specific test files as opposed to groups.
Thank you!
Is "rebasing" strictly required here? Could we say more broadly:
PRs should not be merged without having rebased on main or merging/pulling from main first.
Absolutely! Thanks for the suggestion.
Is there a way to execute a single test file utilizing the flaky test runner, or do team members only have the option to run tests at the "config file" (test group) level? I've gotten some team feedback that it would be nice to have more granularity within the flaky test runner, by being able to run specific test files as opposed to groups.
Yes, it is. It is not 100% ideal because it requires a bit of manual intervention, but it is feasible. For your team, you need to modify line 10 (for ESS tests) or line 29 (for serverless tests) of the `package.json` located in `x-pack/test/security_solution_cypress` to point to a specific spec file, for instance:

`"cypress:entity_analytics:run:ess": "yarn cypress:ess --spec './cypress/e2e/entity_analytics/dashboards/enable_risk_score_redirect.cy.ts'"`
You just need to remember to revert the changes before merging the code into main :)
If you are running 200x tests, please make sure that the test groups are small enough so as not to consume too many resources/cost. A good target time would be under 1h.

For example, the test group (`group:cypress/security_solution_investigations` x 50, `group:cypress/security_serverless_investigations` x 50) is especially large and ideally needs breaking down. It failed after 10h and 9h respectively for only 50x (this is actually 8 days of parallelised runtime). It would be better split into smaller chunks.
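(Back-of-the-envelope, assuming each single execution of one group takes roughly 2h: 2 groups × 50 runs × ~2h ≈ 200h ≈ 8 days of aggregate machine time, even though the wall-clock time of the two runs was only ~10h thanks to parallelisation.)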
@MadameSheema I think out of all the tickets in the "Tasks" list, by far the most valuable one for us short-term would be https://github.com/elastic/kibana/issues/174892 because it would save ~5-10 minutes every single day for people who monitor flaky tests manually, and allow us to delegate the test triage process to any team member.
Probably the 2nd one by priority would be https://github.com/elastic/kibana/issues/174902 because it would guarantee that any new or changed Cypress tests are being run through a sort of "flaky test runner" before they get merged. Currently, PR authors need to remember to do that manually, and PR reviewers need to remember to check that it's done.
Update
This ticket was created to minimize blocking other teams' PRs because of our tests. That has been achieved already by https://github.com/elastic/kibana/pull/173815.
There is still work pending to minimise the flakiness in our automation and to have more reliable tests.
From now on, this ticket is going to track all the efforts around that.
Useful information
Flaky test vs blocker test
A flaky test is an automated test that has inconsistent behaviour: sometimes it passes, sometimes it fails. In our context, a flaky test is an automated test that has failed at least one time in:

- the `on-merge` pipeline
- a Flaky test suite runner execution

without blocking the merge of a PR and without breaking the `on-merge` pipeline.
A blocker test is an automated test that is consistently failing in:

- the `on-merge` pipeline
- a Flaky test suite runner execution

blocking the merge of a PR or breaking the `on-merge` pipeline.
In both cases, we are talking about failing tests.
How are we tracking failed tests?
When a test fails for the first time on the `on-merge` pipeline, a GitHub ticket is automatically created in the Kibana repository. The ticket is labelled with `failed-test` and with the owner of the test.

When a test that is already reported on a GitHub ticket fails again, the existing ticket is updated with the new failure.

When the test has failed several times, the kibana-operations team skips it.
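To illustrate the triage side of this process, here is a minimal sketch (the label names are assumptions based on this thread, adjust as needed) of how the currently open `failed-test` tickets owned by a team could be listed with Octokit:

```ts
// Sketch: list the open failed-test tickets assigned to a team's label.
import { Octokit } from '@octokit/rest';

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

async function listFailedTests(): Promise<void> {
  const { data: issues } = await octokit.rest.issues.listForRepo({
    owner: 'elastic',
    repo: 'kibana',
    labels: 'failed-test,Team: SecuritySolution', // assumed label names
    state: 'open',
  });
  for (const issue of issues) {
    console.log(`#${issue.number} ${issue.title} (${issue.html_url})`);
  }
}

listFailedTests().catch(console.error);
```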