Open spalger opened 2 years ago
Pinging @elastic/kibana-operations (Team:Operations)
Broadly speaking, working on APIs differs from working on Kibana app functionality: in one case you know which suites you're interested in, and in the other you really have no idea what might break.
Placing previously failed test runs in a separate group could certainly give faster feedback.
Suggestion: use a GitHub check for each FTR config. This would be more granular than the suites we had before, but also more meaningful.
Discussed with @mattkime and @brianseeders today. We're going to try running any FTR config that is expected to execute for over 2-3 minutes in its own worker, and all the rest of the configs in small FTR config groups (mostly FTR configs where all tests are skipped). The hope is to reach a compromise where logs are as accessible as possible, CI can continue to scale while reducing costs, and users have a better experience because statuses will mostly be assigned to specific FTR configs and links will take you directly to the log output of that config.
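The split described above could be sketched roughly as follows. This is a hypothetical illustration, not the real implementation: the `FtrConfig` shape, the `expectedMin` field, and the 3-minute/6-minute thresholds are all assumptions for the sake of the example.

```typescript
// Hypothetical sketch: split FTR configs into CI workers by expected duration.
interface FtrConfig {
  path: string;
  expectedMin: number; // assumed: expected execution time in minutes
}

const SOLO_THRESHOLD_MIN = 3; // configs above this run in their own worker
const GROUP_BUDGET_MIN = 6; // assumed target total time for a grouped worker

function planWorkers(configs: FtrConfig[]): FtrConfig[][] {
  const solo = configs.filter((c) => c.expectedMin > SOLO_THRESHOLD_MIN);
  const small = configs
    .filter((c) => c.expectedMin <= SOLO_THRESHOLD_MIN)
    .sort((a, b) => b.expectedMin - a.expectedMin);

  // First-fit-decreasing bin packing for the small / mostly-skipped configs:
  // reuse the first group with room, otherwise open a new one.
  const groups: FtrConfig[][] = [];
  for (const config of small) {
    const bin = groups.find(
      (g) =>
        g.reduce((t, c) => t + c.expectedMin, 0) + config.expectedMin <=
        GROUP_BUDGET_MIN
    );
    if (bin) bin.push(config);
    else groups.push([config]);
  }

  // Each long-running config gets its own worker, so its status and logs
  // map directly to that one config.
  return [...solo.map((c) => [c]), ...groups];
}
```

The point of the solo workers is that a failure status then identifies the exact config, instead of an anonymous group.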
I think that besides watching previously failed tests, another aspect is the addition of new tests as part of a PR, where the author has a particular interest in seeing them execute successfully, and maybe also in their execution time. I like the idea of running many of the configs in separate workers, which makes it possible to follow the test groups more closely.
When CI was broken up manually into CI Groups, you could watch a specific CI Group that you knew included a test you were working on, and when it passed you knew your work was done. We lost the ability to do that when we moved to dynamically allocated FTR Config Groups, because configs move around and all land in anonymous `FTR Configs #X/Y` groups, so the only option is to wait for CI to finish completely.

I have a couple ideas for how we might address this, but I'm open to suggestions:
These "separate groups" would really be group types, which are planned and automatically split up based on the expected execution time of their tests. They would often include only a single config, but importantly they would report a unique status item to GitHub and run in a separate job in Buildkite, so PR authors could watch the status of the configs they care about.
We should be able to do just about all of this logic in the ci-stats API, but we will need to update the `kibana-buildkite-library` to upload the right pipeline based on the results.

Thoughts?
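As a rough illustration of what the uploaded pipeline could look like, here is a sketch that emits one Buildkite step per planned worker so each status has a meaningful name. The `BuildkiteStep` shape, the label format, the command, and the `queue` name are all assumptions, not the actual `kibana-buildkite-library` output.

```typescript
// Hypothetical sketch: generate one Buildkite step per worker so that a
// single-config worker reports a status naming that exact config.
interface BuildkiteStep {
  label: string;
  command: string;
  agents: { queue: string };
}

function stepsForWorkers(workers: string[][]): BuildkiteStep[] {
  return workers.map((configPaths, i) => ({
    // A single-config worker is labeled with its config path, so the GitHub
    // status links straight to that config's log output; grouped workers
    // keep the anonymous "FTR Configs #X" style label.
    label:
      configPaths.length === 1
        ? `FTR: ${configPaths[0]}`
        : `FTR Configs #${i + 1}`,
    command: configPaths
      .map((p) => `node scripts/functional_tests --config ${p}`)
      .join(' && '),
    agents: { queue: 'ci-group' }, // assumed queue name
  }));
}
```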