Various small CI changes to make running tests more stable; see commits for details.
Cloud Provider Changes
Skip AKS steps on GKE, etc.
Bump the GKE cluster size, and use non-preemptible nodes for CI.
The clusters should be cleaned up correctly by the scheduler workflow, so there's less pressure to keep them cheap in case they get orphaned.
Set up the GKE config directory (the way it is set up in catapult), so that kubectl can automatically refresh the access token if a build takes a long time (e.g. running CATS).
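A rough Python sketch of the idea behind a persistent GKE config directory follows. It assumes kubectl refreshes GKE tokens by calling back into gcloud, and that gcloud reads its credentials from the directory named by the `CLOUDSDK_CONFIG` environment variable. The helper `gke_auth_commands` and its parameters are hypothetical. This is not how catapult or the workflow actually implement it.

```python
import os
from pathlib import Path


def gke_auth_commands(config_dir: Path, key_file: str, cluster: str, zone: str, project: str):
    """Hypothetical helper: return (env, commands) that log gcloud in using a
    dedicated config directory and write a kubeconfig entry for the cluster.

    Nothing is executed here; the caller runs each command with env=env.
    """
    config_dir.mkdir(parents=True, exist_ok=True)
    env = dict(os.environ)
    # Point gcloud at a fixed directory instead of its default location, so any
    # later gcloud call (e.g. one kubectl makes to refresh an expired token)
    # finds the same credentials.
    env["CLOUDSDK_CONFIG"] = str(config_dir)
    commands = [
        ["gcloud", "auth", "activate-service-account", f"--key-file={key_file}"],
        ["gcloud", "container", "clusters", "get-credentials", cluster,
         f"--zone={zone}", f"--project={project}"],
    ]
    return env, commands
```

A caller would run each entry with `subprocess.run(cmd, env=env, check=True)`. `CLOUDSDK_CONFIG` also has to be visible to every later step that invokes kubectl. In GitHub Actions, one way to do that is to append it to `$GITHUB_ENV`.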
Workflow Changes
The actions still need to run against the PR branch, so that they check out the correct source to test against.
Move the config values generation to a shell script, because it was rather unwieldy.
Various changes to bring the workflow up to date.
Add timeouts to most steps, so that stuck runs that would never succeed don't drag on.
Fix the interaction with how we determine whether we're running Eirini.
Wait for DNS to be set up before running tests, as that can take a long time if there are too many entries.
Retry the smoke tests if they fail; some clusters appear to fail the first run, for unknown reasons.
Don't try to get logs if everything succeeded; that can take ~20 minutes (due to the size), which is better spent running tests for the next PR.
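The timeout, DNS-wait, retry, and log-collection changes above are summarized in the following Python sketch. It only illustrates the control flow and is not the workflow, which is GitHub Actions configuration. The function names, the attempt count, and every timeout value are hypothetical placeholders.

```python
import socket
import subprocess
import time


def run_step(cmd, timeout_s):
    """Run one step as a subprocess; a non-zero exit or a timeout counts as failure."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode == 0
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the child process at this point.
        return False


def wait_for_dns(hostname, deadline_s, poll_s=10.0):
    """Poll until hostname resolves; give up once deadline_s seconds have passed."""
    end = time.monotonic() + deadline_s
    while True:
        try:
            socket.getaddrinfo(hostname, None)
            return True
        except socket.gaierror:
            if time.monotonic() >= end:
                return False
            time.sleep(poll_s)


def run_with_retry(cmd, attempts, timeout_s):
    """Run cmd up to `attempts` times, stopping at the first success."""
    return any(run_step(cmd, timeout_s) for _ in range(attempts))


def run_ci(hostname, smoke_cmd, cats_cmd, logs_cmd):
    """Hypothetical ordering of the steps; all limits are placeholders."""
    ok = (
        wait_for_dns(hostname, deadline_s=30 * 60)
        and run_with_retry(smoke_cmd, attempts=2, timeout_s=20 * 60)
        and run_step(cats_cmd, timeout_s=3 * 60 * 60)
    )
    if not ok:
        # Log collection is slow, so only pay for it when something failed.
        run_step(logs_cmd, timeout_s=30 * 60)
    return ok
```

GitHub Actions has built-in equivalents for two of these pieces. `timeout-minutes` can be set on a job or a step, and `if: failure()` runs a step only when an earlier step failed.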
Motivation and Context
Part of #1144 in trying to move our CI to GitHub Actions.
This PR is not sufficient for parity; we will still need to look at:
[ ] brain tests (this blocks closing #1144)
[ ] upgrade tests (#1147)
[ ] internetless (#1148)
[ ] ccdb rotation
How Has This Been Tested?
Been running on my GitHub fork for a few days (manually triggered); it seems to have gotten to a point where most runs succeed now. The last test run (of the whole thing, complete with internal PR) was https://github.com/mook-as/kubecf/actions/runs/369237207.
Types of changes
[x] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
Checklist:
[x] My code has security implications.
Need to ensure that people don't run CI until the code has been reviewed.
[x] My code follows the code style of this project.
[ ] My change requires a change to the documentation.