cds-snc / notification-planning-core

Project planning for GC Notify Core Team

GitHub Actions Runner Fails To Launch Scaleset #324

Open ben851 opened 2 months ago

ben851 commented 2 months ago

Describe the bug

SEV-2: The GitHub ARC controller fails to run if the actions runner Docker image is deprecated. We received no advance notice of the deprecation, and the existing pods showed no obvious error.

Eventually the actions runner scaleset fills up with failed pods and no more runners are launched.

Temporary fix:

  1. Cancel all pending actions
  2. helmfile -e staging -l app=arc,tier=scaleset destroy
  3. Update the ARC runner image in notification-terraform/aws/ecr/github_runner to latest
  4. Build and push the Docker image
  5. helmfile -e staging -l app=arc,tier=scaleset apply
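The helmfile part of the steps above (2 and 5) could be scripted roughly as follows. This is a minimal sketch, not the team's actual tooling: `helmfile_command` and `redeploy_scaleset` are hypothetical names, and the image rebuild (steps 3 and 4) is left to a caller-supplied function because the issue does not give the exact build and push commands.

```python
import shlex
import subprocess


def helmfile_command(action, environment="staging", selector="app=arc,tier=scaleset"):
    """Build the helmfile invocation used in steps 2 and 5 as an argument list."""
    return ["helmfile", "-e", environment, "-l", selector, action]


def redeploy_scaleset(rebuild_image, run=subprocess.run):
    """Destroy the scaleset, rebuild the runner image, then apply again.

    rebuild_image: hypothetical callable covering steps 3 and 4
                   (update the runner image, then build and push it).
    run: command runner; defaults to subprocess.run, swappable for a dry run.
    """
    run(helmfile_command("destroy"), check=True)  # step 2
    rebuild_image()                               # steps 3 and 4
    run(helmfile_command("apply"), check=True)    # step 5


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    redeploy_scaleset(
        rebuild_image=lambda: print("(update the runner image, then build and push)"),
        run=lambda cmd, check: print(shlex.join(cmd)),
    )
```

Passing a printing function as `run` gives a dry run, which is useful for checking the environment and selector before destroying anything. Step 1 (cancelling pending actions) is still assumed to be done by hand.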

To Reproduce

Build and push a Docker image with an older version of the GHA runner, then launch an affected workflow.

Expected behavior

The workflow should complete successfully.

Impact

Impact on Notify team:

We will be unable to release the application because the workflows will not function.

Additional context

Neither the scaleset listener nor the controller showed an immediate error. Instead, I had to apply the above steps manually (except for updating the Docker image). When first run, the ARC controller spins up pods. I tailed the logs on one of those pods, and that is where the error stood out.
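Since the error only shows up in the runner pods, one way to find pods worth tailing is to filter the pod list by phase. The sketch below assumes the failed runner pods report the standard Kubernetes phase `Failed` (the issue does not confirm this), and `failed_pod_names` is a hypothetical helper. It takes the text produced by `kubectl get pods -o json` for whatever namespace the runners use.

```python
import json


def failed_pod_names(pod_list_json):
    """Return the names of pods whose status.phase is "Failed".

    pod_list_json: the JSON text printed by `kubectl get pods -o json`,
                   which is a list object with the pods under "items".
    """
    pods = json.loads(pod_list_json).get("items", [])
    return [
        pod["metadata"]["name"]
        for pod in pods
        if pod.get("status", {}).get("phase") == "Failed"
    ]


if __name__ == "__main__":
    # Made-up sample data; real input would come from kubectl.
    sample = json.dumps({
        "items": [
            {"metadata": {"name": "runner-a"}, "status": {"phase": "Failed"}},
            {"metadata": {"name": "runner-b"}, "status": {"phase": "Running"}},
        ]
    })
    for name in failed_pod_names(sample):
        print(name)
```

Each name returned can then be inspected with `kubectl logs <pod>`, which is where the error was visible in this incident.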

Action Items

ben851 commented 2 months ago

Created a workflow that builds and pushes the Docker image to the account repository.

Working on the workflow that will deploy an individual application via helmfile.

sastels commented 2 months ago

Will circle back with Ben on this when he's back.

ben851 commented 2 months ago

Workflows created.