department-of-veterans-affairs / va.gov-cms

Editor-centered management for Veteran-centered content.
https://prod.cms.va.gov
GNU General Public License v2.0

Alert Slack when there are no registered self-hosted runners for content-build. #14492

Closed: ndouglas closed this 12 months ago

ndouglas commented 1 year ago

Description

h/t @olivereri

If there are no registered self-hosted runners for content-build, we will be in a bad place when we try to perform a content release.

[Screenshot 2023-07-24 at 10:41:20 AM]

Acceptance Criteria

olivereri commented 1 year ago

Implementation Details

It doesn't appear that Datadog has tight integration with GitHub Actions yet, nor is it known whether such an integration would have insight into self-hosted runner status. However, a Datadog HTTP monitor that can parse JSON should be able to determine whether there are registered runners:

curl -L \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer <YOUR_TOKEN>" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  https://api.github.com/repos/department-of-veterans-affairs/content-build/actions/runners
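For reference, the same request can be built with Python's standard library. This is a minimal sketch, not what was deployed: the function names are hypothetical, and the token is assumed to be supplied by the caller.

```python
import json
import urllib.request

# Same endpoint as the curl example (content-build repository).
RUNNERS_URL = (
    "https://api.github.com/repos/department-of-veterans-affairs/"
    "content-build/actions/runners"
)


def build_runners_request(token: str) -> urllib.request.Request:
    """Build a GET request carrying the same headers as the curl call."""
    return urllib.request.Request(
        RUNNERS_URL,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
        method="GET",
    )


def fetch_runners(token: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body (requires network access)."""
    # urlopen follows redirects by default, much like curl -L.
    with urllib.request.urlopen(build_runners_request(token), timeout=timeout) as resp:
        return json.load(resp)
```

Keeping request construction separate from the network call makes the headers easy to check without contacting GitHub.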

The response contains an attribute total_count that can be useful here:

{
  "total_count": 2,
  "runners": [
    {
      "id": 23,
      "name": "linux_runner",
      "os": "linux",
      "status": "online",
      "busy": true,
      "labels": [
        {
.
.
.
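A sketch of the check a monitor would need to run against this response, in Python. It assumes the response shape shown above; `runner_summary` and `should_alert` are hypothetical names. In addition to total_count, it counts runners whose status is "online", because a registered runner can still be offline.

```python
import json


def runner_summary(body: str) -> dict:
    """Summarize a JSON body from GET /repos/{owner}/{repo}/actions/runners."""
    data = json.loads(body)
    runners = data.get("runners", [])
    online = [r for r in runners if r.get("status") == "online"]
    return {
        "total_count": data.get("total_count", 0),
        "online": len(online),
        # Online runners that are not currently running a job.
        "idle_online": sum(1 for r in online if not r.get("busy", False)),
    }


def should_alert(summary: dict, require_online: bool = False) -> bool:
    """Alert when no runners are registered.

    With require_online=True (a hypothetical stricter mode), also alert
    when runners are registered but none of them are online.
    """
    if summary["total_count"] < 1:
        return True
    return require_online and summary["online"] < 1
```

The runners array is paginated (30 per page by default, up to 100 via per_page), while total_count covers all registered runners, so the online count is only exact when every runner fits on one page.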
teeshe commented 1 year ago

From the GitHub documentation, it seems the only way to get the number of runners is the API Eric noted above. I checked whether a Datadog monitor can make a GET request to the GitHub API, but there are limitations to parsing the number of available runners from the JSON response. Instead, we could have a cron Lambda function, or a pipeline job similar to the one that triggers the content build workflow, that calls the GitHub API, parses the response, and sends a notification to Slack/PagerDuty. I will sync with @olivereri to validate the approach.
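The cron Lambda approach described above might look roughly like the following Python sketch. Everything here is an illustrative assumption: the GITHUB_TOKEN and SLACK_WEBHOOK_URL environment variables, the helper names, and the alert threshold (fewer than one registered runner). It posts to a Slack incoming webhook, which accepts a JSON body with a "text" field; a PagerDuty call could be substituted for the notifier.

```python
import json
import os
import urllib.request
from typing import Callable, Optional

RUNNERS_URL = (
    "https://api.github.com/repos/department-of-veterans-affairs/"
    "content-build/actions/runners"
)


def count_registered_runners(token: str) -> int:
    """Return total_count from the GitHub runners API (requires network)."""
    req = urllib.request.Request(
        RUNNERS_URL,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return int(json.load(resp)["total_count"])


def post_to_slack(webhook_url: str, text: str) -> None:
    """Send a plain-text message to a Slack incoming webhook."""
    data = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        webhook_url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10):
        pass


def check_runners(
    count_fn: Callable[[], int], notify_fn: Callable[[str], None]
) -> Optional[str]:
    """Notify and return the message if no runners are registered, else None."""
    count = count_fn()
    if count >= 1:
        return None
    message = (
        f"content-build has {count} registered self-hosted runners; "
        "a content release may not be able to run."
    )
    notify_fn(message)
    return message


def lambda_handler(event, context):
    """Entry point for a scheduled (cron) Lambda invocation."""
    # Hypothetical environment variables holding the secrets.
    token = os.environ["GITHUB_TOKEN"]
    webhook = os.environ["SLACK_WEBHOOK_URL"]
    message = check_runners(
        lambda: count_registered_runners(token),
        lambda text: post_to_slack(webhook, text),
    )
    return {"alerted": message is not None}
```

Passing the counting and notification functions into check_runners keeps the alert logic testable without network access.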

ndouglas commented 1 year ago

Have you tried this?

[Screenshot 2023-11-30 at 11:50:19 AM]

This appears to work for me.

teeshe commented 1 year ago

This approach works. Thank you @ndouglas

teeshe commented 12 months ago

A Datadog API monitor checks the available self-hosted runners for content-build through the GitHub API, using the GitHub token from Parameter Store. Its conditions check "total_count" in the response body, and an error alert is sent if total_count is less than 0. This was tested, and alerts were reported to Datadog when the number of available self-hosted runners did not meet the predefined conditions:
https://vagov.ddog-gov.com/synthetics/details/npt-6c4-qa9?from_ts=1701713179053&to_ts=1701716779053&live=true
https://vagov.ddog-gov.com/synthetics/details/x4v-spw-3n8?from_ts=1701791948073&to_ts=1701795548073&live=true

[image.png]

ndouglas commented 12 months ago

These look great! Approved.