department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html

Datadog alert integration with SNOW proof of concept #74496

Closed: mchelen-gov closed this issue 9 months ago

mchelen-gov commented 9 months ago

Problem Statement

VA.gov monitors and alerts are configured in Datadog. However, all VA incident response is managed through ServiceNow (SNOW). We want to understand which integrations already exist, or could be set up, to send data between these two systems.

Impact

Success Criteria

Involved

ECC: Jeff Sleeper, Luke Bader, Swami (Prasad) Gorle Butchi
OCTO: @mchelen-gov @BillChapmanUSDS @va-albers @AparnaNittalaUSDS


mchelen-gov commented 9 months ago

Proof of concept

When a monitor is triggered in Datadog, it sends an alert to ServiceNow through a webhook. The Webhook-API-yourit-va-gov webhook has been installed and configured with the necessary authentication.
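To show the kind of translation this webhook performs, here is a minimal Python sketch. It takes values that Datadog substitutes into a webhook payload (`$EVENT_TITLE`, `$ALERT_TRANSITION`, `$EVENT_MSG`, `$LINK`, `$HOSTNAME`, `$TAGS` are Datadog webhook template variables) and maps them to a ServiceNow Event Management style record. The target field names, the `va.gov` fallback node, the sample tags, and the severity mapping are all assumptions made for illustration. The actual payload configured in Webhook-API-yourit-va-gov is not shown in this issue.

```python
import json

# Hypothetical mapping from Datadog $ALERT_TRANSITION values to ServiceNow
# event severities (1 = Critical, 4 = Warning, 0 = Clear). This is NOT taken
# from the real Webhook-API-yourit-va-gov configuration.
SEVERITY_BY_TRANSITION = {
    "Triggered": "1",
    "Re-Triggered": "1",
    "Warn": "4",
    "Recovered": "0",
}


def to_snow_event(dd: dict) -> dict:
    """Map rendered Datadog webhook variables to a SNOW-style event record.

    `dd` holds the values Datadog would substitute for $EVENT_TITLE,
    $ALERT_TRANSITION, $EVENT_MSG, $LINK, $HOSTNAME and $TAGS.
    The output field names are assumptions, not a confirmed SNOW schema.
    """
    transition = dd.get("ALERT_TRANSITION", "")
    return {
        "source": "Datadog",
        # Synthetic tests may have no host; "va.gov" is a hypothetical fallback.
        "node": dd.get("HOSTNAME") or "va.gov",
        "type": dd.get("EVENT_TITLE", ""),
        # Unknown transitions default to Minor (3) so they are not silently dropped.
        "severity": SEVERITY_BY_TRANSITION.get(transition, "3"),
        "description": dd.get("EVENT_MSG", ""),
        # $TAGS renders as a comma-separated string such as "env:prod,service:x".
        "additional_info": json.dumps({
            "link": dd.get("LINK", ""),
            "tags": [t for t in dd.get("TAGS", "").split(",") if t],
        }),
    }


if __name__ == "__main__":
    sample = {
        "EVENT_TITLE": "[Triggered] VA.gov homepage synthetic test",
        "ALERT_TRANSITION": "Triggered",
        "EVENT_MSG": "https://www.va.gov failed to load",
        "LINK": "https://vagov.ddog-gov.com/synthetics/details/u4h-eqw-qmn?monitorStatus=fail",
        "TAGS": "env:prod,service:vagov",  # hypothetical tags
    }
    print(json.dumps(to_snow_event(sample), indent=2))
```

Keeping the mapping in a plain dictionary means the trigger, warning, and recovery handling can be reviewed with ECC before any change is made to the live webhook.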

Based on discussion with ECC/BLM, our approach was to pick a single monitor and get it fully configured with SNOW webhook, tags, and necessary information for ECC Event Management team to be able to respond to alerts.
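Before more monitors are rolled out, a "fully configured" monitor could be checked automatically. The sketch below relies on one documented Datadog behavior: a monitor notifies a webhook when its message mentions `@webhook-<name>`. Everything else is hypothetical, including the required tag keys, the `readiness_problems` helper, and the assumption that the ECC instructions begin with the `ECC Event Management` header.

```python
from typing import List

# Datadog routes a notification to a webhook when the monitor message mentions it.
SNOW_WEBHOOK_HANDLE = "@webhook-Webhook-API-yourit-va-gov"

# Hypothetical tag keys ECC might need for routing; not confirmed in this issue.
REQUIRED_TAG_KEYS = {"env", "service", "team"}


def readiness_problems(monitor: dict) -> List[str]:
    """Return the problems that would block the SNOW handoff (empty list = ready).

    `monitor` is a dict with at least "message" (str) and "tags" (list of
    "key:value" strings), loosely following a Datadog monitor definition.
    """
    problems = []
    message = monitor.get("message", "")

    # 1. The webhook must be mentioned, or SNOW never receives the alert.
    if SNOW_WEBHOOK_HANDLE not in message:
        problems.append(f"message does not mention {SNOW_WEBHOOK_HANDLE}")

    # 2. Every required tag key must be present (values are not checked).
    tag_keys = {t.split(":", 1)[0] for t in monitor.get("tags", [])}
    missing = sorted(REQUIRED_TAG_KEYS - tag_keys)
    if missing:
        problems.append("missing tags: " + ", ".join(missing))

    # 3. ECC responders need the instructions block in the alert text.
    if "ECC Event Management" not in message:
        problems.append("message lacks ECC Event Management instructions")

    return problems
```

A check like this could run in CI against exported monitor definitions, so a monitor that lacks the webhook mention is caught before it silently fails to reach SNOW.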

The monitor that was selected:

Integration with SNOW is confirmed to be working, and alerts generated from Datadog can be viewed in SNOW:

Placeholder instructions for ECC:

** ECC Event Management **
Monitor is triggered when https://www.va.gov fails to load, or is missing required elements.
Screenshot of test failures can be seen here: https://vagov.ddog-gov.com/synthetics/details/u4h-eqw-qmn?monitorStatus=fail
Please report to: vagovdevops@va.gov
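The placeholder instructions above can be combined with the webhook mention to form the monitor's notification message. The following Python sketch shows one way to assemble that text. The `build_monitor_message` helper is hypothetical. It places the `@webhook-Webhook-API-yourit-va-gov` handle outside any `{{#is_alert}}` conditional block, so Datadog should notify SNOW on recovery as well as on alert.

```python
# Handle for the SNOW webhook described in this issue.
WEBHOOK = "@webhook-Webhook-API-yourit-va-gov"


def build_monitor_message(summary: str, details_url: str, contact: str) -> str:
    """Assemble the ECC-facing monitor message (hypothetical helper).

    The webhook mention sits on its own line outside any conditional
    block, so it applies to every notification the monitor sends.
    """
    lines = [
        "** ECC Event Management **",
        summary,
        f"Screenshot of test failures can be seen here: {details_url}",
        f"Please report to: {contact}",
        "",
        WEBHOOK,
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_monitor_message(
        "Monitor is triggered when https://www.va.gov fails to load, or is missing required elements.",
        "https://vagov.ddog-gov.com/synthetics/details/u4h-eqw-qmn?monitorStatus=fail",
        "vagovdevops@va.gov",
    ))
```

Generating the message from parameters keeps the ECC instructions in the same shape across monitors once the placeholder text is finalized.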

See Also

mchelen-gov commented 9 months ago

This completes the implementation of the proof of concept. The established monitor can now be modified as needed to suit the needs of OCTO, Platform, and ECC.

Recommended Next Steps