department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html

Centralize Monitoring: Provide write access for VFS backend engineers and deprecate legacy monitoring #55777

Open jhouse-solvd opened 1 year ago

jhouse-solvd commented 1 year ago

Problem Statement

VFS Teams need to be able to monitor their applications in Datadog.

However, currently:

How might we...

User Impact

Where was this problem reported?

How well do we understand the problem?

We understand the technical problem space well and have identified potential solutions. Areas that need further investigation:

Considerations

Acceptance Criteria (AC)

How should we measure success?

TODOs

batemapf commented 1 year ago

re:

The Platform has chosen Datadog as the 'single-pane-of-glass' monitoring tool but has not yet granted write permissions to VFS teams.
Without properly scoped permissions, VFS teams could inadvertently re-configure others' dashboards and monitors, including those needed by Platform teams for ongoing observability and monitoring.
Datadog lacks built-in features that allow the Platform to scope permissions (i.e., write access) to app- or team-specific resources, which means a solution is needed to enforce Datadog configuration standards and avoid disruption to resources in Datadog (i.e., dashboards, monitors, alerts) that the Platform depends on for day-to-day operations.

it looks like you can restrict editing of monitors to specific roles (though i assume those roles are defined by the DOTS team, so maybe not practical). alternatively / additionally, alerts can be configured to trigger when a monitor is edited. might not be a perfect solution:

image.png
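
For reference, a minimal sketch of the first option (restricting edits to specific roles). It assumes the Datadog Monitors API accepts a `restricted_roles` list of role IDs in a monitor update body. The role IDs below are placeholders, and the helper name `restrict_monitor_payload` is hypothetical, not part of any Datadog client. It only builds the request body and does not call the API:

```python
import json
from typing import Dict, List


def restrict_monitor_payload(role_ids: List[str]) -> Dict[str, List[str]]:
    """Build a monitor update body that limits editing to the given roles.

    Assumption: the monitor update endpoint reads a `restricted_roles`
    field containing role IDs. Verify against the current Datadog docs.
    """
    if not role_ids:
        raise ValueError("at least one role ID is required")
    # Drop duplicate IDs while keeping the original order.
    unique_ids = list(dict.fromkeys(role_ids))
    return {"restricted_roles": unique_ids}


if __name__ == "__main__":
    # Placeholder IDs; real values would come from the roles DOTS defines.
    body = restrict_monitor_payload(
        ["<platform-role-id>", "<vfs-team-role-id>"]
    )
    print(json.dumps(body, indent=2))
```

This still depends on having team-level roles to point at, so it does not remove the need to work out how roles get defined.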

mchelen-gov commented 1 year ago

Yup, we're trying to figure out how to get the roles configured and whether they can be mapped from AD or another SSO provider.

little-oddball commented 1 year ago

Would encourage the addition of at least 1 spike related to role decomposition and experimentation.