cdklabs / cdk-monitoring-constructs

Easy-to-use CDK constructs for monitoring your AWS infrastructure
https://constructs.dev/packages/cdk-monitoring-constructs
Apache License 2.0
443 stars 55 forks source link

feat(loadbalancing): add fail-open monitoring #537

Closed miloszwatroba closed 4 days ago

miloszwatroba commented 4 days ago

Fixes #532

This pull request adds fail-open monitoring to our Load Balancing monitors.

To give context about fail open (ref):

If a target group contains only unhealthy registered targets, the load balancer routes requests to all those targets, regardless of their health status. This means that if all targets fail health checks at the same time in all enabled Availability Zones, the load balancer fails open. The effect of the fail-open is to allow traffic to all targets in all enabled Availability Zones, regardless of their health status, based on the load balancing algorithm

Adding this metric will give better visibility into whether the Load Balancer's fail open routing was used during incidents. The metrics added as part of this pull request are in line with the AWS documentation:

These metrics are reported conditionally, only when they have nonzero values. Thus, I added a FILL(metric, 0) metric math to correctly represent the values on the dashboards.

Tested with NetworkLoadBalancer, as that's the setup I have on my account - the ApplicationLoadBalancer values are based only on the documentation.

Discussion point: I added metricUnhealthyRoutingCount to the ILoadBalancerMetricFactory interface, which is exported from the package. This means, that this commit theoretically introduced a breaking change for customers that extended the library with their own classes that implement this interface. How do you approach such changes in this library Should this commit be considered a breaking change and released in a new major version or is this overly cautious?


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

echeung-amzn commented 4 days ago

Should this commit be considered a breaking change and released in a new major version or is this overly cautious?

A quick skim over internal codebases and a GitHub code search doesn't seem to reveal any such usages, so I'd consider it relatively safe.

If it turns out I'm wrong and you've ended up here, please open up an issue if this is a problem for your setup.