cdklabs / cdk-monitoring-constructs

Easy-to-use CDK constructs for monitoring your AWS infrastructure
https://constructs.dev/packages/cdk-monitoring-constructs
Apache License 2.0

add configurable alarms for Ec2Monitoring #515

Open gossandr opened 1 month ago

gossandr commented 1 month ago

Feature scope

EC2

Describe your suggested feature

Currently, Ec2Monitoring does not support adding any alarms, and I was honestly a bit surprised at how I ultimately had to implement the alarms I wanted.

I'd like to potentially take this on in a PR, but wanted to get some feedback here first (if possible).

At first, I tried to use the Ec2Monitoring class to get access to its metrics for use in monitorCustom. This did not work, in part because the metrics there are typed as IMetric and not MetricWithAlarmSupport. The class does not support the StatusCheckFailed metric at all, and the other metrics are all exposed as the "wrong" type for what I need.

Ultimately I got something working, but I am not sure that this is the best way:

Using MonitoringFacade, I create the metric factory:

    this.monitoring = new MonitoringFacade(this, 'MonitoringFacade', {
      alarmFactoryDefaults: {
        actionsEnabled: true,
        alarmNamePrefix: `${props.applicationName}-${props.stageName}`,
        action: new SnsAlarmActionStrategy({
          onAlarmTopic: monitoringTopic,
        }),
        datapointsToAlarm: 1,
      },
      metricFactoryDefaults: {
        namespace: `${props.applicationQualifier}`,
      },
      dashboardFactory: new DefaultDashboardFactory(this, 'DashboardFactory', {
        dashboardNamePrefix: `${props.applicationName}-${props.stageName}`,
        createDashboard: true,
        createSummaryDashboard: false,
        createAlarmDashboard: true,
        renderingPreference: DashboardRenderingPreference.INTERACTIVE_ONLY,
      }),
    });
    // initialize metric factory
    this.metricFactory = this.monitoring.createMetricFactory();
    // initialize dimensions map for ec2 InstanceId
    const ec2DimensionsMap: DimensionsMap = {
      InstanceId: props.ec2InstanceId,
    };

I am using Ec2Monitoring, but I can't do much with it:

    // create the monitoring widget for the summary dashboard
    this.monitoring.monitorEC2Instances({
      ...monitorEc2Props,
    });

To then set up alarms, I use monitorCustom():

    this.monitoring.monitorCustom({
      metricGroups: [
        /**
         * MetricGroup for the inference instance
         */
        {
          title: 'Inference Instance Health',
          metrics: [
            /**
             * CPU Utilization Metric with Alarm
             * Will alarm when CPU breaches the threshold
             */
            {
              metric: this.getEC2InstanceMetric('CPUUtilization', MetricStatistic.AVERAGE, ec2DimensionsMap),
              alarmFriendlyName: 'inference-instance-cpu-utilization',
              addAlarm: {
                Critical: {
                  threshold: 80,
                  comparisonOperator: ComparisonOperator.GREATER_THAN_THRESHOLD,
                  actionsEnabled: true,
                  datapointsToAlarm: 3,
                  evaluationPeriods: 3,
                  // missing data indicates that the instance is potentially down
                  // treatMissingDataOverride: TreatMissingData.BREACHING,
                },
              },
            },
            /**
             * StatusCheckFailed Metric with Alarm
             * Alarms when the instance or system status check fails for 2 consecutive
             * data points, i.e. when the Maximum statistic is >= 1.0
             */
            {
              metric: this.getEC2InstanceMetric('StatusCheckFailed', MetricStatistic.MAX, ec2DimensionsMap),
              alarmFriendlyName: 'inference-instance-status-check-failed',
              addAlarm: {
                Critical: {
                  threshold: 1.0,
                  comparisonOperator: ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
                  actionsEnabled: true,
                  datapointsToAlarm: 2,
                  evaluationPeriods: 2,
                },
              },
            },
          ],
        },
      ],
      addToAlarmDashboard: true,
      addToSummaryDashboard: false,
      alarmFriendlyName: 'inference-instance',
    });
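
For anyone reading along, here is a rough, self-contained sketch of how the datapointsToAlarm / evaluationPeriods pairs above behave under CloudWatch's "M out of N" evaluation. It is a simplified model, not library code: `AlarmRule`, `wouldAlarm`, and the two rule constants are hypothetical names, and it assumes every period reports a datapoint (missing-data handling is ignored).

```typescript
// Simplified model of CloudWatch "M out of N" alarm evaluation.
// Assumption: every period has a datapoint; missing data is not modelled.
type Comparison = (value: number, threshold: number) => boolean;

const greaterThan: Comparison = (v, t) => v > t;
const greaterThanOrEqual: Comparison = (v, t) => v >= t;

// Hypothetical stand-in for the alarm settings used in monitorCustom above.
interface AlarmRule {
  threshold: number;
  compare: Comparison;
  datapointsToAlarm: number; // M
  evaluationPeriods: number; // N
}

// Returns true when at least M of the most recent N datapoints breach the threshold.
function wouldAlarm(datapoints: number[], rule: AlarmRule): boolean {
  const recent = datapoints.slice(-rule.evaluationPeriods);
  const breaching = recent.filter((v) => rule.compare(v, rule.threshold)).length;
  return breaching >= rule.datapointsToAlarm;
}

// Mirrors the CPUUtilization alarm: > 80 for 3 of the last 3 datapoints.
const cpuRule: AlarmRule = {
  threshold: 80,
  compare: greaterThan,
  datapointsToAlarm: 3,
  evaluationPeriods: 3,
};

// Mirrors the StatusCheckFailed alarm: >= 1 for 2 of the last 2 datapoints.
const statusRule: AlarmRule = {
  threshold: 1,
  compare: greaterThanOrEqual,
  datapointsToAlarm: 2,
  evaluationPeriods: 2,
};
```

With these rules, a single CPU spike or a single failed status check does not alarm; the breach has to persist for every datapoint in the evaluation window.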

I use a private method to create the metric using the metric factory:

    // create an AWS/EC2 metric using the metric factory
    private getEC2InstanceMetric(metricName: string, statistic: MetricStatistic, dimension: DimensionsMap) {
      return this.metricFactory.createMetric(
        metricName,
        statistic,
        undefined, // label
        dimension,
        undefined, // color
        'AWS/EC2', // namespace
      );
    }
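
To make the request more concrete, below is one possible shape for configurable alarm props, offered purely as a discussion starter. Everything in it is hypothetical: `ThresholdSketch`, `Ec2AlarmPropsSketch`, `addCpuUtilizationAlarm`, `addStatusCheckFailedAlarm`, and `planEc2Alarms` do not exist in cdk-monitoring-constructs. The sketch uses plain local types so it runs without the library, and it only expands the props into a flat list of planned alarms.

```typescript
// HYPOTHETICAL sketch: none of these names exist in cdk-monitoring-constructs.
interface ThresholdSketch {
  threshold: number;
  datapointsToAlarm?: number; // defaults to 1 in this sketch
  evaluationPeriods?: number; // defaults to datapointsToAlarm in this sketch
}

// Alarms keyed by a disambiguator such as "Critical" or "Warning".
interface Ec2AlarmPropsSketch {
  addCpuUtilizationAlarm?: Record<string, ThresholdSketch>;
  addStatusCheckFailedAlarm?: Record<string, ThresholdSketch>;
}

interface PlannedAlarm {
  metricName: string; // AWS/EC2 metric name
  statistic: "Average" | "Maximum";
  comparison: ">" | ">=";
  disambiguator: string;
  threshold: number;
  datapointsToAlarm: number;
  evaluationPeriods: number;
}

// Expands one alarm map into planned alarms for a single metric.
function plan(
  metricName: string,
  statistic: PlannedAlarm["statistic"],
  comparison: PlannedAlarm["comparison"],
  alarms: Record<string, ThresholdSketch> = {},
): PlannedAlarm[] {
  return Object.keys(alarms).map((disambiguator) => {
    const t = alarms[disambiguator];
    const datapointsToAlarm = t.datapointsToAlarm ?? 1;
    return {
      metricName,
      statistic,
      comparison,
      disambiguator,
      threshold: t.threshold,
      datapointsToAlarm,
      evaluationPeriods: t.evaluationPeriods ?? datapointsToAlarm,
    };
  });
}

// Collects the planned alarms for both metrics used in this issue.
function planEc2Alarms(props: Ec2AlarmPropsSketch): PlannedAlarm[] {
  return plan("CPUUtilization", "Average", ">", props.addCpuUtilizationAlarm).concat(
    plan("StatusCheckFailed", "Maximum", ">=", props.addStatusCheckFailedAlarm),
  );
}

// The two alarms from the monitorCustom call above, expressed as props.
const planned = planEc2Alarms({
  addCpuUtilizationAlarm: { Critical: { threshold: 80, datapointsToAlarm: 3 } },
  addStatusCheckFailedAlarm: { Critical: { threshold: 1, datapointsToAlarm: 2 } },
});
```

Keying alarms by a disambiguator keeps the same "Critical" convention as the addAlarm maps in the monitorCustom call; the real prop names and threshold types would be up to the maintainers.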

Any feedback appreciated.