aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.65k stars 3.91k forks source link

(aws-cloudwatch): Unable to define an alarm based on anomaly detection #19286

Closed natekasprzak closed 2 years ago

natekasprzak commented 2 years ago

What is the problem?

While trying to create an alarm that uses the anomaly detection functionality I get a CloudFormation JSON that should be valid, but causes an error.

My code produces the following JSON:

[...]
"AnomalyAlarmFFF00000": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "ComparisonOperator": "GreaterThanUpperThreshold",
        "EvaluationPeriods": 1,
        "ActionsEnabled": true,
        "AlarmActions": [
          {
            [...]
          }
        ],
        "AlarmDescription": "[...]",
        "DatapointsToAlarm": 1,
        "Metrics": [
          {
            "Expression": "ANOMALY_DETECTION_BAND(mtr, 2)",
            "Id": "expr_1"
          },
          {
            "Id": "mtr",
            "MetricStat": {
              "Metric": {
                "Dimensions": [
                  {
                    "Name": "Example",
                    "Value": "Example-Metric"
                  }
                ],
                "MetricName": "MinutesSinceLastProduced",
                "Namespace": "ExampleNamespace"
              },
              "Period": 300,
              "Stat": "Average"
            },
            "ReturnData": false
          }
        ],
        "OKActions": [
          [...]
        ],
        "ThresholdMetricId": "expr_1",
        "TreatMissingData": "breaching"
      },
      "Metadata": {
        "aws:cdk:path": "[...]"
      }
    },
[...]

According to the docs when you create an alarm based on a metric math expression, specify True for this value [ReturnData] for only the one math expression that the alarm is based on. You must specify False for ReturnData for all the other metrics and expressions used in the alarm. That suggests that the listed JSON is a valid configuration. Information on anomaly detection and its use (especially with the newer Alarm class) is pretty scarce and even contradicts itself in places, so it'd be great if someone could take a look and fix it up if possible. It's not even clear if the anomaly detector (CfnAnomalyDetector) needs to be defined for the ANOMALY_DETECTION_BAND function to work.

Reproduction Steps

  const anomalyDetector = new CfnAnomalyDetector(scope, `AnomalyDetector`, {
    namespace: 'ExampleNamespace',
    metricName: 'MinutesSinceLastProduced',
    stat: 'Average',
    dimensions: [
      {
        name: 'Example',
        value: 'Example-Metric'
      }
    ]
  });

  const metric = new MathExpression({
    period: Duration.millis(300 * 1000),
    expression: `ANOMALY_DETECTION_BAND(mtr, 2)`,
    usingMetrics: {
      mtr: new Metric({
        namespace: 'ExampleNamespace',
        metricName: 'MinutesSinceLastProduced',
        dimensionsMap: {
          Example: 'Example-Metric'
        },
        statistic: 'Average'
      })
    }
  });

  const alarm = metric.createAlarm(scope, `AnomalyAlarm`, {
    alarmDescription: '',
    datapointsToAlarm: 1,
    evaluationPeriods: 1,
    comparisonOperator: ComparisonOperator.GREATER_THAN_UPPER_THRESHOLD,
    treatMissingData: TreatMissingData.BREACHING,
    actionsEnabled: true,
    threshold: 0 // Temporary value - see below
  });

  // Set a threshold based on the math expression instead of a static value
  // See: https://github.com/aws/aws-cdk/issues/10540
  (alarm.node.defaultChild as CfnAlarm).addPropertyDeletionOverride('Threshold');
  (alarm.node.defaultChild as CfnAlarm).thresholdMetricId = 'expr_1';

What did you expect to happen?

The CDK should generate an alarm that uses anomaly detection band instead of a static threshold. The generated alarm should be successfully deployed in a CloudFormation stack.

What actually happened?

The CloudFormation error:

Exactly two elements of the metrics list should return data. (Service: AmazonCloudWatch; Status Code: 400; Error Code: ValidationError; Request ID: 3026edcc-f2ed-4a5c-926b-aff96ac6b367; Proxy: null)

CDK CLI Version

2.7.0 (build cfb09d5)

Framework Version

No response

Node.js Version

12.22.10

OS

macOS 12.2.1

Language

Typescript

Language Version

TypeScript (3.9.7)

Other information

Few of the resources I came across while trying to fix the issue: https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_cloudwatch.MathExpression.html https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-cloudwatch-alarm-metricdataquery.html#cfn-cloudwatch-alarm-metricdataquery-returndata https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-cw-alarm.html#aws-properties-cw-alarm--examples

pflorek commented 2 years ago

This could be the fix https://github.com/aws/aws-cdk/pull/19956

natekasprzak commented 2 years ago

@pflorek, yes, the #19956 PR seems to resolve this issue, although it is not merged yet. Since it is already being worked on I'm closing this issue, thanks for the reply.

github-actions[bot] commented 2 years ago

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.

moltar commented 1 year ago

@natekasprzak It appears that the #19956 PR is closed and is not updated. Shall this be reopened then?

natekasprzak commented 1 year ago

@natekasprzak It appears that the #19956 PR is closed and is not updated. Shall this be reopened then?

This PR is a proper solution to the problem, so I'm not sure if there's any value in reopening this issue as well. I'd rather see the feedback addressed in the PR and the feature merged into the CDK.

In case anyone finds it useful, this is how our team creates anomaly alarms for the time being:

new CfnAlarm(scope, 'AnomalyAlarm', {
  alarmDescription: 'Anomaly alarm',
  datapointsToAlarm: 1,
  evaluationPeriods: 1,
  comparisonOperator: ComparisonOperator.GREATER_THAN_UPPER_THRESHOLD,
  treatMissingData: TreatMissingData.BREACHING,
  actionsEnabled: true,
  alarmActions: [topicArn],
  okActions: [topicArn],
  thresholdMetricId: 'ad1',
  metrics: [
    {
      id: 'ad1',
      returnData: true,
      expression: 'ANOMALY_DETECTION_BAND(m1, 2)'
    },
    {
      id: 'm1',
      returnData: true,
      metricStat: {
        metric: {
          namespace: 'Custom',
          metricName: 'Unit',
          dimensions: [
            {
              name: 'Service',
              value: 'Service-Metric'
            }
          ]
        },
        period: 60,
        stat: 'Average'
      }
    }
  ]
});
patwcummins commented 1 year ago

@natekasprzak Is this the only way you've found to create alarms?

Correct me if i'm wrong, but is this approach limited to creating alarms for existing topics with a known ARN?

Lykr commented 1 year ago

I think we should reopen this issue to figure out a more convenient way for users to create anomaly detection alarm.

alfaproject commented 1 year ago

I also agree that this should be reopened. It's still a valid issue

dexterityzx commented 1 year ago

Are we able to create composite alarm with both Alarm and CfnAlarm as its children? If not, then using CfnAlarm to create anomaly detection is not realistic. Users will be forced to update all their L2 Alarm to L1 CfnAlarm when they want to add anomaly detection alarm into their current composite alarm.

We should reopen the ticket if this we're the case.

n9iels commented 7 months ago

Just like to point out, this is still a relevant issue when creating anomaly detection alarms. I found a way to work around this issue, basically removing the ReturnData property that is added automatically to metrics defined within a new MathExpression({...}). Ain't pretty, but it does the job.

const cfnAlarm = myAlarm.node.defaultChild as CfnAlarm;
// The '1' is refering to the array index of the relevant metric
cfnAlarm.addPropertyDeletionOverride('Metrics.1.ReturnData');
cfnAlarm.addPropertyDeletionOverride('Threshold');
sthuber90 commented 1 month ago

Another way to create an anomaly detection alarm with CDK is described here: https://aws.amazon.com/de/blogs/networking-and-content-delivery/monitoring-load-balancers-using-amazon-cloudwatch-anomaly-detection-alarms/

It's very similar to how @natekasprzak 's team does it though