DataDog / serverless-plugin-datadog

Serverless plugin to automagically instrument your Lambda functions with Datadog
Apache License 2.0

Identify recommended monitors with serverless_id tag #548

Closed lym953 closed 1 month ago

lym953 commented 1 month ago

Background

Right now we use a hard-coded map to match each recommended monitor returned by the API to its serverless_id, such as high_error_rate:

const recommendedMonitorApiIdToId = {
  "serverless-lambda_function_invocations_are_failing": "high_error_rate",
  ...
}

This is a temporary remediation for issue: https://github.com/DataDog/serverless-plugin-datadog/issues/545 and incident 31902.
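
To illustrate why this map needs ongoing upkeep, here is a minimal sketch of a map-based lookup. Only the single entry shown above is taken from the plugin; the helper name `lookupServerlessId` and the unlisted ID `serverless-some_new_monitor` are hypothetical, and the plugin's actual code may differ.

```typescript
// Sketch of a hard-coded lookup. Only this one entry comes from the PR
// description; the remaining entries of the real map are elided there.
const recommendedMonitorApiIdToId: Record<string, string> = {
  "serverless-lambda_function_invocations_are_failing": "high_error_rate",
};

// Hypothetical helper: returns the serverless_id for an API monitor ID,
// or undefined when the map has no entry for that ID.
function lookupServerlessId(apiId: string): string | undefined {
  return recommendedMonitorApiIdToId[apiId];
}

// An ID that is present in the map resolves to its serverless_id.
const known = lookupServerlessId("serverless-lambda_function_invocations_are_failing");

// "serverless-some_new_monitor" is a made-up ID. A monitor added to the API
// but not to the map cannot be identified until the map is updated by hand.
const unknown = lookupServerlessId("serverless-some_new_monitor");
```

Under these assumptions, every new recommended monitor requires a code change before the plugin can recognize it.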

What does this PR do?

Use the serverless_id tag from the Recommended Monitor API response to identify each monitor.

The response looks like this:

{
  "data": [
    {
      ...
        "name": "High Cold Start Rate on $functionName in $regionName for $awsAccount",
        "tags": [
          "serverless_id:high_cold_start_rate",
          "created_by:dd_sls_app"
        ],
    }
  ]
}

See full API response here: https://app.datadoghq.com/api/v2/monitor/recommended?count=50&start=0&search=tag%3A%22product%3Aserverless%22%20AND%20tag%3A%22integration%3Aamazon-lambda%22
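
A minimal sketch of the tag-based identification, assuming each monitor carries a `tags` array of `key:value` strings as in the sample above. The `RecommendedMonitor` type and the `getServerlessId` and `indexByServerlessId` helpers are hypothetical names used for illustration, not the plugin's actual code, and the flattened shape models only the two fields visible in the sample.

```typescript
// Hypothetical, flattened shape holding only the two fields shown in the
// sample response. Fields elided in the sample are not modeled.
interface RecommendedMonitor {
  name: string;
  tags: string[];
}

const SERVERLESS_ID_PREFIX = "serverless_id:";

// Returns the value of the first "serverless_id:<value>" tag, or undefined
// when no such tag is present.
function getServerlessId(tags: string[]): string | undefined {
  const tag = tags.find((t) => t.startsWith(SERVERLESS_ID_PREFIX));
  return tag === undefined ? undefined : tag.slice(SERVERLESS_ID_PREFIX.length);
}

// Builds a serverless_id -> monitor index, skipping monitors without the tag.
function indexByServerlessId(
  monitors: RecommendedMonitor[],
): Map<string, RecommendedMonitor> {
  const byId = new Map<string, RecommendedMonitor>();
  for (const monitor of monitors) {
    const id = getServerlessId(monitor.tags);
    if (id !== undefined) {
      byId.set(id, monitor);
    }
  }
  return byId;
}

// Values copied from the sample response above.
const sample: RecommendedMonitor[] = [
  {
    name: "High Cold Start Rate on $functionName in $regionName for $awsAccount",
    tags: ["serverless_id:high_cold_start_rate", "created_by:dd_sls_app"],
  },
];
const recommended = indexByServerlessId(sample);
```

With this approach, a monitor key such as high_cold_start_rate in the plugin configuration can be looked up directly in the index, and a newly added recommended monitor is picked up without any change to the plugin.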

Motivation

Retrieve and filter recommended monitors directly from the API response, so that we no longer need to update our own map whenever a new recommended monitor is added.

Testing Guidelines

Automated Testing

The added test and all existing tests pass.

Manual Testing

Steps:
  1. Update a stack to include all 7 recommended monitors. Before the update, the stack had the monitors high_error_rate, timeout, high_cold_start_rate, and high_throttles.
    datadog:
      ...
      monitors:
        - high_error_rate:
            tags: ["team:serverless"]
        - timeout:
            tags: ["team:serverless"]
        - out_of_memory:
            tags: ["team:serverless"]
        - high_iterator_age:
            tags: ["team:serverless"]
        - high_cold_start_rate:
            tags: ["team:serverless"]
        - high_throttles:
            tags: ["team:serverless"]
        - increased_cost:
            tags: ["team:serverless"]
  2. Run serverless deploy
Result:

All 7 monitors were created or updated, and they all appear in the Datadog app.

lym953 commented 1 month ago

/merge

dd-devflow[bot] commented 1 month ago

🚂 MergeQueue: pull request added to the queue

The median merge time in main is 2m.

Use /merge -c to cancel this operation!