Identify recommended monitors by their serverless_id tag

lym953 commented 1 month ago

Background

Currently we use a hard-coded map to translate each recommended monitor's API ID into its serverless_id, such as high_error_rate:

const recommendedMonitorApiIdToId = {
  "serverless-lambda_function_invocations_are_failing": "high_error_rate",
  ...
}

This map is a temporary remediation for issue https://github.com/DataDog/serverless-plugin-datadog/issues/545 and incident 31902.
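The following minimal TypeScript sketch shows the kind of lookup the hard-coded map supports and why it needs manual upkeep. It reproduces only the single entry shown above. `lookupServerlessId` and the API ID `serverless-some_new_monitor` are hypothetical names used for illustration, not code taken from the plugin.

```typescript
// Only the entry shown in the description is reproduced; the real map's
// other entries are elided there and are not guessed here.
const recommendedMonitorApiIdToId: Record<string, string> = {
  "serverless-lambda_function_invocations_are_failing": "high_error_rate",
};

// Hypothetical helper: maps a recommended monitor's API ID to its
// serverless_id, or returns undefined when the map has no entry for it.
function lookupServerlessId(apiId: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(recommendedMonitorApiIdToId, apiId)
    ? recommendedMonitorApiIdToId[apiId]
    : undefined;
}

console.log(lookupServerlessId("serverless-lambda_function_invocations_are_failing")); // high_error_rate
// A recommended monitor added later (hypothetical API ID) is not found
// until someone edits the map by hand.
console.log(lookupServerlessId("serverless-some_new_monitor")); // undefined
```

Any recommended monitor whose API ID is missing from the map resolves to undefined until the map is updated.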

What does this PR do?

Identify each recommended monitor by the serverless_id tag in the Recommended Monitor API response instead of by the hard-coded map.

An abridged response looks like this:

{
  "data": [
    {
      ...
        "name": "High Cold Start Rate on $functionName in $regionName for $awsAccount",
        "tags": [
          "serverless_id:high_cold_start_rate",
          "created_by:dd_sls_app"
        ],
    }
  ]
}

See the full API response here: https://app.datadoghq.com/api/v2/monitor/recommended?count=50&start=0&search=tag%3A%22product%3Aserverless%22%20AND%20tag%3A%22integration%3Aamazon-lambda%22
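A minimal sketch of a tag-based lookup, assuming each monitor carries at most one serverless_id:<id> tag as in the excerpt above. `RecommendedMonitor`, `getServerlessId`, `indexByServerlessId` and `recommendedMonitorsUrl` are hypothetical names, and the flattened monitor shape deliberately ignores the nesting elided from the excerpt. This is an illustration, not the PR's actual implementation. The search query is rebuilt with encodeURIComponent, which produces the same query string as the link above.

```typescript
// Hypothetical, flattened shape of one recommended monitor. The excerpt
// elides the nesting around `name` and `tags`, so it is not modeled here.
interface RecommendedMonitor {
  name: string;
  tags: string[];
}

const SERVERLESS_ID_PREFIX = "serverless_id:";

// Returns the value of the first `serverless_id:<id>` tag, or undefined.
function getServerlessId(tags: string[]): string | undefined {
  const tag = tags.find((t) => t.startsWith(SERVERLESS_ID_PREFIX));
  return tag === undefined ? undefined : tag.slice(SERVERLESS_ID_PREFIX.length);
}

// Indexes monitors by serverless_id. Monitors without the tag (or with an
// empty value) are skipped; no per-monitor mapping has to be maintained.
function indexByServerlessId(
  monitors: RecommendedMonitor[],
): Map<string, RecommendedMonitor> {
  const byId = new Map<string, RecommendedMonitor>();
  for (const monitor of monitors) {
    const id = getServerlessId(monitor.tags);
    if (id) {
      byId.set(id, monitor);
    }
  }
  return byId;
}

// Search query from the full-response link, encoded the same way.
const searchQuery = 'tag:"product:serverless" AND tag:"integration:amazon-lambda"';
const recommendedMonitorsUrl =
  "https://app.datadoghq.com/api/v2/monitor/recommended?count=50&start=0&search=" +
  encodeURIComponent(searchQuery);

// Example built from the excerpt above.
const monitors: RecommendedMonitor[] = [
  {
    name: "High Cold Start Rate on $functionName in $regionName for $awsAccount",
    tags: ["serverless_id:high_cold_start_rate", "created_by:dd_sls_app"],
  },
];

const byId = indexByServerlessId(monitors);
console.log(byId.get("high_cold_start_rate")?.name);
// High Cold Start Rate on $functionName in $regionName for $awsAccount
```

Because the lookup key comes from the API response itself, a new recommended monitor that carries a serverless_id tag is indexed without any code change.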

Motivation

Retrieve and filter recommended monitors by tag, so we no longer need to update our own map every time a new recommended monitor is added.

Testing Guidelines

Automated Testing

The added test and all existing tests pass.

Manual Testing

Steps:

Update a stack to use all 7 recommended monitors. Before the update, the stack had the monitors high_error_rate, timeout, high_cold_start_rate, and high_throttles.

datadog:
  ...
  monitors:
    - high_error_rate:
        tags: ["team:serverless"]
    - timeout:
        tags: ["team:serverless"]
    - out_of_memory:
        tags: ["team:serverless"]
    - high_iterator_age:
        tags: ["team:serverless"]
    - high_cold_start_rate:
        tags: ["team:serverless"]
    - high_throttles:
        tags: ["team:serverless"]
    - increased_cost:
        tags: ["team:serverless"]

Run serverless deploy.

Result:

All 7 monitors were updated or created, and all of them appear in the Datadog app.
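A hypothetical sketch of the outcome expected from this manual test: monitors already on the stack are updated and the rest are created. `MonitorConfig`, `configuredMonitorIds` and `planMonitorChanges` are illustrative names only, and the sketch does not claim to reflect how the plugin actually decides between create and update.

```typescript
// Hypothetical shape of one entry under `monitors:` in the config above,
// e.g. { high_error_rate: { tags: ["team:serverless"] } }.
type MonitorConfig = Record<string, { tags?: string[] }>;

// Collects the monitor IDs (the key of each entry) in config order.
function configuredMonitorIds(monitorsConfig: MonitorConfig[]): string[] {
  const ids: string[] = [];
  for (const entry of monitorsConfig) {
    ids.push(...Object.keys(entry));
  }
  return ids;
}

// Splits configured IDs into those already on the stack (update) and the rest (create).
function planMonitorChanges(
  configured: string[],
  existing: Set<string>,
): { update: string[]; create: string[] } {
  const update: string[] = [];
  const create: string[] = [];
  for (const id of configured) {
    if (existing.has(id)) {
      update.push(id);
    } else {
      create.push(id);
    }
  }
  return { update, create };
}

// Values from the manual test: 7 configured monitors, 4 already on the stack.
const teamTags = { tags: ["team:serverless"] };
const monitorsConfig: MonitorConfig[] = [
  { high_error_rate: teamTags },
  { timeout: teamTags },
  { out_of_memory: teamTags },
  { high_iterator_age: teamTags },
  { high_cold_start_rate: teamTags },
  { high_throttles: teamTags },
  { increased_cost: teamTags },
];
const existing = new Set(["high_error_rate", "timeout", "high_cold_start_rate", "high_throttles"]);

const plan = planMonitorChanges(configuredMonitorIds(monitorsConfig), existing);
console.log(plan.update.join(", ")); // high_error_rate, timeout, high_cold_start_rate, high_throttles
console.log(plan.create.join(", ")); // out_of_memory, high_iterator_age, increased_cost
```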

Additional Notes

Types of changes

[ ] Bug fix
[x] New feature
[ ] Breaking change
[ ] Misc (docs, refactoring, dependency upgrade, etc.)

Check all that apply

[x] This PR's description is comprehensive
[ ] This PR contains breaking changes that are documented in the description
[ ] This PR introduces new APIs or parameters that are documented and unlikely to change in the foreseeable future
[ ] This PR impacts documentation, and it has been updated (or a ticket has been logged)
[x] This PR's changes are covered by the automated tests
[ ] This PR collects user input/sensitive content into Datadog

lym953 commented 1 month ago

/merge

dd-devflow[bot] commented 1 month ago

MergeQueue: pull request added to the queue

The median merge time in main is 2m.

