DataDog / cloudformation-template

Easily set up the Datadog AWS integration using CloudFormation
Apache License 2.0

Running the AWS main organizations stackset on an account with a pre-existing AWS integration will delete it #124

Open j4mcs opened 1 week ago

j4mcs commented 1 week ago

Expected Behavior

When installing the AWS integration into an account that is already registered in Datadog, the integration should either fail and leave the existing registration unchanged, or succeed and update the existing registration with the configuration passed to the integration.

Actual Behavior

When installing the AWS integration into an account that is already registered in Datadog, the integration fails and then deletes the pre-existing registration.

We are encountering this because we have the V1 Datadog integration configured for our older accounts but would like to use the V2 integration for all new accounts. This would require us to deploy the stackset to all OUs instead of manually per new account. To do this, however, we need a way for stack instances run in existing accounts to not fail (and not roll back resources they didn't create).

Steps to Reproduce the Problem

This issue is related to https://github.com/DataDog/cloudformation-template/issues/85 but highlights a broader problem with how the integration Lambda function is written.

  1. Add an AWS account to Datadog (manually or otherwise).
  2. Run the main organizations stackset (MainDatadogStackV2).
  3. Observe that the stackset fails with a 409 (see https://github.com/DataDog/cloudformation-template/issues/85).
  4. After the failure, CloudFormation rolls back and deletes the AWS integration created in step 1.
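
Step 4 is consistent with how CloudFormation handles custom resources: when a Create fails, the rollback sends a Delete request for the same resource, carrying the PhysicalResourceId from the failed response. One possible guard, sketched below, is to encode ownership in the PhysicalResourceId and only delete the integration when that marker is present. The marker prefix and both function names are hypothetical and are not taken from the template's actual Lambda code.

```python
# Hypothetical marker; the real handler may track ownership differently.
CREATED_PREFIX = "datadog-aws-integration-created:"


def physical_id_for(account_id: str, created: bool, fallback: str) -> str:
    """Return a PhysicalResourceId that records whether this stack created the integration.

    `fallback` is whatever ID the handler would otherwise return
    (for example, the CloudWatch log stream name seen in the logs).
    """
    return f"{CREATED_PREFIX}{account_id}" if created else fallback


def should_delete_integration(event: dict) -> bool:
    """Delete the Datadog integration only if an earlier Create marked it as ours."""
    if event.get("RequestType") != "Delete":
        return False
    return str(event.get("PhysicalResourceId", "")).startswith(CREATED_PREFIX)
```

With this guard, a Delete that follows a failed Create (where the PhysicalResourceId is just the log stream name) would return SUCCESS without touching the existing integration.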

Specifications

Stacktrace

Here are the relevant CloudWatch logs:

[INFO]  2024-06-17T10:08:44.819Z    3b59cfd9-7102-4308-9130-b219a7876f06    Received Create request.
[INFO]  2024-06-17T10:08:45.349Z    3b59cfd9-7102-4308-9130-b219a7876f06    Failed - exception thrown during processing.
[INFO]  2024-06-17T10:08:45.350Z    3b59cfd9-7102-4308-9130-b219a7876f06    ResponseBody: 
{
    "Status": "FAILED",
    "Reason": "See the details in CloudWatch Log Stream: 2024/06/17/[$LATEST]34b69cf01182440f9ef41373c5a7f766",
    "PhysicalResourceId": "2024/06/17/[$LATEST]34b69cf01182440f9ef41373c5a7f766",
    "StackId": "arn:aws:cloudformation:us-east-2:XXXXXXXXXXX:stack/StackSet-MainDatadogStackV2-0528f68a-e780-48bd-bde1-1108f98fea9a/83fcaf30-2c91-11ef-adc8-06cd71ccd097",
    "RequestId": "2ea31534-a1d3-4107-ba51-9ca64f57ce2d",
    "LogicalResourceId": "DatadogAPICall",
    "Data": {
        "Message": "Exception during processing: HTTP Error 409: Conflict"
    }
}

...

[INFO]  2024-06-17T10:08:49.548Z    a75026eb-cdf8-4481-86e3-8320fa38325c    Received Delete request.
[INFO]  2024-06-17T10:08:50.667Z    a75026eb-cdf8-4481-86e3-8320fa38325c    ResponseBody: 
{
    "Status": "SUCCESS",
    "Reason": "See the details in CloudWatch Log Stream: 2024/06/17/[$LATEST]34b69cf01182440f9ef41373c5a7f766",
    "PhysicalResourceId": "2024/06/17/[$LATEST]34b69cf01182440f9ef41373c5a7f766",
    "StackId": "arn:aws:cloudformation:us-east-2:XXXXXXXXXXX:stack/StackSet-MainDatadogStackV2-0528f68a-e780-48bd-bde1-1108f98fea9a/83fcaf30-2c91-11ef-adc8-06cd71ccd097",
    "RequestId": "8b494c1f-7cb8-423c-ab04-3413300bca10",
    "LogicalResourceId": "DatadogAPICall",
    "Data": {
        "Message": "Datadog AWS Integration deleted successfully."
    }
}

j4mcs commented 1 week ago

A bit more context on what is happening in your Lambda code: when a POST to https://api.datadoghq.com/api/v1/integration/aws results in a 409, the response is an error object. The Lambda handler catches this as an exception and returns a FAILED response. Given that a GET won't return the external ID, I think CREATE requests should list the existing AWS integrations for the given account_id and, if there are any, request a new ExternalID with the supplied configuration and return SUCCESS.
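
A minimal sketch of that flow, not the template's actual handler: try the POST, and on a 409 look up the account and request a new external ID instead of failing. The function names, the `send` parameter, the `account_id` query filter, the `accounts` and `external_id` response fields, and the `generate_new_external_id` path are assumptions here and should be checked against the Datadog API documentation.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

API_URL = "https://api.datadoghq.com/api/v1/integration/aws"


def call_datadog(method, url, payload, headers):
    """Send a JSON request and return the decoded JSON body (empty dict if no body)."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8") or "{}")


def create_or_adopt(payload, headers, send=call_datadog):
    """Create the integration, or adopt an existing one when the POST returns 409."""
    try:
        body = send("POST", API_URL, payload, headers)
        return {"created": True, "external_id": body.get("external_id")}
    except urllib.error.HTTPError as err:
        if err.code != 409:
            raise
        conflict = err  # keep a reference; `err` is cleared after the except block

    # Assumed: the list endpoint accepts an account_id filter and returns "accounts".
    query = urllib.parse.urlencode({"account_id": payload["account_id"]})
    existing = send("GET", f"{API_URL}?{query}", None, headers).get("accounts", [])
    if not existing:
        raise conflict  # a 409 with no matching account is still an error

    # Assumed: this endpoint issues a fresh external ID for the existing account.
    body = send("PUT", f"{API_URL}/generate_new_external_id", payload, headers)
    return {"created": False, "external_id": body.get("external_id")}
```

The `created` flag lets the caller record whether the stack owns the integration, so a later Delete request can leave alone an integration it merely adopted.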