DataDog / terraform-provider-datadog

Terraform Datadog provider
https://www.terraform.io/docs/providers/datadog/
Mozilla Public License 2.0

datadog_integration_aws.datadog_integration: error creating a Amazon Web Services integration: API error 502 Bad Gateway #190

debu99 opened this issue 5 years ago (status: Open)

debu99 commented 5 years ago

Datadog provider 1.8

Error: Error applying plan:

1 error(s) occurred:

nmuesch commented 5 years ago

Hey @debu99, thanks for reporting this issue, and apologies for the delay. While I work on reproducing it, do you have an example Terraform configuration that triggers this error? Thanks for any additional information!

nmuesch commented 5 years ago

I went ahead and attempted to create the AWS Integration based on the provider documentation here - https://www.terraform.io/docs/providers/datadog/r/integration_aws.html#example-usage and was able to successfully create the integration.

Following up on my previous note: are you still facing this issue?

nmuesch commented 5 years ago

Hey @debu99 as I wasn't able to reproduce this issue, I'll go ahead and close this for now. Please do let me know if you continue to hit an issue with this!

unthought commented 5 years ago

I have a reproduction for this problem on v2.1.0.

Here's the request (TF_LOG=1):

POST //api/v1/integration/aws?api_key=xx&application_key=xxx

Host: api.datadoghq.eu
User-Agent: Go-http-client/1.1
Content-Length: 175
Content-Type: application/json
Accept-Encoding: gzip

{
 "account_id": "XX",
 "role_name": "DatadogAWSIntegrationRole",
 "filter_tags": [],
 "host_tags": [
  "org:bla",
  "env:prod"
 ],
 "account_specific_namespace_rules": {
  "opsworks": false
 }
}

response:

2019/09/12 10:58:22 [DEBUG] Datadog API Response Details:
---[ RESPONSE ]--------------------------------------
HTTP/2.0 502 Bad Gateway
Content-Length: 107
Alt-Svc: clear
Cache-Control: no-cache
Content-Type: text/html
Date: Thu, 12 Sep 2019 13:58:22 GMT
Via: 1.1 google

<html><body><h1>502 Bad Gateway</h1>
The server returned an invalid or incomplete response.
</body></html>

-----------------------------------------------------

Some context:

Terraform file (an unchanged copy; this is exactly the configuration that used to work and now fails):

#######################################################################
# Datadog integration

# https://docs.datadoghq.com/integrations/amazon_web_services/?tab=allpermissions
# https://docs.datadoghq.com/integrations/faq/aws-integration-with-terraform/

provider "aws" {
  version = "2.20.0"
}

data "aws_ssm_parameter" "datadog_api_key" {
  name = "/datadog/dd_api_key"
}

data "aws_ssm_parameter" "datadog_app_key" {
  name = "/datadog/dd_app_key"
}

provider "datadog" {
  version = "2.1.0"

  api_key = "${data.aws_ssm_parameter.datadog_api_key.value}"
  app_key = "${data.aws_ssm_parameter.datadog_app_key.value}"
  api_url = "https://api.datadoghq.eu/"
}

locals {
  role_name = "DatadogAWSIntegrationRole"
}

resource "datadog_integration_aws" "integration" {
  account_id = "${data.aws_caller_identity.current.account_id}"
  role_name  = "${local.role_name}"
  //filter_tags = ["key:value"]
  host_tags = [
    "org:${var.org}",
    "env:${var.env}"
  ]
  account_specific_namespace_rules = {
    //auto_scaling = false
    opsworks = false
  }
}

data "aws_iam_policy_document" "datadog_aws_integration_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type = "AWS"
      identifiers = [
        # grant Datadog's account access ...
        "arn:aws:iam::464622532012:root"
      ]
    }

    # ... if the external ID matches
    condition {
      test = "StringEquals"
      variable = "sts:ExternalId"

      values = [
        "${datadog_integration_aws.integration.external_id}"
      ]
    }
  }
}

# https://docs.datadoghq.com/integrations/amazon_web_services/?tab=allpermissions#datadog-aws-iam-policy
data "aws_iam_policy_document" "datadog_aws_integration" {
  statement {
    actions = [
        "apigateway:GET",
        "autoscaling:Describe*",
        "budgets:ViewBudget",
        "cloudfront:GetDistributionConfig",
        "cloudfront:ListDistributions",
        "cloudtrail:DescribeTrails",
        "cloudtrail:GetTrailStatus",
        "cloudwatch:Describe*",
        "cloudwatch:Get*",
        "cloudwatch:List*",
        "codedeploy:List*",
        "codedeploy:BatchGet*",
        "directconnect:Describe*",
        "dynamodb:List*",
        "dynamodb:Describe*",
        "ec2:Describe*",
        "ecs:Describe*",
        "ecs:List*",
        "elasticache:Describe*",
        "elasticache:List*",
        "elasticfilesystem:DescribeFileSystems",
        "elasticfilesystem:DescribeTags",
        "elasticloadbalancing:Describe*",
        "elasticmapreduce:List*",
        "elasticmapreduce:Describe*",
        "es:ListTags",
        "es:ListDomainNames",
        "es:DescribeElasticsearchDomains",
        "health:DescribeEvents",
        "health:DescribeEventDetails",
        "health:DescribeAffectedEntities",
        "kinesis:List*",
        "kinesis:Describe*",
        "lambda:AddPermission",
        "lambda:GetPolicy",
        "lambda:List*",
        "lambda:RemovePermission",
        "logs:Get*",
        "logs:Describe*",
        "logs:FilterLogEvents",
        "logs:TestMetricFilter",
        "logs:PutSubscriptionFilter",
        "logs:DeleteSubscriptionFilter",
        "logs:DescribeSubscriptionFilters",
        "rds:Describe*",
        "rds:List*",
        "redshift:DescribeClusters",
        "redshift:DescribeLoggingStatus",
        "route53:List*",
        "s3:GetBucketLogging",
        "s3:GetBucketLocation",
        "s3:GetBucketNotification",
        "s3:GetBucketTagging",
        "s3:ListAllMyBuckets",
        "s3:PutBucketNotification",
        "ses:Get*",
        "sns:List*",
        "sns:Publish",
        "sqs:ListQueues",
        "support:*",
        "tag:GetResources",
        "tag:GetTagKeys",
        "tag:GetTagValues",
        "xray:BatchGetTraces",
        "xray:GetTraceSummaries",
        // https://docs.datadoghq.com/integrations/amazon_event_bridge/
        // https://eu-central-1.console.aws.amazon.com/events/home?region=eu-central-1#/partners/datadoghq.com
        "events:CreateEventBus"
    ]

    resources = ["*"]
  }
}

resource "aws_iam_policy" "datadog_aws_integration" {
  name = "DatadogAWSIntegrationPolicy"
  policy = "${data.aws_iam_policy_document.datadog_aws_integration.json}"
}

resource "aws_iam_role" "datadog_aws_integration" {
  name = "${local.role_name}"
  description = "Role for Datadog AWS Integration"
  assume_role_policy = "${data.aws_iam_policy_document.datadog_aws_integration_assume_role.json}"
}

resource "aws_iam_role_policy_attachment" "datadog_aws_integration" {
  role = "${aws_iam_role.datadog_aws_integration.name}"
  policy_arn = "${aws_iam_policy.datadog_aws_integration.arn}"
}

nmuesch commented 5 years ago

Hey, thanks for the reproduction steps. I'll open this issue back up for now.

bkabrda commented 4 years ago

Hey @unthought, I think this issue was now fixed in the backend code - I'm no longer able to reproduce it. Could you please give it another try and let me know the result? Thanks!

msuterski commented 4 years ago

We're experiencing exactly the same problem but when creating GCP integrations.

The "failed" requests actually create the integrations in DD, but because the response is a 502, Terraform, rightly, does not add them to the state file.

Subsequent terraform applies return a 409 error code from Datadog, indicating that the resources already exist, which they do.

To fix the state, we need to manually delete the "failed" integrations in DD's UI and re-apply the terraform config. In the case listed below, we needed to go through that process multiple times to get all those integrations created correctly.
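
When the create call actually succeeded server-side, an alternative to deleting the integration in the UI and re-applying is to adopt the existing integration into state with terraform import. A sketch, assuming the resource addresses from the log below and that datadog_integration_gcp imports by GCP project ID (verify both against the resource's import docs for your provider version):

```shell
# Hypothetical: adopt the integration that Datadog created despite the 502.
# The import ID (the GCP project ID "xxx-1") is an assumption; check the
# datadog_integration_gcp import documentation before running this.
terraform import 'module.gcp.datadog_integration_gcp.gcp-projects["xxx-1"]' xxx-1
```

After a successful import, the next plan should no longer try to create that project's integration.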

It almost feels like there is some sort of throttling on the Datadog API side, where only one or two create calls succeed and the rest return 502s.

$ terraform apply -lock-timeout=300s plan.tfout

module.gcp.datadog_integration_gcp.gcp-projects["xxx-1"]: Creating...
module.gcp.datadog_integration_gcp.gcp-projects["xxx-2"]: Creating...
module.gcp.datadog_integration_gcp.gcp-projects["xxx-3"]: Creating...
module.gcp.datadog_integration_gcp.gcp-projects["xxx-4"]: Creating...
module.gcp.datadog_integration_gcp.gcp-projects["xxx-5"]: Creating...
module.gcp.datadog_integration_gcp.gcp-projects["xxx-6"]: Creating...
module.gcp.datadog_integration_gcp.gcp-projects["xxx-7"]: Creating...
module.gcp.datadog_integration_gcp.gcp-projects["xxx-8"]: Creating...
module.gcp.datadog_integration_gcp.gcp-projects["xxx-1"]: Creation complete after 1s [id=xxx-1]
module.gcp.datadog_integration_gcp.gcp-projects["xxx-3"]: Creation complete after 1s [id=xxx-3]
module.gcp.datadog_integration_gcp.gcp-projects["xxx-6"]: Creation complete after 1s [id=xxx-6]

Error: error creating a Google Cloud Platform integration: API error 502 Bad Gateway: {"status":"error","code":502,"errors":["Bad Gateway"],"statuspage":"http://status.datadoghq.com","twitter":"http://twitter.com/datadogops","email":"support@datadoghq.com"}

  on integrations/gcp/projects.tf line 89, in resource "datadog_integration_gcp" "gcp-projects":
  89: resource "datadog_integration_gcp" "gcp-projects" {

(the same 502 error block repeats four more times, once for each remaining project)

time="2020-03-06T14:58:17Z" level=fatal msg="Failed to execute a command" error="exit status 1"

jurajseffer commented 4 years ago

This also happens when describing a monitor. We have several monitors in the same TF configuration, and it happens quite frequently. My suggestion would be to add a retry mechanism to the provider, at least for GET operations.

Error: error checking monitor exists: 502 Bad Gateway: {"status":"error","code":502,"errors":["Bad Gateway"],"statuspage":"http://status.datadoghq.com","twitter":"http://twitter.com/datadogops","email":"support@datadoghq.com"}
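
The retry suggested above can be sketched client-side in shell. This is a minimal sketch, not the provider's actual behavior; the function name, the linear backoff, and the simulated flaky endpoint are all illustrative:

```shell
#!/bin/sh
# Minimal sketch of retry-on-502 with backoff. The "endpoint" here is a
# simulated command that prints an HTTP status code; real code would call
# the Datadog API. All names are illustrative, not provider internals.

retry_on_502() {
  max_attempts=4
  delay=0                  # kept small so the demo runs fast; use >=1s in practice
  attempt=1
  while :; do
    status=$("$@")
    if [ "$status" != "502" ] || [ "$attempt" -ge "$max_attempts" ]; then
      echo "status=$status attempts=$attempt"
      return
    fi
    sleep "$delay"
    delay=$((delay + 1))   # linear backoff for the sketch
    attempt=$((attempt + 1))
  done
}

# Simulated endpoint: returns 502 twice, then 200, tracked via a temp file.
state_file=$(mktemp)
echo 0 > "$state_file"
flaky_call() {
  n=$(($(cat "$state_file") + 1))
  echo "$n" > "$state_file"
  if [ "$n" -le 2 ]; then echo 502; else echo 200; fi
}

retry_on_502 flaky_call    # prints: status=200 attempts=3
rm -f "$state_file"
```

A real implementation would also cap total wall-clock time and only retry idempotent operations, since (as reported above) some 502s come back even though the write actually succeeded.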

phillip-dd commented 3 years ago

@jurajseffer if this is still an issue, can you open a support ticket with further details so that we can investigate the details? It seems like a different issue than the one described here.

ozgurozkan123 commented 3 years ago

I experience the same problem with the Datadog Azure integration on subsequent terraform apply runs. I had to manually delete the integration in Datadog for terraform apply to succeed.

Error: error creating an Azure integration: 409 Conflict: {"errors": ["The given tenant and client already exists in your Datadog account."]}

Frogvall commented 2 years ago

This happens whenever a change triggers the integration to be deleted and recreated; if I rerun terraform apply directly afterwards, it works fine:

Error: error deleting an AWS integration Lambda ARN from https://api.datadoghq.eu/api/v1/integration/aws/logs: 502 Bad Gateway: {"status":"error","code":502,"errors":["Bad Gateway"],"statuspage":"http://status.datadoghq.eu","twitter":"http://twitter.com/datadogops","email":"support@datadoghq.com"}
Error: error attaching Lambda ARN to AWS integration account from https://api.datadoghq.eu/api/v1/integration/aws/logs: 502 Bad Gateway: {"status":"error","code":502,"errors":["Bad Gateway"],"statuspage":"http://status.datadoghq.eu","twitter":"http://twitter.com/datadogops","email":"support@datadoghq.com"}

As others have said, the operation seems to go through (the ARN is properly deleted or attached), but the API still answers with a 502.

oblogic7 commented 2 years ago

I'm also running into this on a terraform destroy. The resource is deleted, but the API responds with 502.

Error: error disabling Amazon Web Services log collection from https://api.datadoghq.com/api/v1/integration/aws/logs/services: 502 Bad Gateway: {"status":"error","code":502,"errors":["Bad Gateway"],"statuspage":"http://status.datadoghq.com","twitter":"http://twitter.com/datadogops","email":"support@datadoghq.com"}