GoogleCloudPlatform / DataflowTemplates

Cloud Dataflow Google-provided templates for solving in-Cloud data tasks
https://cloud.google.com/dataflow/docs/guides/templates/provided-templates
Apache License 2.0

[Bug]: Unable to export logs to Datadog #1210

Closed: huib-coalesce closed this issue 2 months ago

huib-coalesce commented 10 months ago

Related Template(s)

Pub/Sub to Datadog template

What happened?

Exported log messages don't appear in Datadog.

Steps to reproduce

I've used the following Terraform definition to set up the infrastructure.

main.tf

locals {
  datadog_iam_roles = [
    "roles/monitoring.viewer",
    "roles/compute.viewer",
    "roles/cloudasset.viewer",
    "roles/browser"
  ]
  dataflow_iam_roles = [
    "roles/dataflow.admin",
    "roles/dataflow.worker",
    "roles/pubsub.viewer",
    "roles/pubsub.subscriber",
    "roles/pubsub.publisher",
    "roles/storage.objectAdmin"
  ]
}

# Service Account for Datadog integration with Google Cloud Platform
resource "google_service_account" "datadog_service_account" {
  project      = var.project_id
  account_id   = "${var.short_name}-datadog-viewer"
  display_name = "${var.short_name}-datadog-viewer"
}

resource "google_project_iam_member" "datadog_iam" {
  for_each = toset(local.datadog_iam_roles)

  project = var.project_id
  role    = each.value
  member  = "serviceAccount:${google_service_account.datadog_service_account.email}"
}

# Generate Datadog service account from https://app.datadoghq.com/integrations/google-cloud-platform
resource "google_service_account_iam_member" "token-creator-iam" {
  service_account_id = google_service_account.datadog_service_account.id
  role               = "roles/iam.serviceAccountTokenCreator"
  member             = "serviceAccount:dd-abc@xyz.iam.gserviceaccount.com"
}

# Log Router Sink
module "log_export" {
  source                 = "terraform-google-modules/log-export/google"
  version                = "~> 7.0"
  destination_uri        = module.destination.destination_uri
  filter                 = "severity >= INFO"
  log_sink_name          = "${var.short_name}-dd-log-sink"
  parent_resource_id     = var.project_id
  parent_resource_type   = "project"
  unique_writer_identity = true
}

# Pub/Sub Topic and Pull Subscription
module "destination" {
  source                   = "terraform-google-modules/log-export/google//modules/pubsub"
  version                  = "~> 7.0"
  project_id               = var.project_id
  topic_name               = "${var.short_name}-dd-topic"
  log_sink_writer_identity = module.log_export.writer_identity
  create_subscriber        = true
  create_push_subscriber   = false
}

# Enable the Dataflow API
resource "google_project_service" "dataflow_job_service" {
  project = var.project_id
  service = "dataflow.googleapis.com"
}

# Topic for undeliverable messages
resource "google_pubsub_topic" "dead_letter_topic" {
  name                       = "${var.short_name}-dd-dead-letter"
  project                    = var.project_id
  message_retention_duration = "86400s"
}

# Cache bucket
resource "google_storage_bucket" "dataflow_tmp_bucket" {
  name     = "${var.project_id}-dataflow-cache"
  location = var.region
  project  = var.project_id
}

# Service Account for exporting to Datadog
resource "google_service_account" "dataflow_service_account" {
  project      = var.project_id
  account_id   = "${var.short_name}-datadog-dataflow"
  display_name = "${var.short_name}-datadog-dataflow"
}

resource "google_project_iam_member" "dataflow_iam" {
  for_each = toset(local.dataflow_iam_roles)

  project = var.project_id
  role    = each.value
  member  = "serviceAccount:${google_service_account.dataflow_service_account.email}"
}

# echo -n "my-datadog-api-key" | gcloud secrets create datadog-api-key --data-file=- --project my-google-project
data "google_secret_manager_secret_version" "datadog-api-key" {
  secret  = "datadog-api-key"
  project = "my-google-project"
}

resource "google_project_iam_member" "dataflow_secret_iam" {
  project = "my-google-project"
  role    = "roles/secretmanager.secretAccessor"
  member  = "serviceAccount:${google_service_account.dataflow_service_account.email}"
}

# Dataflow Job using the PubSub to Datadog template
resource "google_dataflow_job" "datadog_dataflow" {
  name                  = "${var.short_name}-dd-dataflow"
  project               = var.project_id
  region                = var.region
  template_gcs_path     = "gs://dataflow-templates/latest/Cloud_PubSub_to_Datadog"
  temp_gcs_location     = google_storage_bucket.dataflow_tmp_bucket.url
  service_account_email = google_service_account.dataflow_service_account.email
  parameters            = {
    inputSubscription     = module.destination.pubsub_subscription
    outputDeadletterTopic = google_pubsub_topic.dead_letter_topic.id
    url                   = "https://http-intake.logs.datadoghq.com"
    includePubsubMessage  = true
    apiKeySource          = "SECRET_MANAGER"
    apiKeySecretId        = data.google_secret_manager_secret_version.datadog-api-key.name
  }
}

variables.tf

variable "project_id" {
  description = "Google Project id"
}

variable "short_name" {
  type        = string
  description = "Short descriptor name of the Google Project"
  validation {
    condition     = length(var.short_name) < 8
    error_message = "Please use a descriptor shorter than 8 chars"
  }
}

variable "region" {
  type        = string
  description = "The Google Project default region"
}

terraform.tfvars

project_id = "my-project-id"
short_name = "dev"
region     = "us-central1"

Result

I have a Log Router Sink.

That's connected to a Pub/Sub Topic: image

It has a Topic Subscription with a Delivery type of Pull: image

And finally, there's the Dataflow Job image

However, it appears to me that the messages are never sent to Datadog.

But there are no errors in the logs: image

And there's nothing in Datadog: image

Beam Version

Newer than 2.46.0

Relevant log output

No response

bvolpato commented 9 months ago

@huib-coalesce Sorry, missed this before.

Aren't the messages being sent to the error output? Filing a Google Cloud support case with the job IDs might be useful so the team can look into it further.
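
To inspect that error output directly, here's a minimal sketch, assuming an extra pull subscription (not part of the original configuration) attached to the existing dead-letter topic:

# Hypothetical subscription for inspecting records that land on the dead-letter topic.
# Failed records could then be pulled with, for example:
#   gcloud pubsub subscriptions pull dev-dd-dead-letter-inspect --auto-ack --limit=10
resource "google_pubsub_subscription" "dead_letter_inspect" {
  name    = "${var.short_name}-dd-dead-letter-inspect"
  project = var.project_id
  topic   = google_pubsub_topic.dead_letter_topic.id
}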

huib-coalesce commented 9 months ago

No, they're not, as far as I'm aware.

ConvertToDataDogEvent (on the left) shows data going in and out (on the right): image

Create KV pairs shows data going in, but not out: image

Write Datadog events shows no incoming data: image

WrapDatadogWriteErrors shows no incoming/outgoing data: image

FlattenErrors shows no incoming/outgoing data: image

Same for WriteFailedRecords: image

And the dev-dd-dead-letter topic doesn't show anything either: image

GurayCetin commented 9 months ago

Could you please check that the Log Router Sink is receiving logs properly? I suspect you are missing the "roles/pubsub.publisher" role for the sink's writer identity.

In my case, I created the sink myself rather than with the module, but maybe it can show you how to add it.

resource "google_logging_project_sink" "datadog_sink" {
  name                   = "kubernetes_container_error_logs"
  destination            = "pubsub.googleapis.com/${google_pubsub_topic.export_logs_to_datadog.id}"
  filter                 = "resource.type=k8s_container AND log_id(stderr) AND severity>=ERROR"
  unique_writer_identity = true
}
resource "google_project_iam_member" "pubsub_publisher_permisson_sink" {
  project = var.project_id
  role    = "roles/pubsub.publisher"
  member  = google_logging_project_sink.datadog_sink.writer_identity
}

huib-coalesce commented 9 months ago

Could you please check that the Log Router Sink is receiving logs properly?

The Log Router Sink is receiving content: image

I suspect you are missing the "roles/pubsub.publisher" role for the sink's writer identity

To try your suggestion, I've added:

resource "google_project_iam_member" "pubsub_publisher_permisson_sink" {
  project = var.project_id
  role    = "roles/pubsub.publisher"
  member  = module.log_export.writer_identity
}

And I changed the filter from (severity >= DEBUG AND resource.type=\"cloud_function\") OR (severity >= WARNING AND resource.type=\"cloud_run_revision\") to severity >= ERROR (sketched below),

which ensures there are regular messages to pass through: image
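
In the Terraform definition above, that change only touches the filter argument of the log_export module; a minimal sketch of the updated block, everything else unchanged:

# Same module block as in main.tf, with only the filter narrowed
module "log_export" {
  source                 = "terraform-google-modules/log-export/google"
  version                = "~> 7.0"
  destination_uri        = module.destination.destination_uri
  filter                 = "severity >= ERROR"
  log_sink_name          = "${var.short_name}-dd-log-sink"
  parent_resource_id     = var.project_id
  parent_resource_type   = "project"
  unique_writer_identity = true
}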

The Log Router Sink:

image

The service account:

image

The Topic Metrics shows content arriving:

image

Same for the Topic Subscription:

image

The point where messages come in but do not go out:

image

I did, however, notice this:

image

But I don't know the significance of that.

github-actions[bot] commented 2 months ago

This issue has been marked as stale due to 180 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the issue at any time. Thank you for your contributions.

github-actions[bot] commented 2 months ago

This issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.