DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

[BUG] Agent does not start when running on GCP Cloud Run #17935

Open mpsq opened 1 year ago

mpsq commented 1 year ago

Agent Environment: Agent 7.45.1

Describe what happened: I am attempting to run the agent on Cloud Run to forward traces from other containers. I use the following Dockerfile:

FROM datadog/agent:7

ENV DD_HOSTNAME=dd-agent \
  DD_SITE=datadoghq.eu \
  DD_API_KEY=xyz

# Features we want
ENV DD_APM_NON_LOCAL_TRAFFIC=true \
  DD_APM_ENABLED=true

# Turn everything else off
ENV DD_LOGS_ENABLED=false \
  DD_ENABLE_METADATA_COLLECTION=false \
  DD_SEND_HOST_METADATA=false \
  DD_ENABLE_GOHAI=false \
  DD_COLLECT_KUBERNETES_EVENTS=false

EXPOSE 8126

The container, once on Cloud Run, crashes with:

PROCESS | CRITICAL | (comp/core/log/logger.go:104 in Critical) | Error collecting host details: strconv.ParseInt: parsing "unknown": invalid syntax
PROCESS | CRITICAL | (comp/core/log/logger.go:104 in Critical) | Failed to initialize the process agent: error collecting host details: strconv.ParseInt: parsing "unknown": invalid syntax

Coming from: https://github.com/DataDog/datadog-agent/blob/fbc3a566b5ca593435126c8c757780d62f1dc5ba/comp/process/hostinfo/hostinfo.go#L32C1-L32C1
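For reference, the error text is what Go's strconv.ParseInt returns when it is handed the literal string "unknown". The snippet below is a minimal sketch, not the agent's code: it reproduces the message and shows a hypothetical parseOrDefault helper (a name made up for illustration) that would fall back to a placeholder instead of failing. Which host field actually comes back as "unknown" on Cloud Run is not established here.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseOrDefault is a hypothetical helper, not part of datadog-agent.
// It returns fallback when raw is not a valid base-10 64-bit integer.
func parseOrDefault(raw string, fallback int64) int64 {
	n, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return fallback
	}
	return n
}

func main() {
	// Strict parsing reproduces the error text seen in the crash log.
	_, err := strconv.ParseInt("unknown", 10, 64)
	fmt.Println(err) // strconv.ParseInt: parsing "unknown": invalid syntax

	// A tolerant parse keeps going with a placeholder value instead.
	fmt.Println(parseOrDefault("unknown", 0))
	fmt.Println(parseOrDefault("4", 0))
}
```

This only illustrates why an unparseable host detail is fatal when the error is propagated. Whether a fallback is the right fix for the process agent is a separate question.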

Describe what you expected: The Dockerfile above works fine / as expected when running locally on a Linux machine. I would expect the container to run without issues on GCP Cloud Run.

Steps to reproduce the issue: You can use this Cloud Build file:

---
steps:
  # Build the container image
  - name: "gcr.io/cloud-builders/docker"
    args:
      [
        "build",
        "-t",
        "gcr.io/${PROJECT_ID}/agent",
        ".",
      ]

  # Push the container image to Container Registry
  - name: "gcr.io/cloud-builders/docker"
    args: ["push", "gcr.io/${PROJECT_ID}/agent"]

  # Deploy container image to Cloud Run
  - name: "gcr.io/google.com/cloudsdktool/cloud-sdk"
    entrypoint: gcloud
    args:
      [
        "run",
        "deploy",
        "agent",
        "--image=gcr.io/${PROJECT_ID}/agent",
        "--region=us-central1",
        "--allow-unauthenticated",
      ]

images:
  - gcr.io/${PROJECT_ID}/agent

# gcloud builds submit --config dd-agent.yaml --region us-central1

Save this to a file named dd-agent.yaml. Then, with the gcloud CLI installed and configured, run:

gcloud builds submit --config dd-agent.yaml --region us-central1

Additional environment details (Operating System, Cloud provider, etc): GCP Cloud Run

LiuVII commented 1 year ago

We are experiencing a similar issue. We are trying to get Cloud SQL Postgres query analysis into Datadog, and we are wondering what the correct config is for the Datadog agent running as a Docker container on GCP Cloud Run, and which env vars should be set besides DD_SITE and DD_API_KEY.

enischiguti commented 1 year ago

I was having the same error. In my case, these annotations were missing:

metadata:
  annotations:
    run.googleapis.com/launch-stage: BETA
spec:
  template:
    metadata:
      annotations:
        run.googleapis.com/execution-environment: gen2

LiuVII commented 1 year ago

@enischiguti

Very cool! Deploying Cloud Run with gen2 did indeed make the deployment succeed, thank you!

I'm still fighting some network issues, though. Can you please share a few more details about your configuration? Which env vars did you set besides DD_API_KEY?

enischiguti commented 1 year ago

@LiuVII What is the error you're experiencing? Besides DD_API_KEY and DD_SITE the other one I needed was DD_HOSTNAME.

LiuVII commented 1 year ago

> @LiuVII What is the error you're experiencing? Besides DD_API_KEY and DD_SITE the other one I needed was DD_HOSTNAME.

I don't see any specific error coming from the agent. I just don't see the expected metrics in the Datadog dashboard.

Basically, we're trying to use a Cloud Run service to host the Database Monitoring Agent (https://docs.datadoghq.com/database_monitoring/setup_postgres/gcsql/?tab=docker#install-the-agent) to get query insights in Datadog.

enischiguti commented 1 year ago

Well, I don't have this specific use case. The only other thing I can think of right now is enabling debug logs to find more hints.