DataDog / helm-charts

Helm charts for Datadog products
Apache License 2.0

EKS Fargate Logs not showing up #1407

Open dev-travelex opened 5 months ago

dev-travelex commented 5 months ago

Describe what happened: I have a couple of containers running on my EKS Fargate setup. The pods appear in the UI with status "RUNNING", but their logs don't show up. Also, one of the containers has no Service; it only calls an API over the internet, so it exposes no ports. That pod doesn't show up in the UI at all.

Describe what you expected: Logs from all the containers should show up in the Logs tab of the pods.

Steps to reproduce the issue: These are the Helm values for my Datadog Cluster Agent. I later added `containerCollectUsingFiles: true`, hoping it would help.

datadog:
  apiKey: ${datadog_api_key}
  clusterName: ${cluster_name}
  apm:
    portEnabled: true
    port: 8126
  logs:
    enabled: true
    containerCollectAll: true
    containerCollectUsingFiles: true
agents:
  enabled: false
clusterAgent:
  enabled: true
  env:
    - name: DD_EKS_FARGATE
      value: "true"
  image:
    name: cluster-agent
    tag: 7.49.1
    repository: public.ecr.aws/datadog/cluster-agent
  tokenExistingSecret: "datadog-cluster-agent"

This is what I added to deployment.yaml for the Datadog Agent sidecar that runs alongside the application container in the pod:


  datadog_port:
    - containerPort: 8126
      name: traceport
      protocol: TCP
  datadog_env:
    - name: DD_SITE
      value: "${datadog_site}"
    - name: DD_EKS_FARGATE
      value: true
    - name: DD_CLUSTER_NAME
      value: "${cluster_name}"
    - name: DD_TAGS
      value: "[cluster_name:${cluster_name}]"
    - name: DD_CLUSTER_AGENT_ENABLED
      value: "true"
    - name: DD_APM_ENABLED
      value: "true"
    - name: DD_APM_NON_LOCAL_TRAFFIC
      value: "true"
    - name: DD_APM_RECEIVER_PORT
      value: "8126"
    - name: DD_PROFILING_ENABLED
      value: "true"
    - name: DD_CLUSTER_AGENT_URL
      value: https://datadog-cluster-agent.default.svc.cluster.local:5005
    - name: DD_ORCHESTRATOR_EXPLORER_ENABLED
      value: "true"
    - name: DD_LOGS_ENABLED
      value: "true"
    - name: DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL
      value: "true"
  datadog_api_key: ${datadog_api_key}
  datadog_cluster_agent_token: datadog-cluster-agent
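
A note on the snippet above, offered as a hedged observation rather than a confirmed cause: `DD_EKS_FARGATE` is the only entry whose value is an unquoted `true`. Kubernetes expects container `env` values to be strings, so if these entries are rendered verbatim into the container spec, the quoted form below is the safer spelling. Whether the template already quotes the value is not shown here.

```yaml
# Sketch: same entry as above, with the value quoted so YAML parses it as a string.
- name: DD_EKS_FARGATE
  value: "true"
```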

**Additional environment details (Operating System, Cloud provider, etc):**

I ran `agent status` from the CLI in the Datadog Agent sidecar container in the pod, and this is what I see:

  dev/my-web-687b646876-j4xcw/datadog-agent
  ----------------------------------------------
    - Type: file
      Identifier: 7ff049626c6919a68a66baa9788cfb55ba72b082dd7c4b059d4b4f7844697f51
      Path: /var/log/pods/dev_my-web-687b646876-j4xcw_a470639f-8e8a-47df-89d8-015c62a5c560/datadog-agent/*.log
      Service: agent
      Source: agent
      **Status: Error: could not find any file matching pattern /var/log/pods/dev_my-web-687b646876-j4xcw_a470639f-8e8a-47df-89d8-015c62a5c560/datadog-agent/*.log, check that all its subdirectories are executable**
        0 files tailed out of 0 files matching  
      Bytes Read: 0   
      Pipeline Latency:
        Average Latency (ms): 0
        24h Average Latency (ms): 0
        Peak Latency (ms): 0
        24h Peak Latency (ms): 0

  dev/my-web-687b646876-j4xcw/my-web
  ------------------------------------------
    - Type: file
      Identifier: 7eb71db9765f446ae4e26670818e1bff8324d6a5a4e77f51fa7aa5c7df2efb85
      Path: /var/log/pods/dev_my-web-687b646876-j4xcw_a470639f-8e8a-47df-89d8-015c62a5c560/my-web/*.log
      Service: my-web
      Source: web
      **Status: Error: could not find any file matching pattern /var/log/pods/dev_my-web-687b646876-j4xcw_a470639f-8e8a-47df-89d8-015c62a5c560/my-web/*.log, check that all its subdirectories are executable**
        0 files tailed out of 0 files matching  
      Bytes Read: 0   
      Pipeline Latency:
        Average Latency (ms): 0
        24h Average Latency (ms): 0
        Peak Latency (ms): 0
        24h Peak Latency (ms): 0

Also, on the Datadog Cluster Agent, this is what I see:

`  2024-05-28 11:38:56 UTC | CLUSTER | ERROR | (pkg/util/kubernetes/apiserver/apiserver.go:654 in GetNode) | Can't get node from the API server: nodes "fargate-ip-100-68-90-23.eu-west-1.compute.internal" not found`
dev-travelex commented 5 months ago

Following the docs, I also added `DD_LOGS_CONFIG_K8S_CONTAINER_USE_FILE` and `DD_LOGS_CONFIG_DOCKER_CONTAINER_FORCE_USE_FILE`, but the logs still don't show up.
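
For reference, a minimal sketch of how those two variables could appear in the sidecar's env list, in the same form as the `datadog_env` entries earlier in this issue. The `"true"` values are an assumption; the values that were actually set are not shown.

```yaml
# Sketch only: the variable names come from the comment above; the "true" values are assumed.
- name: DD_LOGS_CONFIG_K8S_CONTAINER_USE_FILE
  value: "true"
- name: DD_LOGS_CONFIG_DOCKER_CONTAINER_FORCE_USE_FILE
  value: "true"
```

Both options concern file-based log collection, so on their own they would not be expected to make the `/var/log/pods/...` paths from the status output visible inside the sidecar if those paths are not mounted into it.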