DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

kubernetes.memory.limits is incorrect when the K8s cluster runs in a KIND (Kubernetes IN Docker) environment #10508

Closed: yummydsky closed this issue 2 years ago

yummydsky commented 2 years ago

Output of the info page (if this is a bug)

kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
Defaulting container name to agent.
Use 'kubectl describe pod/datadog-agent-7xwvf -n datadog-monitoring' to see all of the containers in this pod.
2022-01-12 09:24:01 UTC | CORE | WARN | (pkg/util/log/log.go:630 in func1) | Deactivating Autoconfig will disable most components. It's recommended to use autoconfig_exclude_features and autoconfig_include_features to activate/deactivate features selectively
2022-01-12 09:24:01 UTC | CORE | INFO | (cmd/system-probe/config/config.go:119 in Merge) | no config exists at /etc/datadog-agent/system-probe.yaml, ignoring...
Getting the status from the agent.

===============
Agent (v7.32.4)
===============

  Status date: 2022-01-12 09:24:01.793 UTC (1641979441793)
  Agent start: 2022-01-05 06:08:02.119 UTC (1641362882119)
  Pid: 1
  Go Version: go1.16.7
  Python Version: 3.8.11
  Build arch: amd64
  Agent flavor: agent
  Check Runners: 4
  Log Level: INFO

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    NTP offset: 5.779ms
    System time: 2022-01-12 09:24:01.793 UTC (1641979441793)

  Host Info
  =========
    bootTime: 2021-09-07 02:27:11 UTC (1630981631000)
    kernelArch: x86_64
    kernelVersion: 5.8.0-63-generic
    os: linux
    platform: ubuntu
    platformFamily: debian
    platformVersion: 21.04
    procs: 17
    uptime: 2883h41m2s
    virtualizationRole: guest
    virtualizationSystem: docker

  Hostnames
  =========
    host_aliases: [my-k8s-695-worker2-my-k8s-695]
    hostname: my-k8s-695-worker2-my-k8s-695
    socket-fqdn: datadog-agent-7xwvf
    socket-hostname: datadog-agent-7xwvf
    host tags:
      cluster_name:my-k8s-695
      kube_cluster:my-k8s-695
      kube_cluster_name:my-k8s-695
    hostname provider: container
    unused hostname providers:
      aws: not retrieving hostname from AWS: the host is not an ECS instance and other providers already retrieve non-default hostnames
      azure: azure_hostname_style is set to 'os'
      configuration/environment: hostname is empty
      gce: unable to retrieve hostname from GCE: Get "http://169.254.169.254/computeMetadata/v1/instance/hostname": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

  Metadata
  ========
    hostname_source: container

=========
Collector
=========

  Running Checks
  ==============

    containerd
    ----------
      Instance ID: containerd [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/containerd.d/conf.yaml.default
      Total Runs: 41,103
      Metric Samples: Last Run: 625, Total: 25,693,112
      Events: Last Run: 2, Total: 95
      Service Checks: Last Run: 1, Total: 41,103
      Average Execution Time : 210ms
      Last Execution Date : 2022-01-12 09:23:47 UTC (1641979427000)
      Last Successful Execution Date : 2022-01-12 09:23:47 UTC (1641979427000)

    cpu
    ---
      Instance ID: cpu [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default
      Total Runs: 41,103
      Metric Samples: Last Run: 9, Total: 369,920
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-01-12 09:23:54 UTC (1641979434000)
      Last Successful Execution Date : 2022-01-12 09:23:54 UTC (1641979434000)

    cri
    ---
      Instance ID: cri [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/cri.d/conf.yaml.default
      Total Runs: 41,104
      Metric Samples: Last Run: 33, Total: 1,356,636
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 19ms
      Last Execution Date : 2022-01-12 09:24:01 UTC (1641979441000)
      Last Successful Execution Date : 2022-01-12 09:24:01 UTC (1641979441000)

    disk (4.4.0)
    ------------
      Instance ID: disk:e5dffb8bef24336f [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/disk.d/conf.yaml.default
      Total Runs: 41,103
      Metric Samples: Last Run: 448, Total: 18,417,680
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 59ms
      Last Execution Date : 2022-01-12 09:23:53 UTC (1641979433000)
      Last Successful Execution Date : 2022-01-12 09:23:53 UTC (1641979433000)

    file_handle
    -----------
      Instance ID: file_handle [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default
      Total Runs: 41,104
      Metric Samples: Last Run: 5, Total: 205,520
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-01-12 09:24:00 UTC (1641979440000)
      Last Successful Execution Date : 2022-01-12 09:24:00 UTC (1641979440000)

    io
    --
      Instance ID: io [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/io.d/conf.yaml.default
      Total Runs: 41,103
      Metric Samples: Last Run: 234, Total: 9,617,940
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-01-12 09:23:52 UTC (1641979432000)
      Last Successful Execution Date : 2022-01-12 09:23:52 UTC (1641979432000)

    kafka_consumer (2.12.1)
    -----------------------
      Instance ID: kafka_consumer:49da92eb4b468986 [ERROR]
      Configuration Source: file:/etc/datadog-agent/conf.d/kafka_consumer.yaml
      Total Runs: 39,630
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 5.389s
      Last Execution Date : 2022-01-12 09:23:40 UTC (1641979420000)
      Last Successful Execution Date : Never
      Error: NoBrokersAvailable
      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py", line 992, in run
          initialization()
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kafka_consumer/kafka_consumer.py", line 87, in _init_check_based_on_kafka_version
          self.sub_check = self._make_sub_check()
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kafka_consumer/kafka_consumer.py", line 115, in _make_sub_check
          kafka_client = self.create_kafka_client()
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kafka_consumer/kafka_consumer.py", line 62, in create_kafka_client
          return self._create_kafka_client(clazz=KafkaClient)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kafka_consumer/kafka_consumer.py", line 139, in _create_kafka_client
          return clazz(
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/kafka/client_async.py", line 244, in __init__
          self.config['api_version'] = self.check_version(timeout=check_timeout)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/kafka/client_async.py", line 927, in check_version
          raise Errors.NoBrokersAvailable()
      kafka.errors.NoBrokersAvailable: NoBrokersAvailable

    kubelet (7.1.0)
    ---------------
      Instance ID: kubelet:5bbc63f3938c02f4 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/kubelet.d/conf.yaml.default
      Total Runs: 30,828
      Metric Samples: Last Run: 621, Total: 19,139,488
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 4, Total: 123,312
      Average Execution Time : 427ms
      Last Execution Date : 2022-01-12 09:23:50 UTC (1641979430000)
      Last Successful Execution Date : 2022-01-12 09:23:50 UTC (1641979430000)

    load
    ----
      Instance ID: load [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/load.d/conf.yaml.default
      Total Runs: 41,104
      Metric Samples: Last Run: 6, Total: 246,624
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-01-12 09:23:59 UTC (1641979439000)
      Last Successful Execution Date : 2022-01-12 09:23:59 UTC (1641979439000)

    memory
    ------
      Instance ID: memory [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/memory.d/conf.yaml.default
      Total Runs: 41,103
      Metric Samples: Last Run: 18, Total: 739,854
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-01-12 09:23:51 UTC (1641979431000)
      Last Successful Execution Date : 2022-01-12 09:23:51 UTC (1641979431000)

    network (2.4.0)
    ---------------
      Instance ID: network:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/network.d/conf.yaml.default
      Total Runs: 41,104
      Metric Samples: Last Run: 73, Total: 3,001,192
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 3ms
      Last Execution Date : 2022-01-12 09:23:58 UTC (1641979438000)
      Last Successful Execution Date : 2022-01-12 09:23:58 UTC (1641979438000)

    ntp
    ---
      Instance ID: ntp:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default
      Total Runs: 686
      Metric Samples: Last Run: 1, Total: 686
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 686
      Average Execution Time : 1.664s
      Last Execution Date : 2022-01-12 09:23:15 UTC (1641979395000)
      Last Successful Execution Date : 2022-01-12 09:23:15 UTC (1641979395000)

    uptime
    ------
      Instance ID: uptime [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default
      Total Runs: 41,103
      Metric Samples: Last Run: 1, Total: 41,103
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-01-12 09:23:50 UTC (1641979430000)
      Last Successful Execution Date : 2022-01-12 09:23:50 UTC (1641979430000)

========
JMXFetch
========

  Information
  ==================
  Initialized checks
  ==================
    no checks

  Failed checks
  =============
    no checks

=========
Forwarder
=========

  Transactions
  ============
    Cluster: 0
    ClusterRole: 0
    ClusterRoleBinding: 0
    CronJob: 0
    DaemonSet: 0
    Deployment: 0
    Dropped: 0
    HighPriorityQueueFull: 0
    Job: 0
    Node: 0
    PersistentVolume: 0
    PersistentVolumeClaim: 0
    Pod: 0
    ReplicaSet: 0
    Requeued: 0
    Retried: 0
    RetryQueueSize: 0
    Role: 0
    RoleBinding: 0
    Service: 0
    ServiceAccount: 0
    StatefulSet: 0

  Transaction Successes
  =====================
    Total number: 86676
    Successes By Endpoint:
      check_run_v1: 41,103
      intake: 4,470
      series_v1: 41,103

  API Keys status
  ===============
    API key ending with 1a34d: API Key valid

==========
Endpoints
==========
  https://app.datadoghq.com - API Key ending with:
      - 1a34d

==========
Logs Agent
==========

  Logs Agent is not running

=========
APM Agent
=========
  Status: Running
  Pid: 1
  Uptime: 616559 seconds
  Mem alloc: 11,340,800 bytes
  Hostname: my-k8s-695-worker2-my-k8s-695
  Receiver: 0.0.0.0:8126
  Endpoints:
    https://trace.agent.datadoghq.com

  Receiver (previous minute)
  ==========================
    No traces received in the previous minute.
    Default priority sampling rate: 100.0%

  Writer (previous minute)
  ========================
    Traces: 0 payloads, 0 traces, 0 events, 0 bytes
    Stats: 0 payloads, 0 stats buckets, 0 bytes

=========
Aggregator
=========
  Checks Metric Sample: 79,794,111
  Dogstatsd Metric Sample: 5,014,622
  Event: 96
  Events Flushed: 96
  Number Of Flushes: 41,103
  Series Flushed: 66,061,099
  Service Check: 647,279
  Service Checks Flushed: 688,368
=========
DogStatsD
=========
  Event Packets: 0
  Event Parse Errors: 0
  Metric Packets: 5,014,621
  Metric Parse Errors: 0
  Service Check Packets: 0
  Service Check Parse Errors: 0
  Udp Bytes: 382,506,697
  Udp Packet Reading Errors: 0
  Udp Packets: 3,650,376
  Uds Bytes: 0
  Uds Origin Detection Errors: 0
  Uds Packet Reading Errors: 0
  Uds Packets: 1
  Unterminated Metric Errors: 0

=====================
Datadog Cluster Agent
=====================

  - Datadog Cluster Agent endpoint detected: https://10.96.246.112:5005
  Successfully connected to the Datadog Cluster Agent.
  - Running: 1.16.0+commit.9961689

=============
Autodiscovery
=============
  Enabled Features
  ================
    containerd
    cri
    kubernetes

Describe what happened: The kubernetes.memory.limits metric is incorrect when the K8s cluster runs in a KIND environment. In the Datadog dashboard, kubernetes.memory.limits for the pod nginx-prepared-b579c489c-bkmfw is 40MiB. [Screenshot 2022-01-12, 5:29:17 PM]

Describe what you expected: The memory limit of the pod nginx-prepared-b579c489c-bkmfw is 20MiB

root@KIND:~# kubectl describe pod -n nginx-preloader-sample nginx-prepared-b579c489c-bkmfw
Name:         nginx-prepared-b579c489c-bkmfw
Namespace:    nginx-preloader-sample
Priority:     0
Node:         my-k8s-695-worker2/172.18.0.4
Start Time:   Wed, 05 Jan 2022 06:22:22 +0000
Labels:       app=nginx-prepared
              pod-template-hash=b579c489c
Annotations:  <none>
Status:       Running
IP:           10.244.2.16
IPs:
  IP:           10.244.2.16
Controlled By:  ReplicaSet/nginx-prepared-b579c489c
Containers:
  nginx-prepared:
    Container ID:   containerd://9ffd7a8d35e0adb265456e83b3ff9170c479ebcc5635b5768dca208a4e555f83
    Image:          nginx:1.7.9
    Image ID:       sha256:35d28df486f6150fa3174367499d1eb01f22f5a410afe4b9581ac0e0e58b3eaf
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Wed, 05 Jan 2022 06:22:58 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     200m
      memory:  20Mi
    Requests:
      cpu:        100m
      memory:     10Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from nginx-prepared-token-fl5dr (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  nginx-prepared-token-fl5dr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nginx-prepared-token-fl5dr
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
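
To make the gap between the expected and the reported value concrete, here is a minimal Python sketch that converts the limit from the pod description above into bytes and compares it with the 40MiB shown on the dashboard. The helper names `quantity_to_bytes` and `pod_memory_limit_bytes` are made up for illustration and only handle a subset of the Kubernetes quantity syntax; this is not how the Agent computes the metric.

```python
# Binary (power-of-two) suffixes used by Kubernetes resource quantities.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}
# Decimal suffixes (only a subset of the full quantity syntax is handled).
DECIMAL_SUFFIXES = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '20Mi' to bytes (hypothetical helper)."""
    # Check binary suffixes first so that '20Mi' is not mistaken for '20M'.
    for suffixes in (BINARY_SUFFIXES, DECIMAL_SUFFIXES):
        for suffix, factor in suffixes.items():
            if quantity.endswith(suffix):
                return int(float(quantity[: -len(suffix)]) * factor)
    return int(float(quantity))  # plain number of bytes


def pod_memory_limit_bytes(pod: dict) -> int:
    """Sum the memory limits of all containers in a pod object (hypothetical helper)."""
    total = 0
    for container in pod["spec"]["containers"]:
        limit = container.get("resources", {}).get("limits", {}).get("memory")
        if limit is not None:
            total += quantity_to_bytes(limit)
    return total


# Pod spec reduced to the fields shown by `kubectl describe pod` above.
pod = {
    "spec": {
        "containers": [
            {
                "name": "nginx-prepared",
                "resources": {
                    "limits": {"cpu": "200m", "memory": "20Mi"},
                    "requests": {"cpu": "100m", "memory": "10Mi"},
                },
            }
        ]
    }
}

expected = pod_memory_limit_bytes(pod)  # limit configured on the pod
observed = 40 * 1024**2                 # value shown on the dashboard (40MiB)
print(expected, observed, observed / expected)
```

With these inputs the dashboard value is exactly twice the configured limit.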

Steps to reproduce the issue:
Step 1. Prepare a Kubernetes IN Docker (KIND) environment.
Step 2. Apply a deployment with CPU limit/request and memory limit/request.
Step 3. Check the kubernetes.memory.limits metric in the Datadog dashboard.
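
For Step 2, a deployment along these lines could be used. This is a sketch in Python that prints a JSON manifest (which `kubectl apply -f` accepts alongside YAML). The image, labels, namespace, and resource values come from the pod description above; the deployment name `nginx-prepared`, the container port, and `replicas=1` are assumptions, since the original manifest was not posted.

```python
import json


def build_deployment(replicas: int = 1) -> dict:
    """Build a Deployment manifest matching the pod described above.

    The deployment name and replica count are assumptions; the original
    manifest is not part of this report.
    """
    labels = {"app": "nginx-prepared"}
    container = {
        "name": "nginx-prepared",
        "image": "nginx:1.7.9",
        "ports": [{"containerPort": 80}],
        "resources": {
            "limits": {"cpu": "200m", "memory": "20Mi"},
            "requests": {"cpu": "100m", "memory": "10Mi"},
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "nginx-prepared", "namespace": "nginx-preloader-sample"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


manifest = build_deployment()
# Save the output to a file and apply it with `kubectl apply -f <file>`.
print(json.dumps(manifest, indent=2))
```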

Additional environment details (Operating System, Cloud provider, etc):

root@KIND:~# kind --version
kind version 0.11.1

root@KIND:~# kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-08-13T16:12:48Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-09-14T07:44:34Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

aliciascott commented 2 years ago

I tested this on a KIND Cluster running version

kind version 0.12.0

Judging from your screenshots, you'll want to scope the query in the Metrics Explorer or a Notebook by pod_name. Otherwise we show the avg of the memory limits across all pods in the deployment ("nginx-deployment" in my test) or, in your case, across the entire infrastructure, since that is the scope you selected. Here is an example of how it looks in a Notebook:

[Screenshot: Alicia Apr 04 2022 1433, Datadog, 2022-04-04 at 2:37:52 PM]

kubectl describe deployment nginx-deployment
Name:                   nginx-deployment
Namespace:              kube-system
CreationTimestamp:      Mon, 04 Apr 2022 14:27:36 -0600
Labels:                 <none>
Annotations:            deployment.kubernetes.io/revision: 2
Selector:               app=nginx
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=nginx
  Containers:
   nginx:
    Image:      nginx:1.14.2
    Port:       80/TCP
    Host Port:  0/TCP
    Limits:
      cpu:     500m
      memory:  20Mi
    Requests:
      cpu:        250m
      memory:     20Mi

Or, if you want to graph this in the Metrics Explorer: instead of graphing over the entire infrastructure, you would again want to graph by pod_name, which should then show the memory limit you set for the nginx pod accurately:

[Screenshot: Metric Explorer, Datadog, 2022-04-04 at 2:44:05 PM]

[Screenshot: 2022-04-04 at 2:45:01 PM]
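
The effect of the scope on the aggregated value can be sketched in a few lines of Python. The second pod and its 100MiB limit are made up purely for illustration, and the functions only mimic the idea of "avg over the whole scope" versus "avg by pod_name"; they are not Datadog's implementation.

```python
from collections import defaultdict
from statistics import mean

MIB = 1024**2

# One sample per pod. The second entry is hypothetical.
samples = [
    {"pod_name": "nginx-prepared-b579c489c-bkmfw", "value": 20 * MIB},
    {"pod_name": "some-other-pod", "value": 100 * MIB},  # made-up pod
]


def avg_over_scope(samples):
    """Average every sample in the scope into one value (no grouping)."""
    return mean(s["value"] for s in samples)


def avg_by_pod_name(samples):
    """Average samples separately for each pod_name."""
    groups = defaultdict(list)
    for s in samples:
        groups[s["pod_name"]].append(s["value"])
    return {pod: mean(values) for pod, values in groups.items()}


# Without grouping, the result matches neither pod's own limit.
print(avg_over_scope(samples) / MIB)
# Grouped by pod_name, each pod reports its own limit.
print({pod: v / MIB for pod, v in avg_by_pod_name(samples).items()})
```

The ungrouped average here is 60MiB, while grouping by pod_name returns 20MiB and 100MiB for the respective pods.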
yummydsky commented 2 years ago

I have already scoped the query to the specific pod name, so I don't need avg by pod_name. And I'm sure that I am using the average aggregation method for this query. [Screenshot: 149104943-1eb0c526-01be-4a8f-9deb-7e802f1136f2]
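
One way to double-check the value outside the dashboard would be to request the same series from the Datadog timeseries query endpoint (`GET /api/v1/query`). The sketch below only builds the query string and URL and performs no network call. The helper names, the `api.datadoghq.com` host (assumed from the `app.datadoghq.com` site shown in the agent status), the one-hour window, and the placeholder keys are all assumptions.

```python
from urllib.parse import urlencode

API_BASE = "https://api.datadoghq.com/api/v1/query"  # assumed site: datadoghq.com


def build_query(metric, scope, group_by=(), aggregator="avg"):
    """Build a metric query string such as 'avg:metric{scope} by {tag}'."""
    by = " by {" + ",".join(group_by) + "}" if group_by else ""
    return f"{aggregator}:{metric}{{{scope}}}{by}"


def build_url(query, start, end):
    """Build the request URL; `start` and `end` are epoch seconds."""
    return API_BASE + "?" + urlencode({"from": start, "to": end, "query": query})


query = build_query(
    "kubernetes.memory.limits",
    "pod_name:nginx-prepared-b579c489c-bkmfw",
    group_by=["pod_name"],
)
end = 1649209078       # 2022-04-06 01:37:58 UTC, an arbitrary example timestamp
start = end - 3600     # one hour earlier (arbitrary window)
url = build_url(query, start, end)

# Headers the endpoint expects; the values are placeholders.
headers = {"DD-API-KEY": "<api key>", "DD-APPLICATION-KEY": "<application key>"}
print(query)
print(url)
```

If more than one series comes back for the same pod, adding further tags to `group_by` may help show which tag values the samples differ on.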

yummydsky commented 2 years ago

@aliciascott I just tried the suggestion you provided, but the memory limit size is still incorrect.

The Datadog agent information

root@robotframework:~# kubectl exec -it -n datadog-monitoring datadog-agent-2w7tr -- agent status
Defaulting container name to agent.
Use 'kubectl describe pod/datadog-agent-2w7tr -n datadog-monitoring' to see all of the containers in this pod.
2022-04-06 01:37:58 UTC | CORE | WARN | (pkg/util/log/log.go:592 in func1) | Deactivating Autoconfig will disable most components. It's recommended to use autoconfig_exclude_features and autoconfig_include_features to activate/deactivate features selectively
2022-04-06 01:37:58 UTC | CORE | INFO | (cmd/system-probe/config/config.go:118 in Merge) | no config exists at /etc/datadog-agent/system-probe.yaml, ignoring...
Getting the status from the agent.

===============
Agent (v7.34.0)
===============

  Status date: 2022-04-06 01:37:58.045 UTC (1649209078045)
  Agent start: 2022-04-05 22:05:44.954 UTC (1649196344954)
  Pid: 1
  Go Version: go1.16.12
  Python Version: 3.8.11
  Build arch: amd64
  Agent flavor: agent
  Check Runners: 4
  Log Level: INFO

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    NTP offset: -4.263ms
    System time: 2022-04-06 01:37:58.045 UTC (1649209078045)

  Host Info
  =========
    bootTime: 2022-04-01 06:07:05 UTC (1648793225000)
    kernelArch: x86_64
    kernelVersion: 5.8.0-36-generic
    os: linux
    platform: ubuntu
    platformFamily: debian
    platformVersion: 21.10
    procs: 157
    uptime: 111h59m2s
    virtualizationRole: guest
    virtualizationSystem: docker

  Hostnames
  =========
    cluster-name: k8s-817-491d5ef2
    host_aliases: [k8s-817-491d5ef2-worker2-k8s-817-491d5ef2]
    hostname: k8s-817-491d5ef2-worker2-k8s-817-491d5ef2
    socket-fqdn: datadog-agent-2w7tr
    socket-hostname: datadog-agent-2w7tr
    host tags:
      cluster_name:k8s-817-491d5ef2
      kube_cluster:k8s-817-491d5ef2
      kube_cluster_name:k8s-817-491d5ef2
    hostname provider: container
    unused hostname providers:
      aws: not retrieving hostname from AWS: the host is not an ECS instance and other providers already retrieve non-default hostnames
      azure: azure_hostname_style is set to 'os'
      configuration/environment: hostname is empty
      gce: unable to retrieve hostname from GCE: GCE metadata API error: Get "http://169.254.169.254/computeMetadata/v1/instance/hostname": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

  Metadata
  ========
    agent_version: 7.34.0
    config_apm_dd_url: 
    config_dd_url: 
    config_logs_dd_url: 
    config_logs_socks5_proxy_address: 
    config_no_proxy: []
    config_process_dd_url: 
    config_proxy_http: 
    config_proxy_https: 
    config_site: 
    feature_apm_enabled: false
    feature_cspm_enabled: false
    feature_cws_enabled: false
    feature_logs_enabled: false
    feature_networks_enabled: false
    feature_process_enabled: false
    flavor: agent
    hostname_source: container
    install_method_installer_version: datadog-2.30.20
    install_method_tool: helm
    install_method_tool_version: Helm

=========
Collector
=========

  Running Checks
  ==============

    container
    ---------
      Instance ID: container [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/container.d/conf.yaml.default
      Total Runs: 847
      Metric Samples: Last Run: 777, Total: 653,037
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 77ms
      Last Execution Date : 2022-04-06 01:37:51 UTC (1649209071000)
      Last Successful Execution Date : 2022-04-06 01:37:51 UTC (1649209071000)

    containerd
    ----------
      Instance ID: containerd [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/containerd.d/conf.yaml.default
      Total Runs: 846
      Metric Samples: Last Run: 2,016, Total: 1,690,308
      Events: Last Run: 3, Total: 3
      Service Checks: Last Run: 1, Total: 846
      Average Execution Time : 2.298s
      Last Execution Date : 2022-04-06 01:37:44 UTC (1649209064000)
      Last Successful Execution Date : 2022-04-06 01:37:44 UTC (1649209064000)

    coredns (1.11.4)
    ----------------
      Instance ID: coredns:863804e40421b505 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/coredns.d/auto_conf.yaml
      Total Runs: 846
      Metric Samples: Last Run: 154, Total: 130,284
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 846
      Average Execution Time : 199ms
      Last Execution Date : 2022-04-06 01:37:45 UTC (1649209065000)
      Last Successful Execution Date : 2022-04-06 01:37:45 UTC (1649209065000)

      Instance ID: coredns:a324974808b4159f [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/coredns.d/auto_conf.yaml
      Total Runs: 847
      Metric Samples: Last Run: 176, Total: 149,072
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 847
      Average Execution Time : 203ms
      Last Execution Date : 2022-04-06 01:37:45 UTC (1649209065000)
      Last Successful Execution Date : 2022-04-06 01:37:45 UTC (1649209065000)

    cpu
    ---
      Instance ID: cpu [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default
      Total Runs: 847
      Metric Samples: Last Run: 9, Total: 7,616
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 1ms
      Last Execution Date : 2022-04-06 01:37:49 UTC (1649209069000)
      Last Successful Execution Date : 2022-04-06 01:37:49 UTC (1649209069000)

    cri
    ---
      Instance ID: cri [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/cri.d/conf.yaml.default
      Total Runs: 847
      Metric Samples: Last Run: 90, Total: 76,230
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 194ms
      Last Execution Date : 2022-04-06 01:37:57 UTC (1649209077000)
      Last Successful Execution Date : 2022-04-06 01:37:57 UTC (1649209077000)

    datadog_cluster_agent (1.3.3)
    -----------------------------
      Instance ID: datadog_cluster_agent:e8e3aabdbd9341d4 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/datadog_cluster_agent.d/auto_conf.yaml
      Total Runs: 846
      Metric Samples: Last Run: 16, Total: 13,520
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 846
      Average Execution Time : 50ms
      Last Execution Date : 2022-04-06 01:37:51 UTC (1649209071000)
      Last Successful Execution Date : 2022-04-06 01:37:51 UTC (1649209071000)

    disk (4.5.2)
    ------------
      Instance ID: disk:e5dffb8bef24336f [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/disk.d/conf.yaml.default
      Total Runs: 847
      Metric Samples: Last Run: 1,044, Total: 884,308
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 332ms
      Last Execution Date : 2022-04-06 01:37:49 UTC (1649209069000)
      Last Successful Execution Date : 2022-04-06 01:37:49 UTC (1649209069000)

    file_handle
    -----------
      Instance ID: file_handle [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default
      Total Runs: 847
      Metric Samples: Last Run: 5, Total: 4,235
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-04-06 01:37:55 UTC (1649209075000)
      Last Successful Execution Date : 2022-04-06 01:37:55 UTC (1649209075000)

    io
    --
      Instance ID: io [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/io.d/conf.yaml.default
      Total Runs: 847
      Metric Samples: Last Run: 80, Total: 67,706
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 5ms
      Last Execution Date : 2022-04-06 01:37:47 UTC (1649209067000)
      Last Successful Execution Date : 2022-04-06 01:37:47 UTC (1649209067000)

    kafka_consumer (2.12.3)
    -----------------------
      Instance ID: kafka_consumer:49da92eb4b468986 [ERROR]
      Configuration Source: file:/etc/datadog-agent/conf.d/kafka_consumer.yaml
      Total Runs: 815
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 2.842s
      Last Execution Date : 2022-04-06 01:37:44 UTC (1649209064000)
      Last Successful Execution Date : Never
      Error: NoBrokersAvailable
      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py", line 1008, in run
          initialization()
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kafka_consumer/kafka_consumer.py", line 87, in _init_check_based_on_kafka_version
          self.sub_check = self._make_sub_check()
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kafka_consumer/kafka_consumer.py", line 115, in _make_sub_check
          kafka_client = self.create_kafka_client()
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kafka_consumer/kafka_consumer.py", line 62, in create_kafka_client
          return self._create_kafka_client(clazz=KafkaClient)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kafka_consumer/kafka_consumer.py", line 139, in _create_kafka_client
          return clazz(
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/kafka/client_async.py", line 244, in __init__
          self.config['api_version'] = self.check_version(timeout=check_timeout)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/kafka/client_async.py", line 900, in check_version
          raise Errors.NoBrokersAvailable()
      kafka.errors.NoBrokersAvailable: NoBrokersAvailable

    kubelet (7.1.1)
    ---------------
      Instance ID: kubelet:5bbc63f3938c02f4 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/kubelet.d/conf.yaml.default
      Total Runs: 635
      Metric Samples: Last Run: 1,118, Total: 709,889
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 4, Total: 2,540
      Average Execution Time : 2.375s
      Last Execution Date : 2022-04-06 01:37:39 UTC (1649209059000)
      Last Successful Execution Date : 2022-04-06 01:37:39 UTC (1649209059000)

    kubernetes_state (6.0.1)
    ------------------------
      Instance ID: kubernetes_state:2031391f3a32fd3e [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/kubernetes_state.d/auto_conf.yaml
      Total Runs: 846
      Metric Samples: Last Run: 1,586, Total: 1,340,170
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 9, Total: 7,605
      Average Execution Time : 808ms
      Last Execution Date : 2022-04-06 01:37:44 UTC (1649209064000)
      Last Successful Execution Date : 2022-04-06 01:37:44 UTC (1649209064000)

    load
    ----
      Instance ID: load [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/load.d/conf.yaml.default
      Total Runs: 847
      Metric Samples: Last Run: 6, Total: 5,082
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 1ms
      Last Execution Date : 2022-04-06 01:37:54 UTC (1649209074000)
      Last Successful Execution Date : 2022-04-06 01:37:54 UTC (1649209074000)

    memory
    ------
      Instance ID: memory [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/memory.d/conf.yaml.default
      Total Runs: 846
      Metric Samples: Last Run: 20, Total: 16,920
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 2ms
      Last Execution Date : 2022-04-06 01:37:46 UTC (1649209066000)
      Last Successful Execution Date : 2022-04-06 01:37:46 UTC (1649209066000)

    network (2.5.0)
    ---------------
      Instance ID: network:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/network.d/conf.yaml.default
      Total Runs: 847
      Metric Samples: Last Run: 257, Total: 217,679
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 35ms
      Last Execution Date : 2022-04-06 01:37:54 UTC (1649209074000)
      Last Successful Execution Date : 2022-04-06 01:37:54 UTC (1649209074000)

    ntp
    ---
      Instance ID: ntp:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default
      Total Runs: 15
      Metric Samples: Last Run: 1, Total: 15
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 15
      Average Execution Time : 691ms
      Last Execution Date : 2022-04-06 01:36:17 UTC (1649208977000)
      Last Successful Execution Date : 2022-04-06 01:36:17 UTC (1649208977000)

    uptime
    ------
      Instance ID: uptime [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default
      Total Runs: 846
      Metric Samples: Last Run: 1, Total: 846
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-04-06 01:37:45 UTC (1649209065000)
      Last Successful Execution Date : 2022-04-06 01:37:45 UTC (1649209065000)

========
JMXFetch
========

  Information
  ==================
  Initialized checks
  ==================
    no checks

  Failed checks
  =============
    no checks

=========
Forwarder
=========

  Transactions
  ============
    Cluster: 0
    ClusterRole: 0
    ClusterRoleBinding: 0
    CronJob: 0
    DaemonSet: 0
    Deployment: 0
    Dropped: 0
    HighPriorityQueueFull: 0
    Job: 0
    Node: 0
    PersistentVolume: 0
    PersistentVolumeClaim: 0
    Pod: 0
    ReplicaSet: 0
    Requeued: 0
    Retried: 0
    RetryQueueSize: 0
    Role: 0
    RoleBinding: 0
    Service: 0
    ServiceAccount: 0
    StatefulSet: 0

  Transaction Successes
  =====================
    Total number: 2634
    Successes By Endpoint:
      check_run_v1: 848
      intake: 73
      metadata_v1: 21
      series_v1: 1,692

  On-disk storage
  ===============
    On-disk storage is disabled. Configure `forwarder_storage_max_size_in_bytes` to enable it.

  API Keys status
  ===============
    API key ending with 1a34d: API Key valid

==========
Endpoints
==========
  https://app.datadoghq.com - API Key ending with:
      - 1a34d

==========
Logs Agent
==========

  Logs Agent is not running

=========
APM Agent
=========
  Status: Running
  Pid: 1
  Uptime: 381686 seconds
  Mem alloc: 9,519,864 bytes
  Hostname: k8s-817-491d5ef2-worker2-k8s-817-491d5ef2
  Receiver: 0.0.0.0:8126
  Endpoints:
    https://trace.agent.datadoghq.com

  Receiver (previous minute)
  ==========================
    No traces received in the previous minute.
    Default priority sampling rate: 100.0%

  Writer (previous minute)
  ========================
    Traces: 0 payloads, 0 traces, 0 events, 0 bytes
    Stats: 0 payloads, 0 stats buckets, 0 bytes

=========
Aggregator
=========
  Checks Metric Sample: 5,995,403
  Dogstatsd Metric Sample: 103,504
  Event: 4
  Events Flushed: 4
  Number Of Flushes: 848
  Series Flushed: 5,212,065
  Service Check: 27,714
  Service Checks Flushed: 28,554
=========
DogStatsD
=========
  Event Packets: 0
  Event Parse Errors: 0
  Metric Packets: 103,503
  Metric Parse Errors: 0
  Service Check Packets: 0
  Service Check Parse Errors: 0
  Udp Bytes: 7,852,262
  Udp Packet Reading Errors: 0
  Udp Packets: 70,050
  Uds Bytes: 0
  Uds Origin Detection Errors: 0
  Uds Packet Reading Errors: 0
  Uds Packets: 1
  Unterminated Metric Errors: 0

=====================
Datadog Cluster Agent
=====================

  - Datadog Cluster Agent endpoint detected: https://10.96.186.157:5005
  Successfully connected to the Datadog Cluster Agent.
  - Running: 1.18.0+commit.78e3126

=============
Autodiscovery
=============
  Enabled Features
  ================
    containerd
    cri
    kubernetes

The memory limit of pod nginx-prepared-b579c489c-rd7zp should be 20 MiB:

root@robotframework:~# kubectl describe pod -n nginx-preloader-sample nginx-prepared-b579c489c-rd7zp
Name:         nginx-prepared-b579c489c-rd7zp
Namespace:    nginx-preloader-sample
Priority:     0
Node:         k8s-817-491d5ef2-worker2/172.18.0.4
Start Time:   Sun, 03 Apr 2022 12:01:56 +0000
Labels:       app=nginx-prepared
              pod-template-hash=b579c489c
Annotations:  <none>
Status:       Running
IP:           10.244.1.53
IPs:
  IP:           10.244.1.53
Controlled By:  ReplicaSet/nginx-prepared-b579c489c
Containers:
  nginx-prepared:
    Container ID:   containerd://c7867580afe554eb538dd9d1210a9f52ccd952aa4e84f8fd01e079f220839688
    Image:          nginx:1.7.9
    Image ID:       sha256:35d28df486f6150fa3174367499d1eb01f22f5a410afe4b9581ac0e0e58b3eaf
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sun, 03 Apr 2022 12:01:58 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     200m
      memory:  20Mi
    Requests:
      cpu:        100m
      memory:     10Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from nginx-prepared-token-dgh7w (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  nginx-prepared-token-dgh7w:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nginx-prepared-token-dgh7w
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>

But the value shown in the Datadog dashboard is 40 MiB.

[Screenshots: 2022-04-06 9:33:49 AM, 2022-04-06 9:46:17 AM, 2022-04-06 9:46:26 AM]
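For anyone trying to reproduce the comparison, here is a minimal sketch of how the expected value can be computed from the pod spec. It assumes the spec is available as a dict (for example, from `kubectl get pod -o json`), and the helper names `quantity_to_bytes` and `pod_memory_limit_bytes` are made up for illustration; this is not how the Agent computes the metric. Only a subset of the Kubernetes quantity format is handled (no milli suffix).

```python
# Multipliers for the binary and decimal suffixes of Kubernetes quantities.
# Assumption: only these suffixes are needed; "m" (milli) is not handled.
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '20Mi' to a number of bytes."""
    # Check two-letter suffixes before one-letter ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * _SUFFIXES[suffix])
    return int(float(quantity))  # plain number of bytes


def pod_memory_limit_bytes(pod: dict) -> int:
    """Sum the memory limits of all containers that declare one."""
    total = 0
    for container in pod["spec"]["containers"]:
        limit = container.get("resources", {}).get("limits", {}).get("memory")
        if limit is not None:
            total += quantity_to_bytes(limit)
    return total


# Hypothetical pod spec mirroring the `kubectl describe` output above.
sample_pod = {
    "spec": {
        "containers": [
            {
                "name": "nginx-prepared",
                "resources": {
                    "limits": {"cpu": "200m", "memory": "20Mi"},
                    "requests": {"cpu": "100m", "memory": "10Mi"},
                },
            }
        ]
    }
}

expected = pod_memory_limit_bytes(sample_pod)
observed = 40 * 1024**2  # value read off the dashboard, taken as 40 MiB
print(f"expected={expected} observed={observed} ratio={observed / expected}")
```

With the single-container spec above, the dashboard value is exactly twice the declared limit.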

yummydsky commented 2 years ago

@aliciascott Could you tell me which Kubernetes version you are running in KIND 0.12.0?

aliciascott commented 2 years ago

Please try the following:

KIND 0.12.0
Kubernetes v1.22.3
Datadog Agent v7.34.0

aliciascott commented 2 years ago

We are continuing our investigation of this via the Support ticket raised internally.