Closed: gerrit8143 closed this issue 2 years ago
Hi @gerrit8143
Could you provide the values.yaml or the options used to deploy the agent, the chart version, and the container runtime that you are using in the cluster 🙇
From what I can see in the pod list, the issue is with creating the init-containers.
Hi @clamoriniere, we currently use the latest Datadog Helm chart, 2.27.7, but we had the same issue with 2.26.3. The container runtime is docker://20.10.11.
## Default values for Datadog Agent
## See Datadog helm documentation to learn more:
## https://docs.datadoghq.com/agent/kubernetes/helm/
# nameOverride -- Override name of app
nameOverride: # ""
# fullnameOverride -- Override the full qualified app name
fullnameOverride: # ""
# targetSystem -- Target OS for this deployment (possible values: linux, windows)
targetSystem: "linux"
# registry -- Registry to use for all Agent images (default gcr.io)
## Currently we offer Datadog Agent images on:
## GCR - use gcr.io/datadoghq (default)
## DockerHub - use docker.io/datadog
## AWS - use public.ecr.aws/datadog
registry: docker.repo.dreamit.de/datadoghq
datadog:
# datadog.apiKey -- Your Datadog API key
# ref: https://app.datadoghq.com/account/settings#agent/kubernetes
apiKey: <apikey>
# datadog.apiKeyExistingSecret -- Use existing Secret which stores API key instead of creating a new one
## If set, this parameter takes precedence over "apiKey".
apiKeyExistingSecret: # <DATADOG_API_KEY_SECRET>
# datadog.appKey -- Datadog APP key required to use metricsProvider
## If you are using clusterAgent.metricsProvider.enabled = true, you must set
## a Datadog application key for read access to your metrics.
appKey: # <DATADOG_APP_KEY>
# datadog.appKeyExistingSecret -- Use existing Secret which stores APP key instead of creating a new one
## If set, this parameter takes precedence over "appKey".
appKeyExistingSecret: # <DATADOG_APP_KEY_SECRET>
# datadog.securityContext -- Allows you to overwrite the default PodSecurityContext on the Daemonset or Deployment
securityContext: {}
# seLinuxOptions:
# user: "system_u"
# role: "system_r"
# type: "spc_t"
# level: "s0"
# datadog.hostVolumeMountPropagation -- Allow to specify the `mountPropagation` value on all volumeMounts using HostPath
## ref: https://kubernetes.io/docs/concepts/storage/volumes/#mount-propagation
hostVolumeMountPropagation: None
# datadog.clusterName -- Set a unique cluster name to allow scoping hosts and Cluster Checks easily
## The name must be unique and must be dot-separated tokens with the following restrictions:
## * Lowercase letters, numbers, and hyphens only.
## * Must start with a letter.
## * Must end with a number or a letter.
## * Overall length should not be higher than 80 characters.
## Compared to the rules of GKE, dots are allowed whereas they are not allowed on GKE:
## https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#Cluster.FIELDS.name
clusterName: karma.eu-west-1.llcloud.io
# datadog.site -- The site of the Datadog intake to send Agent data to
## Set to 'datadoghq.eu' to send data to the EU site.
site: datadoghq.eu
# datadog.dd_url -- The host of the Datadog intake server to send Agent data to, only set this option if you need the Agent to send data to a custom URL
## Overrides the site setting defined in "site".
dd_url: # https://app.datadoghq.eu
# datadog.logLevel -- Set logging verbosity, valid log levels are: trace, debug, info, warn, error, critical, off
logLevel: INFO
# datadog.kubeStateMetricsEnabled -- If true, deploys the kube-state-metrics deployment
## ref: https://github.com/kubernetes/charts/tree/master/stable/kube-state-metrics
kubeStateMetricsEnabled: true
kubeStateMetricsNetworkPolicy:
# datadog.kubeStateMetricsNetworkPolicy.create -- If true, create a NetworkPolicy for kube state metrics
create: false
## Manage Cluster checks feature
## ref: https://docs.datadoghq.com/agent/autodiscovery/clusterchecks/
## Autodiscovery via Kube Service annotations is automatically enabled
clusterChecks:
# datadog.clusterChecks.enabled -- Enable the Cluster Checks feature on both the cluster-agents and the daemonset
enabled: false
# datadog.nodeLabelsAsTags -- Provide a mapping of Kubernetes Node Labels to Datadog Tags
nodeLabelsAsTags:
# beta.kubernetes.io/instance-type: aws-instance-type
# kubernetes.io/role: kube_role
# <KUBERNETES_NODE_LABEL>: <DATADOG_TAG_KEY>
kubernetes.io/hostname: kube_node_name
# datadog.podLabelsAsTags -- Provide a mapping of Kubernetes Labels to Datadog Tags
podLabelsAsTags: {}
# app: kube_app
# release: helm_release
# <KUBERNETES_LABEL>: <DATADOG_TAG_KEY>
# datadog.podAnnotationsAsTags -- Provide a mapping of Kubernetes Annotations to Datadog Tags
podAnnotationsAsTags: {}
# iam.amazonaws.com/role: kube_iamrole
# <KUBERNETES_ANNOTATIONS>: <DATADOG_TAG_KEY>
# datadog.tags -- List of static tags to attach to every metric, event and service check collected by this Agent.
## Learn more about tagging: https://docs.datadoghq.com/tagging/
tags: []
# - "<KEY_1>:<VALUE_1>"
# - "<KEY_2>:<VALUE_2>"
# kubelet configuration
kubelet:
# datadog.kubelet.host -- Override kubelet IP
host:
valueFrom:
fieldRef:
fieldPath: status.hostIP
# datadog.kubelet.tlsVerify -- Toggle kubelet TLS verification
tlsVerify: false
## dogstatsd configuration
## ref: https://docs.datadoghq.com/agent/kubernetes/dogstatsd/
## To emit custom metrics from your Kubernetes application, use DogStatsD.
dogstatsd:
# datadog.dogstatsd.port -- Override the Agent DogStatsD port
## Note: Make sure your client is sending to the same UDP port.
port: 8125
# datadog.dogstatsd.originDetection -- Enable origin detection for container tagging
## https://docs.datadoghq.com/developers/dogstatsd/unix_socket/#using-origin-detection-for-container-tagging
originDetection: true
# datadog.dogstatsd.tags -- List of static tags to attach to every custom metric, event and service check collected by Dogstatsd.
## Learn more about tagging: https://docs.datadoghq.com/tagging/
tags: []
# - "<KEY_1>:<VALUE_1>"
# - "<KEY_2>:<VALUE_2>"
# datadog.dogstatsd.tagCardinality -- Sets the tag cardinality relative to the origin detection
## https://docs.datadoghq.com/developers/dogstatsd/unix_socket/#using-origin-detection-for-container-tagging
tagCardinality: low
# datadog.dogstatsd.useSocketVolume -- Enable dogstatsd over Unix Domain Socket
## ref: https://docs.datadoghq.com/developers/dogstatsd/unix_socket/
useSocketVolume: false
# datadog.dogstatsd.socketPath -- Path to the DogStatsD socket
socketPath: /var/run/datadog/dsd.socket
# datadog.dogstatsd.hostSocketPath -- Host path to the DogStatsD socket
hostSocketPath: /var/run/datadog/
# datadog.dogstatsd.useHostPort -- Sets the hostPort to the same value of the container port
## Needs to be used for sending custom metrics.
## The ports need to be available on all hosts.
##
## WARNING: Make sure that hosts using this are properly firewalled otherwise
## metrics and traces are accepted from any host able to connect to this host.
useHostPort: true
# datadog.dogstatsd.useHostPID -- Run the agent in the host's PID namespace
## This is required for Dogstatsd origin detection to work.
## See https://docs.datadoghq.com/developers/dogstatsd/unix_socket/
useHostPID: true
# datadog.dogstatsd.nonLocalTraffic -- Enable this to make each node accept non-local statsd traffic (from outside of the pod)
## ref: https://github.com/DataDog/docker-dd-agent#environment-variables
nonLocalTraffic: true
# datadog.collectEvents -- Enables this to start event collection from the kubernetes API
## ref: https://docs.datadoghq.com/agent/kubernetes/#event-collection
collectEvents: true
# datadog.leaderElection -- Enables leader election mechanism for event collection
leaderElection: true
# datadog.leaderLeaseDuration -- Set the lease time for leader election in second
leaderLeaseDuration: # 60
## Enable logs agent and provide custom configs
logs:
# datadog.logs.enabled -- Enables this to activate Datadog Agent log collection
## ref: https://docs.datadoghq.com/agent/basic_agent_usage/kubernetes/#log-collection-setup
enabled: true
# datadog.logs.containerCollectAll -- Enable this to allow log collection for all containers
## ref: https://docs.datadoghq.com/agent/basic_agent_usage/kubernetes/#log-collection-setup
containerCollectAll: true
# datadog.logs.containerCollectUsingFiles -- Collect logs from files in /var/log/pods instead of using container runtime API
## It's usually the most efficient way of collecting logs.
## ref: https://docs.datadoghq.com/agent/basic_agent_usage/kubernetes/#log-collection-setup
containerCollectUsingFiles: true
## Enable apm agent and provide custom configs
apm:
# datadog.apm.enabled -- Enable this to enable APM and tracing, on port 8126
## ref: https://github.com/DataDog/docker-dd-agent#tracing-from-the-host
enabled: false
# datadog.apm.port -- Override the trace Agent port
## Note: Make sure your client is sending to the same UDP port.
port: 8126
# datadog.apm.useSocketVolume -- Enable APM over Unix Domain Socket
## ref: https://docs.datadoghq.com/agent/kubernetes/apm/
useSocketVolume: false
# datadog.apm.socketPath -- Path to the trace-agent socket
socketPath: /var/run/datadog/apm.socket
# datadog.apm.hostSocketPath -- Host path to the trace-agent socket
hostSocketPath: /var/run/datadog/
# datadog.envFrom -- Set environment variables for all Agents directly from configMaps and/or secrets
## envFrom to pass configmaps or secrets as environment
envFrom: []
# - configMapRef:
# name: <CONFIGMAP_NAME>
# - secretRef:
# name: <SECRET_NAME>
# datadog.env -- Set environment variables for all Agents
## The Datadog Agent supports many environment variables.
## ref: https://docs.datadoghq.com/agent/docker/?tab=standard#environment-variables
env:
- name: DD_CONTAINER_EXCLUDE
value: "kube_namespace:monitoring"
- name: DD_AC_EXCLUDE
value: "kube_namespace:datadog kube_namespace:velero kube_namespace:cost-analyzer"
- name: DD_CHECKS_TAG_CARDINALITY
value: 'orchestrator'
- name: DD_KUBERNETES_POD_LABELS_AS_TAGS
value: '{"*":"kube_pod_label_%%label%%"}'
- name: DD_ENV
value: 'test'
- name: DD_TAGS
value: 'kubernetescluster:karma.eu-west-1.llcloud.io'
# datadog.confd -- Provide additional check configurations (static and Autodiscovery)
## Each key becomes a file in /conf.d
## ref: https://github.com/DataDog/datadog-agent/tree/master/Dockerfiles/agent#optional-volumes
## ref: https://docs.datadoghq.com/agent/autodiscovery/
confd:
# redisdb.yaml: |-
# init_config:
# instances:
# - host: "name"
# port: "6379"
kube_apiserver_metrics.yaml: |-
ad_identifiers:
- kube-apiserver
init_config:
instances:
- prometheus_url: "https://%%host%%:443/metrics"
bearer_token_auth: true
tags:
- "apiserver:%%host%%"
# datadog.checksd -- Provide additional custom checks as python code
## Each key becomes a file in /checks.d
## ref: https://github.com/DataDog/datadog-agent/tree/master/Dockerfiles/agent#optional-volumes
checksd: {}
# service.py: |-
# datadog.dockerSocketPath -- Path to the docker socket
dockerSocketPath: # /var/run/docker.sock
# datadog.criSocketPath -- Path to the container runtime socket (if different from Docker)
criSocketPath: # /var/run/containerd/containerd.sock
## Enable process agent and provide custom configs
processAgent:
# datadog.processAgent.enabled -- Set this to true to enable live process monitoring agent
## Note: /etc/passwd is automatically mounted to allow username resolution.
## ref: https://docs.datadoghq.com/graphing/infrastructure/process/#kubernetes-daemonset
enabled: false
# datadog.processAgent.processCollection -- Set this to true to enable process collection in process monitoring agent
## Requires processAgent.enabled to be set to true to have any effect
processCollection: true
## Enable systemProbe agent and provide custom configs
systemProbe:
# datadog.systemProbe.debugPort -- Specify the port to expose pprof and expvar for system-probe agent
debugPort: 0
# datadog.systemProbe.enableConntrack -- Enable the system-probe agent to connect to the netlink/conntrack subsystem to add NAT information to connection data
## Ref: http://conntrack-tools.netfilter.org/
enableConntrack: false
# datadog.systemProbe.seccomp -- Apply an ad-hoc seccomp profile to the system-probe agent to restrict its privileges
## Note that this will break `kubectl exec … -c system-probe -- /bin/bash`
seccomp: localhost/system-probe
# datadog.systemProbe.seccompRoot -- Specify the seccomp profile root directory
seccompRoot: /var/lib/kubelet/seccomp
# datadog.systemProbe.bpfDebug -- Enable logging for kernel debug
bpfDebug: false
# datadog.systemProbe.apparmor -- Specify an apparmor profile for system-probe
apparmor: unconfined
# datadog.systemProbe.enableTCPQueueLength -- Enable the TCP queue length eBPF-based check
enableTCPQueueLength: false
# datadog.systemProbe.enableOOMKill -- Enable the OOM kill eBPF-based check
enableOOMKill: false
# datadog.systemProbe.collectDNSStats -- Enable DNS stat collection
collectDNSStats: true
orchestratorExplorer:
# datadog.orchestratorExplorer.enabled -- Set this to false to disable the orchestrator explorer
## This requires processAgent.enabled and clusterAgent.enabled to be set to true
## ref: TODO - add doc link
enabled: true
# datadog.orchestratorExplorer.container_scrubbing -- Enable the scrubbing of containers in the kubernetes resource YAML for sensitive information
## The container scrubbing is taking significant resources during data collection.
## If you notice that the cluster-agent uses too much CPU in larger clusters
## turning this option off will improve the situation.
container_scrubbing:
enabled: true
networkMonitoring:
# datadog.networkMonitoring.enabled -- Enable network performance monitoring
enabled: false
## Enable security agent and provide custom configs
securityAgent:
compliance:
# datadog.securityAgent.compliance.enabled -- Set this to true to enable compliance checks
enabled: false
# datadog.securityAgent.compliance.configMap -- Contains compliance benchmarks that will be used
configMap:
# datadog.securityAgent.compliance.checkInterval -- Compliance check run interval
checkInterval: 20m
runtime:
# datadog.securityAgent.runtime.enabled -- Set to true to enable the Security Runtime Module
enabled: false
policies:
# datadog.securityAgent.runtime.policies.configMap -- Contains policies that will be used
configMap:
syscallMonitor:
# datadog.securityAgent.runtime.syscallMonitor.enabled -- Set to true to enable the Syscall monitoring.
enabled: false
## Manage NetworkPolicy
networkPolicy:
# datadog.networkPolicy.create -- If true, create NetworkPolicy for all the components
create: false
# datadog.networkPolicy.flavor -- Flavor of the network policy to use.
# Can be:
# * kubernetes for networking.k8s.io/v1/NetworkPolicy
# * cilium for cilium.io/v2/CiliumNetworkPolicy
flavor: kubernetes
cilium:
# datadog.networkPolicy.cilium.dnsSelector -- Cilium selector of the DNS server entity
# @default -- kube-dns in namespace kube-system
dnsSelector:
toEndpoints:
- matchLabels:
"k8s:io.kubernetes.pod.namespace": kube-system
"k8s:k8s-app": kube-dns
## This is the Datadog Cluster Agent implementation that handles cluster-wide
## metrics more cleanly, separates concerns for better rbac, and implements
## the external metrics API so you can autoscale HPAs based on datadog metrics
## ref: https://docs.datadoghq.com/agent/kubernetes/cluster/
clusterAgent:
# clusterAgent.enabled -- Set this to false to disable Datadog Cluster Agent
enabled: true
## Define the Datadog Cluster-Agent image to work with
image:
# clusterAgent.image.name -- Cluster Agent image name to use (relative to `registry`)
name: cluster-agent
# clusterAgent.image.tag -- Cluster Agent image tag to use
# tag: 1.11.0
# clusterAgent.image.repository -- Override default registry + image.name for Cluster Agent
repository:
# clusterAgent.image.pullPolicy -- Cluster Agent image pullPolicy
pullPolicy: IfNotPresent
# clusterAgent.image.pullSecrets -- Cluster Agent repository pullSecret (ex: specify docker registry credentials)
## See https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod
pullSecrets: []
# - name: "<REG_SECRET>"
# clusterAgent.securityContext -- Allows you to overwrite the default PodSecurityContext on the cluster-agent pods.
securityContext: {}
# clusterAgent.command -- Command to run in the Cluster Agent container as entrypoint
command: []
# clusterAgent.token -- Cluster Agent token is a preshared key between node agents and cluster agent (autogenerated if empty, needs to be at least 32 characters a-zA-z)
token: ""
# clusterAgent.tokenExistingSecret -- Existing secret name to use for Cluster Agent token
tokenExistingSecret: ""
# clusterAgent.replicas -- Specify the number of Cluster Agent replicas; if > 1, the Cluster Agent can run in HA mode.
replicas: 1
## Provide Cluster Agent Deployment pod(s) RBAC configuration
rbac:
# clusterAgent.rbac.create -- If true, create & use RBAC resources
create: true
# clusterAgent.rbac.serviceAccountName -- Specify service account name to use (usually pre-existing, created if create is true)
serviceAccountName: datadog
## Provide Cluster Agent PodSecurityPolicy configuration
podSecurity:
podSecurityPolicy:
# clusterAgent.podSecurity.podSecurityPolicy.create -- If true, create a PodSecurityPolicy resource for Cluster Agent pods
create: true
# Enable the metricsProvider to be able to scale based on metrics in Datadog
metricsProvider:
# clusterAgent.metricsProvider.enabled -- Set this to true to enable Metrics Provider
enabled: true
# clusterAgent.metricsProvider.wpaController -- Enable informer and controller of the watermark pod autoscaler
## NOTE: You need to install the `WatermarkPodAutoscaler` CRD before
wpaController: false
# clusterAgent.metricsProvider.useDatadogMetrics -- Enable usage of DatadogMetric CRD to autoscale on arbitrary Datadog queries
## NOTE: You need to install the `DatadogMetric` CRD before
useDatadogMetrics: true
# clusterAgent.metricsProvider.createReaderRbac -- Create `external-metrics-reader` RBAC automatically (to allow HPA to read data from Cluster Agent)
createReaderRbac: true
# clusterAgent.metricsProvider.aggregator -- Define the aggregator the cluster agent will use to process the metrics. The options are (avg, min, max, sum)
aggregator: avg
## Configuration for the service for the cluster-agent metrics server
service:
# clusterAgent.metricsProvider.service.type -- Set type of cluster-agent metrics server service
type: ClusterIP
# clusterAgent.metricsProvider.service.port -- Set port of cluster-agent metrics server service (Kubernetes >= 1.15)
port: 8443
# clusterAgent.env -- Set environment variables specific to Cluster Agent
## The Cluster-Agent supports many additional environment variables
## ref: https://docs.datadoghq.com/agent/cluster_agent/commands/#cluster-agent-options
env: []
admissionController:
# clusterAgent.admissionController.enabled -- Enable the admissionController to be able to inject APM/Dogstatsd config and standard tags (env, service, version) automatically into your pods
enabled: false
# clusterAgent.admissionController.mutateUnlabelled -- Enable injecting config without having the pod label 'admission.datadoghq.com/enabled="true"'
mutateUnlabelled: false
# clusterAgent.confd -- Provide additional cluster check configurations
## Each key will become a file in /conf.d
## ref: https://docs.datadoghq.com/agent/autodiscovery/
confd:
# mysql.yaml: |-
# cluster_check: true
# instances:
# - server: '<EXTERNAL_IP>'
# port: 3306
# user: datadog
# pass: '<YOUR_CHOSEN_PASSWORD>'
kubernetes_state.yaml: |-
ad_identifiers:
- kube-state-metrics
init_config:
instances:
- kube_state_url: http://%%host%%:8080/metrics
join_standard_tags: true
kube_apiserver_metrics.yaml: |-
ad_identifiers:
- kube-apiserver
init_config:
instances:
- prometheus_url: "https://%%host%%:443/metrics"
bearer_token_auth: true
tags:
- "apiserver:%%host%%"
# clusterAgent.resources -- Datadog cluster-agent resource requests and limits.
resources: {}
# requests:
# cpu: 200m
# memory: 256Mi
# limits:
# cpu: 200m
# memory: 256Mi
# clusterAgent.priorityClassName -- Name of the priorityClass to apply to the Cluster Agent
priorityClassName: system-cluster-critical
# clusterAgent.nodeSelector -- Allow the Cluster Agent Deployment to be scheduled on selected nodes
## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
## Ref: https://kubernetes.io/docs/user-guide/node-selection/
nodeSelector: {}
# clusterAgent.affinity -- Allow the Cluster Agent Deployment to schedule using affinity rules
## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
affinity: {}
# clusterAgent.healthPort -- Port number to use in the Cluster Agent for the healthz endpoint
healthPort: 5555
# clusterAgent.livenessProbe -- Override default Cluster Agent liveness probe settings
# @default -- Every 15s / 6 KO / 1 OK
livenessProbe:
initialDelaySeconds: 15
periodSeconds: 15
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 6
# clusterAgent.readinessProbe -- Override default Cluster Agent readiness probe settings
# @default -- Every 15s / 6 KO / 1 OK
readinessProbe:
initialDelaySeconds: 15
periodSeconds: 15
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 6
# clusterAgent.strategy -- Allow the Cluster Agent deployment to perform a rolling update on helm update
## ref: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
# clusterAgent.podAnnotations -- Annotations to add to the cluster-agents's pod(s)
podAnnotations: {}
# key: "value"
# clusterAgent.useHostNetwork -- Bind ports on the hostNetwork
## Useful for CNI networking where hostPort might
## not be supported. The ports need to be available on all hosts. It can be
## used for custom metrics instead of a service endpoint.
##
## WARNING: Make sure that hosts using this are properly firewalled otherwise
## metrics and traces are accepted from any host able to connect to this host.
#
useHostNetwork: # true
# clusterAgent.dnsConfig -- Specify dns configuration options for datadog cluster agent containers e.g ndots
## ref: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-dns-config
dnsConfig: {}
# options:
# - name: ndots
# value: "1"
# clusterAgent.volumes -- Specify additional volumes to mount in the cluster-agent container
volumes: []
# - hostPath:
# path: <HOST_PATH>
# name: <VOLUME_NAME>
# clusterAgent.volumeMounts -- Specify additional volumes to mount in the cluster-agent container
volumeMounts: []
# - name: <VOLUME_NAME>
# mountPath: <CONTAINER_PATH>
# readOnly: true
# clusterAgent.datadog_cluster_yaml -- Specify custom contents for the datadog cluster agent config (datadog-cluster.yaml)
datadog_cluster_yaml: {}
# clusterAgent.createPodDisruptionBudget -- Create pod disruption budget for Cluster Agent deployments
createPodDisruptionBudget: false
networkPolicy:
# clusterAgent.networkPolicy.create -- If true, create a NetworkPolicy for the cluster agent.
# DEPRECATED. Use datadog.networkPolicy.create instead
create: false
# clusterAgent.additionalLabels -- Adds labels to the Cluster Agent deployment and pods
additionalLabels: {}
# key: "value"
agents:
# agents.enabled -- You should keep Datadog DaemonSet enabled!
## The exceptional case could be a situation when you need to run
## single Datadog pod per every namespace, but you do not need to
## re-create a DaemonSet for every non-default namespace install.
## Note: StatsD and DogStatsD work over UDP, so you may not
## get guaranteed delivery of the metrics in Datadog-per-namespace setup!
#
enabled: true
## Define the Datadog image to work with
image:
# agents.image.name -- Datadog Agent image name to use (relative to `registry`)
## use "dogstatsd" for Standalone Datadog Agent DogStatsD 7
name: agent
# agents.image.tag -- Define the Agent version to use
## Use 7-jmx to enable jmx fetch collection
tag: 7-jmx
# agents.image.repository -- Override default registry + image.name for Agent
repository:
# agents.image.doNotCheckTag -- Skip the version<>chart compatibility check
## By default, the version passed in agents.image.tag is checked
## for compatibility with the version of the chart.
## This boolean permits to completely skip this check.
## This is useful, for example, for custom tags that are not
## respecting semantic versioning
doNotCheckTag: # false
# agents.image.pullPolicy -- Datadog Agent image pull policy
pullPolicy: IfNotPresent
# agents.image.pullSecrets -- Datadog Agent repository pullSecret (ex: specify docker registry credentials)
## See https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod
pullSecrets: []
# - name: "<REG_SECRET>"
## Provide Daemonset RBAC configuration
rbac:
# agents.rbac.create -- If true, create & use RBAC resources
create: true
# agents.rbac.serviceAccountName -- Specify service account name to use (usually pre-existing, created if create is true)
serviceAccountName: datadog
## Provide Daemonset PodSecurityPolicy configuration
podSecurity:
podSecurityPolicy:
# agents.podSecurity.podSecurityPolicy.create -- If true, create a PodSecurityPolicy resource for Agent pods
create: true
securityContextConstraints:
# agents.podSecurity.securityContextConstraints.create -- If true, create a SecurityContextConstraints resource for Agent pods
create: false
# agents.podSecurity.seLinuxContext -- Provide seLinuxContext configuration for PSP/SCC
# @default -- Must run as spc_t
seLinuxContext:
rule: MustRunAs
seLinuxOptions:
user: system_u
role: system_r
type: spc_t
level: s0
# agents.podSecurity.privileged -- If true, Allow to run privileged containers
privileged: true
# agents.podSecurity.capabilites -- Allowed capabilites
capabilites:
- SYS_ADMIN
- SYS_RESOURCE
- SYS_PTRACE
- NET_ADMIN
- NET_BROADCAST
- IPC_LOCK
- AUDIT_CONTROL
- AUDIT_READ
# agents.podSecurity.volumes -- Allowed volumes types
volumes:
- configMap
- downwardAPI
- emptyDir
- hostPath
- secret
# agents.podSecurity.seccompProfiles -- Allowed seccomp profiles
seccompProfiles:
- "runtime/default"
- "localhost/system-probe"
# agents.podSecurity.apparmorProfiles -- Allowed apparmor profiles
apparmorProfiles:
- "runtime/default"
- "unconfined"
apparmor:
# agents.podSecurity.apparmor.enabled -- If true, enable apparmor enforcement
## see: https://kubernetes.io/docs/tutorials/clusters/apparmor/
enabled: false
containers:
agent:
# agents.containers.agent.env -- Additional environment variables for the agent container
env: []
# agents.containers.agent.logLevel -- Set logging verbosity, valid log levels are: trace, debug, info, warn, error, critical, and off
## If not set, fall back to the value of datadog.logLevel.
logLevel: # INFO
# agents.containers.agent.resources -- Resource requests and limits for the agent container.
resources: {}
# requests:
# cpu: 200m
# memory: 256Mi
# limits:
# cpu: 200m
# memory: 256Mi
# agents.containers.agent.healthPort -- Port number to use in the node agent for the healthz endpoint
healthPort: 5555
# agents.containers.agent.livenessProbe -- Override default agent liveness probe settings
# @default -- Every 15s / 6 KO / 1 OK
livenessProbe:
initialDelaySeconds: 15
periodSeconds: 15
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 6
# agents.containers.agent.readinessProbe -- Override default agent readiness probe settings
# @default -- Every 15s / 6 KO / 1 OK
readinessProbe:
initialDelaySeconds: 15
periodSeconds: 15
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 6
# agents.containers.agent.securityContext -- Allows you to overwrite the default container SecurityContext for the agent container.
securityContext: {}
# agents.containers.agent.ports -- Allows to specify extra ports (hostPorts for instance) for this container
ports: []
processAgent:
# agents.containers.processAgent.env -- Additional environment variables for the process-agent container
env: []
# agents.containers.processAgent.logLevel -- Set logging verbosity, valid log levels are: trace, debug, info, warn, error, critical, and off
## If not set, fall back to the value of datadog.logLevel.
logLevel: # INFO
# agents.containers.processAgent.resources -- Resource requests and limits for the process-agent container
resources: {}
# requests:
# cpu: 100m
# memory: 200Mi
# limits:
# cpu: 100m
# memory: 200Mi
# agents.containers.processAgent.securityContext -- Allows you to overwrite the default container SecurityContext for the process-agent container.
securityContext: {}
# agents.containers.processAgent.ports -- Allows to specify extra ports (hostPorts for instance) for this container
ports: []
traceAgent:
# agents.containers.traceAgent.env -- Additional environment variables for the trace-agent container
env: []
# agents.containers.traceAgent.logLevel -- Set logging verbosity, valid log levels are: trace, debug, info, warn, error, critical, and off
logLevel: # INFO
# agents.containers.traceAgent.resources -- Resource requests and limits for the trace-agent container
resources: {}
# requests:
# cpu: 100m
# memory: 200Mi
# limits:
# cpu: 100m
# memory: 200Mi
# agents.containers.traceAgent.livenessProbe -- Override default agent liveness probe settings
# @default -- Every 15s
livenessProbe:
initialDelaySeconds: 15
periodSeconds: 15
timeoutSeconds: 5
# agents.containers.traceAgent.securityContext -- Allows you to overwrite the default container SecurityContext for the trace-agent container.
securityContext: {}
# agents.containers.traceAgent.ports -- Allows to specify extra ports (hostPorts for instance) for this container
ports: []
systemProbe:
# agents.containers.systemProbe.env -- Additional environment variables for the system-probe container
env: []
# agents.containers.systemProbe.logLevel -- Set logging verbosity, valid log levels are: trace, debug, info, warn, error, critical, and off.
## If not set, fall back to the value of datadog.logLevel.
logLevel: # INFO
# agents.containers.systemProbe.resources -- Resource requests and limits for the system-probe container
resources: {}
# requests:
# cpu: 100m
# memory: 200Mi
# limits:
# cpu: 100m
# memory: 200Mi
# agents.containers.systemProbe.securityContext -- Allows you to overwrite the default container SecurityContext for the system-probe container.
securityContext:
privileged: false
capabilities:
add: ["SYS_ADMIN", "SYS_RESOURCE", "SYS_PTRACE", "NET_ADMIN", "NET_BROADCAST", "IPC_LOCK"]
# agents.containers.systemProbe.ports -- Allows to specify extra ports (hostPorts for instance) for this container
ports: []
securityAgent:
# agents.containers.securityAgent.env -- Additional environment variables for the security-agent container
env: []
# agents.containers.securityAgent.logLevel -- Set logging verbosity, valid log levels are: trace, debug, info, warn, error, critical, and off
## If not set, fall back to the value of datadog.logLevel.
logLevel: # INFO
# agents.containers.securityAgent.resources -- Resource requests and limits for the security-agent container
resources: {}
# requests:
# cpu: 100m
# memory: 200Mi
# limits:
# cpu: 100m
# memory: 200Mi
# agents.containers.securityAgent.ports -- Allows to specify extra ports (hostPorts for instance) for this container
ports: []
initContainers:
# agents.containers.initContainers.resources -- Resource requests and limits for the init containers
resources: {}
# requests:
# cpu: 100m
# memory: 200Mi
# limits:
# cpu: 100m
# memory: 200Mi
# agents.volumes -- Specify additional volumes to mount in the dd-agent container
volumes:
# - hostPath:
# path: <HOST_PATH>
# name: <VOLUME_NAME>
- name: kube-apiserver-metrics
emptyDir: {}
# agents.volumeMounts -- Specify additional volumes to mount in all containers of the agent pod
volumeMounts:
# - name: <VOLUME_NAME>
# mountPath: <CONTAINER_PATH>
# readOnly: true
- name: kube-apiserver-metrics
mountPath: /etc/datadog-agent/conf.d/kube_apiserver_metrics.d
# agents.useHostNetwork -- Bind ports on the hostNetwork
## Useful for CNI networking where hostPort might
## not be supported. The ports need to be available on all hosts. It can be
## used for custom metrics instead of a service endpoint.
##
## WARNING: Make sure that hosts using this are properly firewalled otherwise
## metrics and traces are accepted from any host able to connect to this host.
useHostNetwork: false
# agents.dnsConfig -- specify dns configuration options for datadog cluster agent containers e.g ndots
## ref: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-dns-config
dnsConfig: {}
# options:
# - name: ndots
# value: "1"
# agents.podAnnotations -- Annotations to add to the DaemonSet's Pods
podAnnotations: {}
# <POD_ANNOTATION>: '[{"key": "<KEY>", "value": "<VALUE>"}]'
# agents.tolerations -- Allow the DaemonSet to schedule on tainted nodes (requires Kubernetes >= 1.6)
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/master
operator: Exists
# agents.nodeSelector -- Allow the DaemonSet to schedule on selected nodes
## Ref: https://kubernetes.io/docs/user-guide/node-selection/
nodeSelector: {}
# agents.affinity -- Allow the DaemonSet to schedule using affinity rules
## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
affinity: {}
# agents.updateStrategy -- Allow the DaemonSet to perform a rolling update on helm update
## ref: https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/
updateStrategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: "10%"
# agents.priorityClassName -- Sets PriorityClassName if defined
priorityClassName:
# agents.podLabels -- Sets podLabels if defined
# Note: These labels are also used as label selectors so they are immutable.
podLabels: {}
# agents.additionalLabels -- Adds labels to the Agent daemonset and pods
additionalLabels: {}
# key: "value"
# agents.useConfigMap -- Configures a configmap to provide the agent configuration. Use this in combination with the `agents.customAgentConfig` parameter.
useConfigMap: # false
# agents.customAgentConfig -- Specify custom contents for the datadog agent config (datadog.yaml)
## ref: https://docs.datadoghq.com/agent/guide/agent-configuration-files/?tab=agentv6
## ref: https://github.com/DataDog/datadog-agent/blob/master/pkg/config/config_template.yaml
## Note the `agents.useConfigMap` needs to be set to `true` for this parameter to be taken into account.
customAgentConfig:
# Autodiscovery for Kubernetes
listeners:
- name: kubelet
config_providers:
- name: kubelet
polling: true
# needed to support legacy docker label config templates
- name: docker
polling: true
# Enable APM by setting the DD_APM_ENABLED envvar to true, or override this configuration
apm_config:
enabled: false
apm_non_local_traffic: true
#
# # Enable java cgroup handling. Only one of those options should be enabled,
# # depending on the agent version you are using along that chart.
#
# # agent version < 6.15
# # jmx_use_cgroup_memory_limit: true
#
# # agent version >= 6.15
# # jmx_use_container_support: true
networkPolicy:
# agents.networkPolicy.create -- If true, create a NetworkPolicy for the agents.
# DEPRECATED. Use datadog.networkPolicy.create instead
create: false
clusterChecksRunner:
# clusterChecksRunner.enabled -- If true, deploys agent dedicated for running the Cluster Checks instead of running in the Daemonset's agents.
## ref: https://docs.datadoghq.com/agent/autodiscovery/clusterchecks/
enabled: false
## Define the Datadog image to work with.
image:
# clusterChecksRunner.image.name -- Datadog Agent image name to use (relative to `registry`)
name: agent
# clusterChecksRunner.image.tag -- Define the Agent version to use
## Use 7-jmx to enable jmx fetch collection
tag: 7.25.0
# clusterChecksRunner.image.repository -- Override default registry + image.name for Cluster Check Runners
repository:
# clusterChecksRunner.image.pullPolicy -- Datadog Agent image pull policy
pullPolicy: IfNotPresent
# clusterChecksRunner.image.pullSecrets -- Datadog Agent repository pullSecret (ex: specify docker registry credentials)
## See https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod
pullSecrets: []
# - name: "<REG_SECRET>"
# clusterChecksRunner.createPodDisruptionBudget -- Create the pod disruption budget to apply to the cluster checks agents
createPodDisruptionBudget: true
# Provide Cluster Checks Deployment pods RBAC configuration
rbac:
# clusterChecksRunner.rbac.create -- If true, create & use RBAC resources
create: true
# clusterChecksRunner.rbac.dedicated -- If true, use a dedicated RBAC resource for the cluster checks agent(s)
dedicated: false
# clusterChecksRunner.rbac.serviceAccountAnnotations -- Annotations to add to the ServiceAccount if clusterChecksRunner.rbac.dedicated is true
serviceAccountAnnotations: {}
# clusterChecksRunner.rbac.serviceAccountName -- Specify service account name to use (usually pre-existing, created if create is true)
serviceAccountName: datadog
# clusterChecksRunner.replicas -- Number of Cluster Checks Runner instances
## If you want to deploy the clusterChecks agent in HA, keep at least clusterChecksRunner.replicas set to 2.
## And increase the clusterChecksRunner.replicas according to the number of Cluster Checks.
replicas: 2
# clusterChecksRunner.resources -- Datadog clusterchecks-agent resource requests and limits.
resources: {}
# requests:
# cpu: 200m
# memory: 500Mi
# limits:
# cpu: 200m
# memory: 500Mi
# clusterChecksRunner.affinity -- Allow the ClusterChecks Deployment to schedule using affinity rules.
## By default, ClusterChecks Deployment Pods are forced to run on different Nodes.
## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
affinity: {}
# clusterChecksRunner.strategy -- Allow the ClusterChecks deployment to perform a rolling update on helm update
## ref: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
# clusterChecksRunner.dnsConfig -- specify dns configuration options for datadog cluster agent containers e.g ndots
## ref: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-dns-config
dnsConfig: {}
# options:
# - name: ndots
# value: "1"
# clusterChecksRunner.nodeSelector -- Allow the ClusterChecks Deployment to schedule on selected nodes
## Ref: https://kubernetes.io/docs/user-guide/node-selection/
#
nodeSelector: {}
# clusterChecksRunner.tolerations -- Tolerations for pod assignment
## Ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
#
tolerations: []
# clusterChecksRunner.healthPort -- Port number to use in the Cluster Checks Runner for the healthz endpoint
healthPort: 5555
# clusterChecksRunner.livenessProbe -- Override default agent liveness probe settings
# @default -- Every 15s / 6 KO / 1 OK
## In case of issues with the probe, you can disable it with the
## following values, to allow easier investigating:
#
# livenessProbe:
# exec:
# command: ["/bin/true"]
#
livenessProbe:
initialDelaySeconds: 15
periodSeconds: 15
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 6
# clusterChecksRunner.readinessProbe -- Override default agent readiness probe settings
# @default -- Every 15s / 6 KO / 1 OK
## In case of issues with the probe, you can disable it with the
## following values, to allow easier investigating:
#
# readinessProbe:
# exec:
# command: ["/bin/true"]
#
readinessProbe:
initialDelaySeconds: 15
periodSeconds: 15
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 6
# clusterChecksRunner.podAnnotations -- Annotations to add to the cluster-checks-runner's pod(s)
podAnnotations: {}
# key: "value"
# clusterChecksRunner.env -- Environment variables specific to Cluster Checks Runner
## ref: https://github.com/DataDog/datadog-agent/tree/master/Dockerfiles/agent#environment-variables
env: []
# - name: <ENV_VAR_NAME>
# value: <ENV_VAR_VALUE>
# clusterChecksRunner.volumes -- Specify additional volumes to mount in the cluster checks container
volumes: []
# - hostPath:
# path: <HOST_PATH>
# name: <VOLUME_NAME>
# clusterChecksRunner.volumeMounts -- Specify additional volumes to mount in the cluster checks container
volumeMounts: []
# - name: <VOLUME_NAME>
# mountPath: <CONTAINER_PATH>
# readOnly: true
networkPolicy:
# clusterChecksRunner.networkPolicy.create -- If true, create a NetworkPolicy for the cluster checks runners.
# DEPRECATED. Use datadog.networkPolicy.create instead
create: false
# clusterChecksRunner.additionalLabels -- Adds labels to the cluster checks runner deployment and pods
additionalLabels: {}
# key: "value"
# clusterChecksRunner.securityContext -- Allows you to overwrite the default PodSecurityContext on the clusterchecks pods.
securityContext: {}
# clusterChecksRunner.ports -- Allows to specify extra ports (hostPorts for instance) for this container
ports: []
datadog-crds:
crds:
# datadog-crds.crds.datadogMetrics -- Set to true to deploy the DatadogMetrics CRD
datadogMetrics: true
kube-state-metrics:
rbac:
# kube-state-metrics.rbac.create -- If true, create & use RBAC resources
create: true
serviceAccount:
# kube-state-metrics.serviceAccount.create -- If true, create ServiceAccount, require rbac kube-state-metrics.rbac.create true
create: true
# kube-state-metrics.serviceAccount.name -- The name of the ServiceAccount to use.
## If not set and create is true, a name is generated using the fullname template
name:
# kube-state-metrics.resources -- Resource requests and limits for the kube-state-metrics container.
resources: {}
# requests:
# cpu: 200m
# memory: 256Mi
# limits:
# cpu: 200m
# memory: 256Mi
# kube-state-metrics.nodeSelector -- Node selector for KSM. KSM only supports Linux.
nodeSelector:
kubernetes.io/os: linux
Hey @gerrit8143,
Apparently it's linked to some interaction with SELinux. Are you using Lokomotive or plain Kubernetes (kubeadm) on your nodes?
You may want to try removing the seLinuxOptions there:
# agents.podSecurity.seLinuxContext -- Provide seLinuxContext configuration for PSP/SCC
# @default -- Must run as spc_t
seLinuxContext:
rule: MustRunAs
seLinuxOptions:
user: system_u
role: system_r
type: spc_t
level: s0
And see how it goes.
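If it helps, here is a minimal override sketch you could put in your own values file for that (untested on my side; `RunAsAny` is only my suggestion for relaxing the rule, and whether an empty seLinuxOptions is dropped entirely depends on the chart's PSP/SCC templates):

agents:
  podSecurity:
    # hypothetical override: relax the SELinux rule and clear the options
    seLinuxContext:
      rule: RunAsAny
      seLinuxOptions: {}

Then redeploy with `helm upgrade --install <release> datadog/datadog -f values.yaml` and check whether the init-containers get created.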
It looks like this was a Flatcar Linux issue; Datadog works again with Flatcar 3033.2.0, so I'm closing this issue. Thanks!
Describe what happened: After upgrading our nodes from Flatcar 2905.2.5 to 2983.2.0, Datadog refuses to start:
Describe what you expected: Datadog runs.
Steps to reproduce the issue: Use Flatcar 2983.2.0 (the same happens with 2983.2.1).
Additional environment details (Operating System, Cloud provider, etc): Flatcar 2983.2.0 on Kubernetes v1.20.9