DataDog / helm-charts

Helm charts for Datadog products

Unable to install Datadog on Windows Containers (EKS 1.25) #1151

Open sujith1594 opened 1 year ago

sujith1594 commented 1 year ago

Describe what happened: While trying to install the Datadog Helm chart on a Windows EKS cluster, I get the error below.

Error: failed to create containerd task: failed to create shim task: hcs::CreateComputeSystem init-volume: The requested operation for attach namespace failed.

Describe what you expected: The Datadog Agents should run successfully on the Windows nodes.

Steps to reproduce the issue: Below is the values file I used for Windows:

datadog:
  apiKey: <api_key>
  appKey: <app_key>
  clusterName: <cluster_name>
  kubeStateMetricsEnabled: false
  dogstatsd:
    useHostPort: true
  logs:
    enabled: true
  apm:
    portEnabled: true
clusterAgent: 
  image:
    doNotCheckTag: true
agents:
  image:
    doNotCheckTag: true       
  priorityClassName: system-node-critical
  tolerations:
  - effect: NoSchedule
    key: os
    operator: Equal
    value: windows
targetSystem: windows
existingClusterAgent:
  join: true
  serviceName: datadog-cluster-agent
  tokenSecretName: datadog-cluster-agent
datadog-crds:
  crds:
    datadogMetrics: false

Additional environment details (Operating System, Cloud provider, etc):

Cloud Provider: AWS
Distribution: EKS
Version: 1.25

clamoriniere commented 1 year ago

Hi @sujith1594

Could you provide the Agent's DaemonSet and Pod manifests? They will help us understand whether the generated manifest is valid 🙇

sujith1594 commented 1 year ago

Hi @clamoriniere,

Below are the DaemonSet and Pod manifests.

DaemonSet

apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "1"
    meta.helm.sh/release-name: datadog-windows
    meta.helm.sh/release-namespace: datadog
  creationTimestamp: "2023-08-24T09:26:13Z"
  generation: 1
  labels:
    app.kubernetes.io/component: agent
    app.kubernetes.io/instance: datadog-windows
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: datadog-windows
    app.kubernetes.io/version: "7"
    helm.sh/chart: datadog-3.33.9
  name: datadog-windows
  namespace: datadog
  resourceVersion: "9428892"
  uid: 838bb16f-b42e-4bd5-a0d5-2ac163aa274e
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: datadog-windows
  template:
    metadata:
      annotations:
        checksum/api_key: b905c681a226151785c93fdbd983fb591e88b4d37aa6e06c71f418dbcecaf170
        checksum/autoconf-config: 74234e98afe7498fb5daf1f36ac2d78acc339464f950703b8c019892f982b90b
        checksum/checksd-config: 44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
        checksum/clusteragent_token: 75a11da44c802486bc6f65640aa48a730f0f684c5c07a42ba3cd1735eb3fb070
        checksum/confd-config: 44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
        checksum/install_info: 8063a59eea4a33498c05115f3bf226dd6aff9fd72e48ea86d7d3b6ef693d1224
      creationTimestamp: null
      labels:
        app: datadog-windows
        app.kubernetes.io/component: agent
        app.kubernetes.io/instance: datadog-windows
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: datadog-windows
      name: datadog-windows
    spec:
      affinity: {}
      automountServiceAccountToken: true
      containers:
      - command:
        - agent
        - run
        env:
        - name: GODEBUG
          value: x509ignoreCN=0
        - name: DD_API_KEY
          valueFrom:
            secretKeyRef:
              key: api-key
              name: datadog-windows
        - name: DD_REMOTE_CONFIGURATION_ENABLED
          value: "false"
        - name: DD_AUTH_TOKEN_FILE_PATH
          value: C:/ProgramData/Datadog/auth/token
        - name: DD_CLUSTER_NAME
          value: crossplane
        - name: KUBERNETES
          value: "yes"
        - name: DD_KUBERNETES_KUBELET_HOST
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: DD_LOG_LEVEL
          value: INFO
        - name: DD_DOGSTATSD_PORT
          value: "8125"
        - name: DD_DOGSTATSD_NON_LOCAL_TRAFFIC
          value: "true"
        - name: DD_DOGSTATSD_TAG_CARDINALITY
          value: low
        - name: DD_CLUSTER_AGENT_ENABLED
          value: "true"
        - name: DD_CLUSTER_AGENT_KUBERNETES_SERVICE_NAME
          value: datadog-cluster-agent
        - name: DD_CLUSTER_AGENT_AUTH_TOKEN
          valueFrom:
            secretKeyRef:
              key: token
              name: datadog-cluster-agent
        - name: DD_APM_ENABLED
          value: "false"
        - name: DD_LOGS_ENABLED
          value: "true"
        - name: DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL
          value: "false"
        - name: DD_LOGS_CONFIG_K8S_CONTAINER_USE_FILE
          value: "true"
        - name: DD_LOGS_CONFIG_AUTO_MULTI_LINE_DETECTION
          value: "false"
        - name: DD_HEALTH_PORT
          value: "5555"
        - name: DD_EXTRA_CONFIG_PROVIDERS
          value: clusterchecks endpointschecks
        - name: DD_IGNORE_AUTOCONF
          value: kubernetes_state
        - name: DD_EXPVAR_PORT
          value: "6000"
        - name: DD_COMPLIANCE_CONFIG_ENABLED
          value: "false"
        image: gcr.io/datadoghq/agent:7.46.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 6
          httpGet:
            path: /live
            port: 5555
            scheme: HTTP
          initialDelaySeconds: 15
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 5
        name: agent
        ports:
        - containerPort: 8125
          hostPort: 8125
          name: dogstatsdport
          protocol: UDP
        readinessProbe:
          failureThreshold: 6
          httpGet:
            path: /ready
            port: 5555
            scheme: HTTP
          initialDelaySeconds: 15
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 5
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: C:/ProgramData/Datadog/logs
          name: logdatadog
        - mountPath: C:/ProgramData/Datadog
          name: config
        - mountPath: C:/ProgramData/Datadog/auth
          name: auth-token
        - mountPath: \\.\pipe\docker_engine
          name: runtimesocket
        - mountPath: \\.\pipe\containerd-containerd
          name: containerdsocket
        - mountPath: c:/programdata/datadog/run
          name: pointerdir
        - mountPath: C:/var/log/pods
          name: logpodpath
          readOnly: true
        - mountPath: C:/ProgramData
          name: logdockercontainerpath
          readOnly: true
      - command:
        - trace-agent
        - -foreground
        - -config=C:/ProgramData/Datadog/datadog.yaml
        env:
        - name: GODEBUG
          value: x509ignoreCN=0
        - name: DD_API_KEY
          valueFrom:
            secretKeyRef:
              key: api-key
              name: datadog-windows
        - name: DD_REMOTE_CONFIGURATION_ENABLED
          value: "false"
        - name: DD_AUTH_TOKEN_FILE_PATH
          value: C:/ProgramData/Datadog/auth/token
        - name: DD_CLUSTER_NAME
          value: crossplane
        - name: KUBERNETES
          value: "yes"
        - name: DD_KUBERNETES_KUBELET_HOST
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: DD_CLUSTER_AGENT_ENABLED
          value: "true"
        - name: DD_CLUSTER_AGENT_KUBERNETES_SERVICE_NAME
          value: datadog-cluster-agent
        - name: DD_CLUSTER_AGENT_AUTH_TOKEN
          valueFrom:
            secretKeyRef:
              key: token
              name: datadog-cluster-agent
        - name: DD_LOG_LEVEL
          value: INFO
        - name: DD_APM_ENABLED
          value: "true"
        - name: DD_APM_NON_LOCAL_TRAFFIC
          value: "true"
        - name: DD_APM_RECEIVER_PORT
          value: "8126"
        image: gcr.io/datadoghq/agent:7.46.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 15
          periodSeconds: 15
          successThreshold: 1
          tcpSocket:
            port: 8126
          timeoutSeconds: 5
        name: trace-agent
        ports:
        - containerPort: 8126
          hostPort: 8126
          name: traceport
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: C:/ProgramData/Datadog
          name: config
          readOnly: true
        - mountPath: C:/ProgramData/Datadog/logs
          name: logdatadog
        - mountPath: C:/ProgramData/Datadog/auth
          name: auth-token
          readOnly: true
        - mountPath: \\.\pipe\docker_engine
          name: runtimesocket
        - mountPath: \\.\pipe\containerd-containerd
          name: containerdsocket
      - command:
        - process-agent
        - -foreground
        - --config=C:/ProgramData/Datadog/datadog.yaml
        env:
        - name: GODEBUG
          value: x509ignoreCN=0
        - name: DD_API_KEY
          valueFrom:
            secretKeyRef:
              key: api-key
              name: datadog-windows
        - name: DD_REMOTE_CONFIGURATION_ENABLED
          value: "false"
        - name: DD_AUTH_TOKEN_FILE_PATH
          value: C:/ProgramData/Datadog/auth/token
        - name: DD_CLUSTER_NAME
          value: crossplane
        - name: KUBERNETES
          value: "yes"
        - name: DD_KUBERNETES_KUBELET_HOST
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: DD_CLUSTER_AGENT_ENABLED
          value: "true"
        - name: DD_CLUSTER_AGENT_KUBERNETES_SERVICE_NAME
          value: datadog-cluster-agent
        - name: DD_CLUSTER_AGENT_AUTH_TOKEN
          valueFrom:
            secretKeyRef:
              key: token
              name: datadog-cluster-agent
        - name: DD_PROCESS_AGENT_DISCOVERY_ENABLED
          value: "true"
        - name: DD_LOG_LEVEL
          value: INFO
        - name: DD_SYSTEM_PROBE_ENABLED
          value: "false"
        - name: DD_ORCHESTRATOR_EXPLORER_ENABLED
          value: "true"
        image: gcr.io/datadoghq/agent:7.46.0
        imagePullPolicy: IfNotPresent
        name: process-agent
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: C:/ProgramData/Datadog
          name: config
          readOnly: true
        - mountPath: C:/ProgramData/Datadog/logs
          name: logdatadog
        - mountPath: \\.\pipe\docker_engine
          name: runtimesocket
        - mountPath: \\.\pipe\containerd-containerd
          name: containerdsocket
      dnsPolicy: ClusterFirst
      initContainers:
      - args:
        - |
          Copy-Item -Recurse -Force C:/ProgramData/Datadog C:/Temp
          Copy-Item -Force C:/Temp/install_info/install_info C:/Temp/Datadog/install_info
        command:
        - pwsh
        - -Command
        image: gcr.io/datadoghq/agent:7.46.0
        imagePullPolicy: IfNotPresent
        name: init-volume
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: C:/Temp/Datadog
          name: config
        - mountPath: C:/Temp/install_info
          name: installinfo
          readOnly: true
      - args:
        - Get-ChildItem 'entrypoint-ps1' | ForEach-Object { & $_.FullName if (-Not
          $?) { exit 1 } }
        command:
        - pwsh
        - -Command
        env:
        - name: GODEBUG
          value: x509ignoreCN=0
        - name: DD_API_KEY
          valueFrom:
            secretKeyRef:
              key: api-key
              name: datadog-windows
        - name: DD_REMOTE_CONFIGURATION_ENABLED
          value: "false"
        - name: DD_AUTH_TOKEN_FILE_PATH
          value: C:/ProgramData/Datadog/auth/token
        - name: DD_CLUSTER_NAME
          value: crossplane
        - name: KUBERNETES
          value: "yes"
        - name: DD_KUBERNETES_KUBELET_HOST
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        image: gcr.io/datadoghq/agent:7.46.0
        imagePullPolicy: IfNotPresent
        name: init-config
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: C:/ProgramData/Datadog
          name: config
        - mountPath: \\.\pipe\docker_engine
          name: runtimesocket
        - mountPath: \\.\pipe\containerd-containerd
          name: containerdsocket
      nodeSelector:
        kubernetes.io/os: windows
      priorityClassName: system-node-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: datadog-windows
      serviceAccountName: datadog-windows
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node.kubernetes.io/os
        operator: Equal
        value: windows
      - effect: NoSchedule
        key: os
        operator: Equal
        value: windows
      volumes:
      - emptyDir: {}
        name: auth-token
      - configMap:
          defaultMode: 420
          name: datadog-windows-installinfo
        name: installinfo
      - emptyDir: {}
        name: config
      - hostPath:
          path: C:/var/log
          type: ""
        name: pointerdir
      - hostPath:
          path: C:/var/log/pods
          type: ""
        name: logpodpath
      - hostPath:
          path: C:/ProgramData
          type: ""
        name: logdockercontainerpath
      - hostPath:
          path: \\.\pipe\docker_engine
          type: ""
        name: runtimesocket
      - hostPath:
          path: \\.\pipe\containerd-containerd
          type: ""
        name: containerdsocket
      - emptyDir: {}
        name: logdatadog
  updateStrategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 10%
    type: RollingUpdate

Pod Manifest

apiVersion: v1
kind: Pod
metadata:
  annotations:
    checksum/api_key: b905c681a226151785c93fdbd983fb591e88b4d37aa6e06c71f418dbcecaf170
    checksum/autoconf-config: 74234e98afe7498fb5daf1f36ac2d78acc339464f950703b8c019892f982b90b   
    checksum/checksd-config: 44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a    
    checksum/clusteragent_token: 75a11da44c802486bc6f65640aa48a730f0f684c5c07a42ba3cd1735eb3fb070
    checksum/confd-config: 44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a      
    checksum/install_info: 8063a59eea4a33498c05115f3bf226dd6aff9fd72e48ea86d7d3b6ef693d1224      
    vpc.amazonaws.com/PrivateIPv4Address: 10.188.213.245/24
  creationTimestamp: "2023-08-24T09:26:13Z"
  generateName: datadog-windows-
  labels:
    app: datadog-windows
    app.kubernetes.io/component: agent
    app.kubernetes.io/instance: datadog-windows
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: datadog-windows
    controller-revision-hash: b888bb589
    pod-template-generation: "1"
  name: datadog-windows-qgqg9
  namespace: datadog
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: datadog-windows
    uid: 838bb16f-b42e-4bd5-a0d5-2ac163aa274e
  resourceVersion: "10032017"
  uid: 28f88d38-6e3c-47ba-b00f-a6e5a45c8cf2
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - ip-10-188-213-150.us-east-2.compute.internal
  automountServiceAccountToken: true
  containers:
  - command:
    - agent
    - run
    env:
    - name: GODEBUG
      value: x509ignoreCN=0
    - name: DD_API_KEY
      valueFrom:
        secretKeyRef:
          key: api-key
          name: datadog-windows
    - name: DD_REMOTE_CONFIGURATION_ENABLED
      value: "false"
    - name: DD_AUTH_TOKEN_FILE_PATH
      value: C:/ProgramData/Datadog/auth/token
    - name: DD_CLUSTER_NAME
      value: crossplane
    - name: KUBERNETES
      value: "yes"
    - name: DD_KUBERNETES_KUBELET_HOST
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.hostIP
    - name: DD_LOG_LEVEL
      value: INFO
    - name: DD_DOGSTATSD_PORT
      value: "8125"
    - name: DD_DOGSTATSD_NON_LOCAL_TRAFFIC
      value: "true"
    - name: DD_DOGSTATSD_TAG_CARDINALITY
      value: low
    - name: DD_CLUSTER_AGENT_ENABLED
      value: "true"
    - name: DD_CLUSTER_AGENT_KUBERNETES_SERVICE_NAME
      value: datadog-cluster-agent
    - name: DD_CLUSTER_AGENT_AUTH_TOKEN
      valueFrom:
        secretKeyRef:
          key: token
          name: datadog-cluster-agent
    - name: DD_APM_ENABLED
      value: "false"
    - name: DD_LOGS_ENABLED
      value: "true"
    - name: DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL
      value: "false"
    - name: DD_LOGS_CONFIG_K8S_CONTAINER_USE_FILE
      value: "true"
    - name: DD_LOGS_CONFIG_AUTO_MULTI_LINE_DETECTION
      value: "false"
    - name: DD_HEALTH_PORT
      value: "5555"
    - name: DD_EXTRA_CONFIG_PROVIDERS
      value: clusterchecks endpointschecks
    - name: DD_IGNORE_AUTOCONF
      value: kubernetes_state
    - name: DD_EXPVAR_PORT
      value: "6000"
    - name: DD_COMPLIANCE_CONFIG_ENABLED
      value: "false"
    image: gcr.io/datadoghq/agent:7.46.0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 6
      httpGet:
        path: /live
        port: 5555
        scheme: HTTP
      initialDelaySeconds: 15
      periodSeconds: 15
      successThreshold: 1
      timeoutSeconds: 5
    name: agent
    ports:
    - containerPort: 8125
      hostPort: 8125
      name: dogstatsdport
      protocol: UDP
    readinessProbe:
      failureThreshold: 6
      httpGet:
        path: /ready
        port: 5555
        scheme: HTTP
      initialDelaySeconds: 15
      periodSeconds: 15
      successThreshold: 1
      timeoutSeconds: 5
    resources:
      limits:
        vpc.amazonaws.com/PrivateIPv4Address: "1"
      requests:
        vpc.amazonaws.com/PrivateIPv4Address: "1"
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: C:/ProgramData/Datadog/logs
      name: logdatadog
    - mountPath: C:/ProgramData/Datadog
      name: config
    - mountPath: C:/ProgramData/Datadog/auth
      name: auth-token
    - mountPath: \\.\pipe\docker_engine
      name: runtimesocket
    - mountPath: \\.\pipe\containerd-containerd
      name: containerdsocket
    - mountPath: c:/programdata/datadog/run
      name: pointerdir
    - mountPath: C:/var/log/pods
      name: logpodpath
      readOnly: true
    - mountPath: C:/ProgramData
      name: logdockercontainerpath
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-59fw4
      readOnly: true
  - command:
    - trace-agent
    - -foreground
    - -config=C:/ProgramData/Datadog/datadog.yaml
    env:
    - name: GODEBUG
      value: x509ignoreCN=0
    - name: DD_API_KEY
      valueFrom:
        secretKeyRef:
          key: api-key
          name: datadog-windows
    - name: DD_REMOTE_CONFIGURATION_ENABLED
      value: "false"
    - name: DD_AUTH_TOKEN_FILE_PATH
      value: C:/ProgramData/Datadog/auth/token
    - name: DD_CLUSTER_NAME
      value: crossplane
    - name: KUBERNETES
      value: "yes"
    - name: DD_KUBERNETES_KUBELET_HOST
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.hostIP
    - name: DD_CLUSTER_AGENT_ENABLED
      value: "true"
    - name: DD_CLUSTER_AGENT_KUBERNETES_SERVICE_NAME
      value: datadog-cluster-agent
    - name: DD_CLUSTER_AGENT_AUTH_TOKEN
      valueFrom:
        secretKeyRef:
          key: token
          name: datadog-cluster-agent
    - name: DD_LOG_LEVEL
      value: INFO
    - name: DD_APM_ENABLED
      value: "true"
    - name: DD_APM_NON_LOCAL_TRAFFIC
      value: "true"
    - name: DD_APM_RECEIVER_PORT
      value: "8126"
    image: gcr.io/datadoghq/agent:7.46.0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      initialDelaySeconds: 15
      periodSeconds: 15
      successThreshold: 1
      tcpSocket:
        port: 8126
      timeoutSeconds: 5
    name: trace-agent
    ports:
    - containerPort: 8126
      hostPort: 8126
      name: traceport
      protocol: TCP
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: C:/ProgramData/Datadog
      name: config
      readOnly: true
    - mountPath: C:/ProgramData/Datadog/logs
      name: logdatadog
    - mountPath: C:/ProgramData/Datadog/auth
      name: auth-token
      readOnly: true
    - mountPath: \\.\pipe\docker_engine
      name: runtimesocket
    - mountPath: \\.\pipe\containerd-containerd
      name: containerdsocket
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-59fw4
      readOnly: true
  - command:
    - process-agent
    - -foreground
    - --config=C:/ProgramData/Datadog/datadog.yaml
    env:
    - name: GODEBUG
      value: x509ignoreCN=0
    - name: DD_API_KEY
      valueFrom:
        secretKeyRef:
          key: api-key
          name: datadog-windows
    - name: DD_REMOTE_CONFIGURATION_ENABLED
      value: "false"
    - name: DD_AUTH_TOKEN_FILE_PATH
      value: C:/ProgramData/Datadog/auth/token
    - name: DD_CLUSTER_NAME
      value: crossplane
    - name: KUBERNETES
      value: "yes"
    - name: DD_KUBERNETES_KUBELET_HOST
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.hostIP
    - name: DD_CLUSTER_AGENT_ENABLED
      value: "true"
    - name: DD_CLUSTER_AGENT_KUBERNETES_SERVICE_NAME
      value: datadog-cluster-agent
    - name: DD_CLUSTER_AGENT_AUTH_TOKEN
      valueFrom:
        secretKeyRef:
          key: token
          name: datadog-cluster-agent
    - name: DD_PROCESS_AGENT_DISCOVERY_ENABLED
      value: "true"
    - name: DD_LOG_LEVEL
      value: INFO
    - name: DD_SYSTEM_PROBE_ENABLED
      value: "false"
    - name: DD_ORCHESTRATOR_EXPLORER_ENABLED
      value: "true"
    image: gcr.io/datadoghq/agent:7.46.0
    imagePullPolicy: IfNotPresent
    name: process-agent
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: C:/ProgramData/Datadog
      name: config
      readOnly: true
    - mountPath: C:/ProgramData/Datadog/logs
      name: logdatadog
    - mountPath: \\.\pipe\docker_engine
      name: runtimesocket
    - mountPath: \\.\pipe\containerd-containerd
      name: containerdsocket
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-59fw4
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  initContainers:
  - args:
    - |
      Copy-Item -Recurse -Force C:/ProgramData/Datadog C:/Temp
      Copy-Item -Force C:/Temp/install_info/install_info C:/Temp/Datadog/install_info
    command:
    - pwsh
    - -Command
    image: gcr.io/datadoghq/agent:7.46.0
    imagePullPolicy: IfNotPresent
    name: init-volume
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: C:/Temp/Datadog
      name: config
    - mountPath: C:/Temp/install_info
      name: installinfo
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-59fw4
      readOnly: true
  - args:
    - Get-ChildItem 'entrypoint-ps1' | ForEach-Object { & $_.FullName if (-Not $?)
      { exit 1 } }
    command:
    - pwsh
    - -Command
    env:
    - name: GODEBUG
      value: x509ignoreCN=0
    - name: DD_API_KEY
      valueFrom:
        secretKeyRef:
          key: api-key
          name: datadog-windows
    - name: DD_REMOTE_CONFIGURATION_ENABLED
      value: "false"
    - name: DD_AUTH_TOKEN_FILE_PATH
      value: C:/ProgramData/Datadog/auth/token
    - name: DD_CLUSTER_NAME
      value: crossplane
    - name: KUBERNETES
      value: "yes"
    - name: DD_KUBERNETES_KUBELET_HOST
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.hostIP
    image: gcr.io/datadoghq/agent:7.46.0
    imagePullPolicy: IfNotPresent
    name: init-config
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: C:/ProgramData/Datadog
      name: config
    - mountPath: \\.\pipe\docker_engine
      name: runtimesocket
    - mountPath: \\.\pipe\containerd-containerd
      name: containerdsocket
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-59fw4
      readOnly: true
  nodeName: ip-10-188-213-150.us-east-2.compute.internal
  nodeSelector:
    kubernetes.io/os: windows
  preemptionPolicy: PreemptLowerPriority
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: datadog-windows
  serviceAccountName: datadog-windows
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: node.kubernetes.io/os
    operator: Equal
    value: windows
  - effect: NoSchedule
    key: os
    operator: Equal
    value: windows
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  - effect: NoSchedule
    key: vpc.amazonaws.com/PrivateIPv4Address
    operator: Exists
  volumes:
  - emptyDir: {}
    name: auth-token
  - configMap:
      defaultMode: 420
      name: datadog-windows-installinfo
    name: installinfo
  - emptyDir: {}
    name: config
  - hostPath:
      path: C:/var/log
      type: ""
    name: pointerdir
  - hostPath:
      path: C:/var/log/pods
      type: ""
    name: logpodpath
  - hostPath:
      path: C:/ProgramData
      type: ""
    name: logdockercontainerpath
  - hostPath:
      path: \\.\pipe\docker_engine
      type: ""
    name: runtimesocket
  - hostPath:
      path: \\.\pipe\containerd-containerd
      type: ""
    name: containerdsocket
  - emptyDir: {}
    name: logdatadog
  - name: kube-api-access-59fw4
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace

levan-m commented 1 year ago

Hello,

Thanks for providing the manifests. I tried reproducing the issue but didn't encounter the same error.

This is my environment:

I used the Helm values provided in the description, without the existingClusterAgent section:

existingClusterAgent:
  join: true
  serviceName: datadog-cluster-agent
  tokenSecretName: datadog-cluster-agent

Agents on all four nodes reached the Running state (the DCA below is Pending due to a scheduling problem I haven't looked into):

NAME                                     READY   STATUS    RESTARTS   AGE
datadog-5cxhs                            3/3     Running   0          24m
datadog-c67w4                            3/3     Running   0          24m
datadog-cluster-agent-67d7c6cdbf-sxljp   0/1     Pending   0          24m
datadog-v7twp                            3/3     Running   0          24m
datadog-z9pzx                            3/3     Running   0          24m

I compared the DaemonSet and Pod manifests, and all diffs are due to differences in cluster names or various IDs/hashes/checksums.
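
For anyone who wants to repeat this kind of comparison, here is a minimal Python sketch. It is not the tool used above, only one possible approach. It assumes both manifests were exported as JSON (for example with `kubectl get daemonset <name> -o json`), and the set of ignored fields (`VOLATILE_KEYS`, `VOLATILE_PREFIXES`) is an illustrative assumption rather than an exhaustive list. All function names are hypothetical.

```python
import json

# Fields expected to differ between clusters (IDs, timestamps, hashes).
# Assumption for illustration only; extend to suit your manifests.
VOLATILE_KEYS = {"uid", "resourceVersion", "creationTimestamp", "generation"}
VOLATILE_PREFIXES = ("checksum/",)


def load_manifest(path):
    """Load a manifest previously exported with `-o json`."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def strip_volatile(obj):
    """Return a copy of obj with volatile keys removed at every nesting level."""
    if isinstance(obj, dict):
        return {
            key: strip_volatile(value)
            for key, value in obj.items()
            if key not in VOLATILE_KEYS and not key.startswith(VOLATILE_PREFIXES)
        }
    if isinstance(obj, list):
        return [strip_volatile(item) for item in obj]
    return obj


def diff_paths(left, right, path=""):
    """Return the paths whose values differ between left and right."""
    if isinstance(left, dict) and isinstance(right, dict):
        diffs = []
        for key in sorted(set(left) | set(right)):
            child = f"{path}.{key}" if path else key
            if key not in left or key not in right:
                diffs.append(child)  # key present on one side only
            else:
                diffs.extend(diff_paths(left[key], right[key], child))
        return diffs
    if isinstance(left, list) and isinstance(right, list) and len(left) == len(right):
        diffs = []
        for index, (l_item, r_item) in enumerate(zip(left, right)):
            diffs.extend(diff_paths(l_item, r_item, f"{path}[{index}]"))
        return diffs
    # Scalars, mismatched types, or lists of different length.
    return [] if left == right else [path]


def compare_manifests(left, right):
    """Diff two manifests after dropping the volatile fields."""
    return diff_paths(strip_volatile(left), strip_volatile(right))
```

Usage would look like `compare_manifests(load_manifest("a.json"), load_manifest("b.json"))`, where the file names are placeholders. Each returned entry is a dotted path to a field that differs, which makes it easier to see whether anything other than names and cluster-specific values has changed.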

Could you please provide more information about your environment and setup?

sujith1594 commented 1 year ago

Hi @levan-m

Below are my environment details:

Kernel Version:             10.0.17763.4645
OS Image:                   Windows Server 2019 Datacenter
Operating System:           windows
Architecture:               amd64
Container Runtime Version:  containerd://1.6.6

I'm running an EKS cluster with both Linux and Windows workloads, so I deployed the Agent twice: once for Linux and once for Windows.

Linux Values File:

datadog:
  apiKey: <api_key>
  appKey: <app_key>
  clusterName: <cluster_name>
  dogstatsd:
    useHostPort: true
  logs:
    enabled: true
  apm:
    portEnabled: true
clusterAgent: 
  image:
    doNotCheckTag: true
agents:
  priorityClassName: system-node-critical
  image:
    doNotCheckTag: true 

The Windows values file is already included in the issue description. I redeployed with version 3.35.0 of the Helm chart and am still seeing the same issue. The same configuration worked fine on EKS 1.23; we started seeing this issue after upgrading the cluster to 1.25. The Linux Agent works fine; the issue is only with the Windows Agent.

Thanks, Sujith.