elastic/integrations

[Kubernetes Integration] Some metrics stopped to be reported despite agents healthy and running #7137

Open Danouchka opened 1 year ago

Danouchka commented 1 year ago

Introduction: I have deployed elastic-agent 8.8.2 on GKE (a 3-node Kubernetes cluster). Data is sent to an Elastic Cloud cluster, also on 8.8.2.

Issue: The elastic-agents worked properly for 12 days, but since July 22nd they have stopped reporting any kube-state metrics, as you can see in the attached screenshot.

(Screenshot, 2023-07-25 at 13:56:58: kube-state metrics stop being reported on July 22nd.)

And yet the agents are running properly.

More details:

kubectl get pods -n kube-system -o wide | grep elastic-agent

elastic-agent-bv52w   1/1   Running   0   11d   10.132.0.27   gke-sa-da-gke-cluster-pool-1-99abf297-fipd
elastic-agent-drxl5   1/1   Running   0   11d   10.132.0.28   gke-sa-da-gke-cluster-pool-1-99abf297-m7s0
elastic-agent-jsjpz   1/1   Running   0   11d   10.132.0.29   gke-sa-da-gke-cluster-pool-1-99abf297-ltuh

kubectl describe pod elastic-agent-jsjpz -n kube-system

Name:             elastic-agent-jsjpz
Namespace:        kube-system
Priority:         0
Service Account:  elastic-agent
Node:             gke-sa-da-gke-cluster-pool-1-99abf297-ltuh/10.132.0.29
Start Time:       Thu, 13 Jul 2023 18:51:29 +0000
Labels:           app=elastic-agent
                  controller-revision-hash=56659fcf6f
                  pod-template-generation=1
Annotations:      <none>
Status:           Running
IP:               10.132.0.29
IPs:
  IP:  10.132.0.29
Controlled By:  DaemonSet/elastic-agent
Containers:
  elastic-agent:
    Container ID:   containerd://19bec7ae976ed427c8fdb9ea444bc60690378ce4b615ce14c23d80f517daa012
    Image:          docker.elastic.co/beats/elastic-agent:8.8.2
    Image ID:       docker.elastic.co/beats/elastic-agent@sha256:592a61e7b97141cb948e64c29cf83deba548b9fffea5e2e7883984b4320a27b5
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Thu, 13 Jul 2023 18:52:06 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  1500Mi
    Requests:
      cpu:     700m
      memory:  700Mi
    Environment:
      FLEET_ENROLL:            1
      FLEET_INSECURE:          true
      FLEET_URL:               https://b9cc062431d64ac0a02edf3ea543dd0a.fleet.europe-west1.gcp.cloud.es.io:443
      FLEET_ENROLLMENT_TOKEN:
      KIBANA_HOST:             http://kibana:5601
      KIBANA_FLEET_USERNAME:   elastic
      KIBANA_FLEET_PASSWORD:   changeme
      NODE_NAME:               (v1:spec.nodeName)
      POD_NAME:                elastic-agent-jsjpz (v1:metadata.name)
    Mounts:
      /etc/machine-id from etc-mid (ro)
      /hostfs/etc from etc-full (ro)
      /hostfs/proc from proc (ro)
      /hostfs/sys/fs/cgroup from cgroup (ro)
      /hostfs/var/lib from var-lib (ro)
      /sys/kernel/debug from sys-kernel-debug (rw)
      /var/lib/docker/containers from varlibdockercontainers (ro)
      /var/log from varlog (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-z8j6r (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  proc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:
  cgroup:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/cgroup
    HostPathType:
  varlibdockercontainers:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/docker/containers
    HostPathType:
  varlog:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log
    HostPathType:
  etc-full:
    Type:          HostPath (bare host directory volume)
    Path:          /etc
    HostPathType:
  var-lib:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib
    HostPathType:
  etc-mid:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/machine-id
    HostPathType:  File
  sys-kernel-debug:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/kernel/debug
    HostPathType:
  kube-api-access-z8j6r:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:                      <none>

Error logs found in the leader elastic-agent:

logs.txt
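
For anyone else debugging this, the agent's own status output can confirm whether the metrics component itself is degraded, independently of the pod being Ready. A sketch, using the pod name from the output above (adjust to your pod and namespace):

# overall agent health and per-component state, from inside the pod
kubectl exec -n kube-system elastic-agent-jsjpz -- elastic-agent status

# recent log lines mentioning the kubernetes metrics component
kubectl logs -n kube-system elastic-agent-jsjpz --since=1h | grep -i 'kubernetes/metrics'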

elasticmachine commented 1 year ago

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

elasticmachine commented 1 year ago

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

Danouchka commented 1 year ago

cc @pierrehilbert @michalpristas @MichaelKatsoulis

lduvnjak commented 8 months ago

Just happened to us as well, after it had been collecting logs for two days without issues; we're currently on version 8.12.0. We can't find any errors in the elastic_agent logs related to the integration. The only thing I could find after trying to visualize the data in Discover is this:

(screenshot)
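
One way to pinpoint when a given data stream stopped ingesting is to ask it for its most recent document (Dev Tools; kubernetes.state_pod is just one example stream from the policy below):

GET metrics-kubernetes.state_pod-default/_search
{
  "size": 1,
  "sort": [{ "@timestamp": "desc" }],
  "_source": ["@timestamp"]
}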

We run all of our data through Logstash, and there are a lot of 409 "document already exists" errors:

[2024-02-16T13:16:49,184][WARN ][logstash.outputs.elasticsearch][agent-output][51713b08b805618714c41083aaaead687372b9e1e35b7aef707c1fe6b8b76bdf] Failed action {:status=>409, :action=>["create", {:_id=>nil, :_index=>"metrics-elastic_agent.metricbeat-default", :routing=>nil}, {"metricset"=>{"period"=>60000, "name"=>"stats"}, "beat"=>{"type"=>"metricbeat", "stats"=>{"memstats"=>{"memory"=>{"alloc"=>58738408, "total"=>12020136576}, "rss"=>175525888, "gc_next"=>63175968}, "cgroup"=>{"memory"=>{"id"=>"/", "mem"=>{"usage"=>{"bytes"=>0}}}, "cpu"=>{"stats"=>{"periods"=>0, "throttled"=>{"ns"=>0, "periods"=>0}}, "id"=>"/"}}, "beat"=>{"type"=>"metricbeat", "name"=>"...", "host"=>"...", "version"=>"8.12.0", "uuid"=>"b0144616-4629-4c2f-bffb-dd01592ada54"}, "handles"=>{"limit"=>{"hard"=>65535, "soft"=>65535}, "open"=>25}, "libbeat"=>{"pipeline"=>{"events"=>{"active"=>90, "filtered"=>0, "published"=>151339, "retry"=>94, "failed"=>0, "dropped"=>0, "total"=>151339}, "clients"=>8, "queue"=>{"max_events"=>3200, "acked"=>151249}}, "output"=>{"type"=>"logstash", "write"=>{"bytes"=>40875563, "errors"=>0}, "read"=>{"bytes"=>95813, "errors"=>0}, "events"=>{"toomany"=>0, "duplicates"=>0, "active"=>0, "acked"=>151249, "batches"=>1499, "failed"=>0, "dropped"=>0, "total"=>151249}}, "config"=>{"stops"=>0, "reloads"=>0, "starts"=>8, "running"=>8}}, "uptime"=>{"ms"=>16028098}, "cpu"=>{"user"=>{"ticks"=>68410, "time"=>{"ms"=>68410}}, "system"=>{"ticks"=>10160, "time"=>{"ms"=>10160}}, "total"=>{"value"=>78570, "ticks"=>78570, "time"=>{"ms"=>78570}}}, "info"=>{"name"=>"metricbeat", "ephemeral_id"=>"e8fe6d37-4e5a-46c8-9283-be00e172a256", "version"=>"8.12.0", "uptime"=>{"ms"=>16028098.0}}, "system"=>{"load"=>{"5"=>0.27, "1"=>0.08, "15"=>0.33, "norm"=>{"5"=>0.0675, "1"=>0.02, "15"=>0.0825}}, "cpu"=>{"cores"=>4}}, "runtime"=>{"goroutines"=>206}}, "id"=>"b0144616-4629-4c2f-bffb-dd01592ada54"}, "component"=>{"binary"=>"metricbeat", "id"=>"kubernetes/metrics-default"}, "data_stream"=>{"namespace"=>"default", "type"=>"metrics", "dataset"=>"elastic_agent.metricbeat"}, "agent"=>{"type"=>"metricbeat", "name"=>"...", "ephemeral_id"=>"8ead65ca-daab-4521-bc69-20b4c8d939b4", "version"=>"8.12.0", "id"=>"082a9af8-aeeb-431c-8b1b-96b94c7f8859"}, "@timestamp"=>2024-02-16T13:16:37.764Z, "hour_of_day"=>13, "event"=>{"duration"=>9927177, "dataset"=>"elastic_agent.metricbeat", "module"=>"beat"}, "host"=>{"containerized"=>false, "mac"=>["00-50-56-BD-7A-F8", "5E-4E-FF-C5-CE-23", "5E-5B-2E-E0-C7-12", "66-37-88-36-F7-CF", "EE-EE-EE-EE-EE-EE"], "os"=>{"kernel"=>"5.14.0-362.13.1.el9_3.x86_64", "version"=>"20.04.6 LTS (Focal Fossa)", "codename"=>"focal", "family"=>"debian", "name"=>"Ubuntu", "type"=>"linux", "platform"=>"ubuntu"}, "hostname"=>"...", "ip"=>["...", "...", "10.233.0.1", "10.233.0.3", "10.233.14.6", "10.233.18.165", "10.233.57.38", "10.233.7.125", "10.233.26.50", "10.233.17.45", "10.233.34.231", "10.233.53.96", "10.233.43.155", "10.233.17.115", "10.233.59.213", "10.233.4.227", "10.233.10.83", "10.233.68.128", "fe80::6437:88ff:fe36:f7cf", "fe80::ecee:eeff:feee:eeee", "fe80::ecee:eeff:feee:eeee", "169.254.25.10", "fe80::ecee:eeff:feee:eeee", "fe80::ecee:eeff:feee:eeee"], "architecture"=>"x86_64", "name"=>"...", "id"=>"79937c83f15e4cb4a8e8f6463abd8e10"}, "elastic_agent"=>{"process"=>"metricbeat", "version"=>"8.12.0", "snapshot"=>false, "id"=>"082a9af8-aeeb-431c-8b1b-96b94c7f8859"}, "ecs"=>{"version"=>"8.0.0"}, "service"=>{"type"=>"beat", "name"=>"beat", "address"=>"http://unix/stats"}, "type"=>"elastic-agent", "@version"=>"1", 
"day_of_week"=>"Fri"}], :response=>{"create"=>{"status"=>409, "error"=>{"type"=>"version_conflict_engine_exception", "reason"=>"[6yrf-6dBtJHffQNeAAABjbIP0gQ][{agent.id=082a9af8-aeeb-431c-8b1b-96b94c7f8859, component.id=kubernetes/metrics-default, metricset.name=stats}@2024-02-16T13:16:37.764Z]: version conflict, document already exists (current version [1])", "index_uuid"=>"Ucyt3jzdRYq5nFX369-TPg", "shard"=>"0", "index"=>".ds-metrics-elastic_agent.metricbeat-default-2024.02.14-000001"}}}}

Does that mean the Kubernetes integration is not supported over Logstash? I know Synthetics isn't, for example, so I'd expect a limitation like that to be documented. Then again, the original report above hits the same issue without Logstash, so maybe it's not Logstash-related.
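
For what it's worth, the _id in the error above ({agent.id=..., component.id=..., metricset.name=...}@timestamp) looks like the dimensions-plus-timestamp id used by time-series data streams, which reject a second copy of the same document with exactly this version_conflict (e.g. when Logstash retries a partially failed bulk request). If so, the 409s would be duplicates being dropped rather than data being lost. The index mode can be checked against the backing index named in the error:

GET .ds-metrics-elastic_agent.metricbeat-default-2024.02.14-000001/_settings?filter_path=*.settings.index.mode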

Here's an example visualization that stopped working randomly:

image

Here's the integration:

PUT kbn:/api/fleet/package_policies/8204a986-5128-4225-a5b6-f021905b7fd2
{
  "package": {
    "name": "kubernetes",
    "version": "1.56.0"
  },
  "name": "kubernetes-1",
  "namespace": "default",
  "description": "",
  "policy_id": "eck-agent",
  "vars": {},
  "inputs": {
    "kubelet-kubernetes/metrics": {
      "enabled": true,
      "streams": {
        "kubernetes.container": {
          "enabled": true,
          "vars": {
            "add_metadata": true,
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "hosts": [
              "https://${env.NODE_NAME}:10250"
            ],
            "period": "10s",
            "ssl.verification_mode": "none",
            "add_resource_metadata_config": "# add_resource_metadata:\n#   namespace:\n#     include_labels: [\"namespacelabel1\"]\n#   node:\n#     include_labels: [\"nodelabel2\"]\n#     include_annotations: [\"nodeannotation1\"]\n#   deployment: false\n",
            "ssl.certificate_authorities": []
          }
        },
        "kubernetes.node": {
          "enabled": true,
          "vars": {
            "add_metadata": true,
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "hosts": [
              "https://${env.NODE_NAME}:10250"
            ],
            "period": "10s",
            "ssl.verification_mode": "none",
            "ssl.certificate_authorities": []
          }
        },
        "kubernetes.pod": {
          "enabled": true,
          "vars": {
            "add_metadata": true,
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "hosts": [
              "https://${env.NODE_NAME}:10250"
            ],
            "period": "10s",
            "ssl.verification_mode": "none",
            "ssl.certificate_authorities": [],
            "add_resource_metadata_config": "# add_resource_metadata:\n#   namespace:\n#     include_labels: [\"namespacelabel1\"]\n#   node:\n#     include_labels: [\"nodelabel2\"]\n#     include_annotations: [\"nodeannotation1\"]\n#   deployment: false\n"
          }
        },
        "kubernetes.system": {
          "enabled": true,
          "vars": {
            "add_metadata": true,
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "hosts": [
              "https://${env.NODE_NAME}:10250"
            ],
            "period": "10s",
            "ssl.verification_mode": "none",
            "ssl.certificate_authorities": []
          }
        },
        "kubernetes.volume": {
          "enabled": true,
          "vars": {
            "add_metadata": true,
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "hosts": [
              "https://${env.NODE_NAME}:10250"
            ],
            "period": "10s",
            "ssl.verification_mode": "none",
            "ssl.certificate_authorities": []
          }
        }
      }
    },
    "kube-state-metrics-kubernetes/metrics": {
      "enabled": true,
      "streams": {
        "kubernetes.state_container": {
          "enabled": true,
          "vars": {
            "add_metadata": true,
            "hosts": [
              "kube-state-metrics.kube-system.svc:8080"
            ],
            "leaderelection": true,
            "period": "10s",
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "ssl.certificate_authorities": [],
            "add_resource_metadata_config": "# add_resource_metadata:\n#   namespace:\n#     include_labels: [\"namespacelabel1\"]\n#   node:\n#     include_labels: [\"nodelabel2\"]\n#     include_annotations: [\"nodeannotation1\"]\n#   deployment: false\n"
          }
        },
        "kubernetes.state_cronjob": {
          "enabled": true,
          "vars": {
            "add_metadata": true,
            "hosts": [
              "kube-state-metrics.kube-system.svc:8080"
            ],
            "leaderelection": true,
            "period": "10s",
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "ssl.certificate_authorities": []
          }
        },
        "kubernetes.state_daemonset": {
          "enabled": true,
          "vars": {
            "add_metadata": true,
            "hosts": [
              "kube-state-metrics.kube-system.svc:8080"
            ],
            "leaderelection": true,
            "period": "10s",
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "ssl.certificate_authorities": []
          }
        },
        "kubernetes.state_deployment": {
          "enabled": true,
          "vars": {
            "add_metadata": true,
            "hosts": [
              "kube-state-metrics.kube-system.svc:8080"
            ],
            "leaderelection": true,
            "period": "10s",
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "ssl.certificate_authorities": []
          }
        },
        "kubernetes.state_job": {
          "enabled": true,
          "vars": {
            "add_metadata": true,
            "hosts": [
              "kube-state-metrics.kube-system.svc:8080"
            ],
            "leaderelection": true,
            "period": "10s",
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "ssl.certificate_authorities": []
          }
        },
        "kubernetes.state_namespace": {
          "enabled": true,
          "vars": {
            "add_metadata": true,
            "hosts": [
              "kube-state-metrics.kube-system.svc:8080"
            ],
            "leaderelection": true,
            "period": "10s",
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "ssl.certificate_authorities": []
          }
        },
        "kubernetes.state_node": {
          "enabled": true,
          "vars": {
            "add_metadata": true,
            "hosts": [
              "kube-state-metrics.kube-system.svc:8080"
            ],
            "leaderelection": true,
            "period": "10s",
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "ssl.certificate_authorities": []
          }
        },
        "kubernetes.state_persistentvolume": {
          "enabled": true,
          "vars": {
            "add_metadata": true,
            "hosts": [
              "kube-state-metrics.kube-system.svc:8080"
            ],
            "leaderelection": true,
            "period": "10s",
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "ssl.certificate_authorities": []
          }
        },
        "kubernetes.state_persistentvolumeclaim": {
          "enabled": true,
          "vars": {
            "add_metadata": true,
            "hosts": [
              "kube-state-metrics.kube-system.svc:8080"
            ],
            "leaderelection": true,
            "period": "10s",
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "ssl.certificate_authorities": []
          }
        },
        "kubernetes.state_pod": {
          "enabled": true,
          "vars": {
            "add_metadata": true,
            "hosts": [
              "kube-state-metrics.kube-system.svc:8080"
            ],
            "leaderelection": true,
            "period": "10s",
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "ssl.certificate_authorities": [],
            "add_resource_metadata_config": "# add_resource_metadata:\n#   namespace:\n#     include_labels: [\"namespacelabel1\"]\n#   node:\n#     include_labels: [\"nodelabel2\"]\n#     include_annotations: [\"nodeannotation1\"]\n#   deployment: false\n"
          }
        },
        "kubernetes.state_replicaset": {
          "enabled": true,
          "vars": {
            "add_metadata": true,
            "hosts": [
              "kube-state-metrics.kube-system.svc:8080"
            ],
            "leaderelection": true,
            "period": "10s",
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "ssl.certificate_authorities": []
          }
        },
        "kubernetes.state_resourcequota": {
          "enabled": true,
          "vars": {
            "add_metadata": true,
            "hosts": [
              "kube-state-metrics.kube-system.svc:8080"
            ],
            "leaderelection": true,
            "period": "10s",
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "ssl.certificate_authorities": []
          }
        },
        "kubernetes.state_service": {
          "enabled": true,
          "vars": {
            "add_metadata": true,
            "hosts": [
              "kube-state-metrics.kube-system.svc:8080"
            ],
            "leaderelection": true,
            "period": "10s",
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "ssl.certificate_authorities": []
          }
        },
        "kubernetes.state_statefulset": {
          "enabled": true,
          "vars": {
            "add_metadata": true,
            "hosts": [
              "kube-state-metrics.kube-system.svc:8080"
            ],
            "leaderelection": true,
            "period": "10s",
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "ssl.certificate_authorities": []
          }
        },
        "kubernetes.state_storageclass": {
          "enabled": true,
          "vars": {
            "add_metadata": true,
            "hosts": [
              "kube-state-metrics.kube-system.svc:8080"
            ],
            "leaderelection": true,
            "period": "10s",
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "ssl.certificate_authorities": []
          }
        }
      }
    },
    "kube-apiserver-kubernetes/metrics": {
      "enabled": true,
      "streams": {
        "kubernetes.apiserver": {
          "enabled": true,
          "vars": {
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "hosts": [
              "https://${env.KUBERNETES_SERVICE_HOST}:${env.KUBERNETES_SERVICE_PORT}"
            ],
            "leaderelection": true,
            "period": "30s",
            "ssl.certificate_authorities": [
              "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
            ]
          }
        }
      }
    },
    "kube-proxy-kubernetes/metrics": {
      "enabled": true,
      "streams": {
        "kubernetes.proxy": {
          "enabled": true,
          "vars": {
            "hosts": [
              "${env.NODE_NAME}:10249"
            ],
            "period": "10s"
          }
        }
      }
    },
    "kube-scheduler-kubernetes/metrics": {
      "enabled": true,
      "streams": {
        "kubernetes.scheduler": {
          "enabled": true,
          "vars": {
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "hosts": [
              "https://${env.NODE_NAME}:10259"
            ],
            "period": "10s",
            "ssl.verification_mode": "none",
            "scheduler_label_key": "component",
            "scheduler_label_value": "kube-scheduler"
          }
        }
      }
    },
    "kube-controller-manager-kubernetes/metrics": {
      "enabled": true,
      "streams": {
        "kubernetes.controllermanager": {
          "enabled": true,
          "vars": {
            "bearer_token_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
            "hosts": [
              "https://${env.NODE_NAME}:10257"
            ],
            "period": "10s",
            "ssl.verification_mode": "none",
            "controller_manager_label_key": "component",
            "controller_manager_label_value": "kube-controller-manager"
          }
        }
      }
    },
    "events-kubernetes/metrics": {
      "enabled": true,
      "streams": {
        "kubernetes.event": {
          "enabled": true,
          "vars": {
            "period": "10s",
            "add_metadata": true,
            "skip_older": true,
            "leaderelection": true
          }
        }
      }
    },
    "container-logs-filestream": {
      "enabled": true,
      "streams": {
        "kubernetes.container_logs": {
          "enabled": true,
          "vars": {
            "paths": [
              "/var/log/containers/*${kubernetes.container.id}.log"
            ],
            "symlinks": true,
            "data_stream.dataset": "kubernetes.container_logs",
            "containerParserStream": "all",
            "containerParserFormat": "auto",
            "condition": "${kubernetes.labels.app.kubernetes.io/name} != 'eck' and ${kubernetes.labels.app.kubernetes.io/name} != 'ingress-nginx' ",
            "additionalParsersConfig": "# - ndjson:\n#     target: json\n#     ignore_decoding_error: true\n# - multiline:\n#     type: pattern\n#     pattern: '^\\['\n#     negate: true\n#     match: after\n",
            "custom": ""
          }
        }
      }
    },
    "audit-logs-filestream": {
      "enabled": true,
      "streams": {
        "kubernetes.audit_logs": {
          "enabled": true,
          "vars": {
            "paths": [
              "/var/log/kubernetes/kube-apiserver-audit.log"
            ]
          }
        }
      }
    }
  }
}
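
(The current server-side state of this policy can also be read back from the Fleet API with the corresponding GET, same Dev Tools syntax and the same policy ID as in the PUT above:)

GET kbn:/api/fleet/package_policies/8204a986-5128-4225-a5b6-f021905b7fd2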

Here's our elastic-agent.yaml:

---
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: fleet-server
  namespace: default
spec:
  version: 8.12.0
  kibanaRef:
    name: kibana
  elasticsearchRefs:
  - name: elastic
  mode: fleet
  fleetServerEnabled: true
  policyID: eck-fleet-server
  deployment:
    replicas: 1
    podTemplate:
      metadata:
        labels:
          app.kubernetes.io/name: "eck"
      spec:
        serviceAccountName: elastic-agent
        automountServiceAccountToken: true
        securityContext:
          runAsUser: 0
        volumes:
        - name: agent-data
          emptyDir: {}
---
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent
  namespace: default
spec:
  version: 8.12.0
  kibanaRef:
    name: kibana
  fleetServerRef:
    name: fleet-server
  mode: fleet
  policyID: eck-agent
  daemonSet:
    podTemplate:
      metadata:
        labels:
          app.kubernetes.io/name: "eck"
      spec:
        tolerations:
        - key: "node-role.kubernetes.io/control-plane"
          operator: "Exists"
          effect: "NoSchedule"
        serviceAccountName: elastic-agent
        hostNetwork: true
        hostPID: true
        dnsPolicy: ClusterFirstWithHostNet
        automountServiceAccountToken: true
        containers:
        - name: agent
          env:
            - name: FLEET_INSECURE
              value: "false"
            - name: FLEET_URL
              value: "https://fleet....:443"
            - name: FLEET_CA
              value: "/ssl/ca.crt"
          volumeMounts:
            - name: proc
              mountPath: /hostfs/proc
              readOnly: true
            - name: cgroup
              mountPath: /hostfs/sys/fs/cgroup
              readOnly: true
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: etc-full
              mountPath: /hostfs/etc
              readOnly: true
            - name: var-lib
              mountPath: /hostfs/var/lib
              readOnly: true
            - name: etc-mid
              mountPath: /etc/machine-id
              readOnly: true
            - name: sys-kernel-debug
              mountPath: /sys/kernel/debug
            - name: fleet-server-agent-http-ca-external
              mountPath: "/ssl"
        securityContext:
          runAsUser: 0
        volumes:
          - name: proc
            hostPath:
              path: /proc
          - name: cgroup
            hostPath:
              path: /sys/fs/cgroup
          - name: varlibdockercontainers
            hostPath:
              path: /var/lib/docker/containers
          - name: varlog
            hostPath:
              path: /var/log
          - name: etc-full
            hostPath:
              path: /etc
          - name: var-lib
            hostPath:
              path: /var/lib
          - name: etc-mid
            hostPath:
              path: /etc/machine-id
              type: File
          - name: sys-kernel-debug
            hostPath:
              path: /sys/kernel/debug
          - name: fleet-server-agent-http-ca-external
            secret:
              secretName: fleet-server-agent-http-ca-external
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: elastic-agent
subjects:
  - kind: ServiceAccount
    name: elastic-agent
    namespace: default
roleRef:
  kind: ClusterRole
  name: elastic-agent
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: default
  name: elastic-agent
subjects:
  - kind: ServiceAccount
    name: elastic-agent
    namespace: default
roleRef:
  kind: Role
  name: elastic-agent
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: elastic-agent-kubeadm-config
  namespace: default
subjects:
  - kind: ServiceAccount
    name: elastic-agent
    namespace: default
roleRef:
  kind: Role
  name: elastic-agent-kubeadm-config
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: elastic-agent
  labels:
    app: eck
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - namespaces
      - events
      - pods
      - services
      - configmaps
      # Needed for cloudbeat
      - serviceaccounts
      - persistentvolumes
      - persistentvolumeclaims
    verbs: ["get", "list", "watch"]
  # Enable this rule only if planning to use the kubernetes_secrets provider
  #- apiGroups: [""]
  #  resources:
  #  - secrets
  #  verbs: ["get"]
  - apiGroups: ["extensions"]
    resources:
      - replicasets
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources:
      - statefulsets
      - deployments
      - replicasets
      - daemonsets
    verbs: ["get", "list", "watch"]
  - apiGroups:
      - ""
    resources:
      - nodes/stats
    verbs:
      - get
  - apiGroups: [ "batch" ]
    resources:
      - jobs
      - cronjobs
    verbs: [ "get", "list", "watch" ]
  # Needed for apiserver
  - nonResourceURLs:
      - "/metrics"
    verbs:
      - get
  # Needed for cloudbeat
  - apiGroups: ["rbac.authorization.k8s.io"]
    resources:
      - clusterrolebindings
      - clusterroles
      - rolebindings
      - roles
    verbs: ["get", "list", "watch"]
  # Needed for cloudbeat
  - apiGroups: ["policy"]
    resources:
      - podsecuritypolicies
    verbs: ["get", "list", "watch"]
  - apiGroups: [ "storage.k8s.io" ]
    resources:
      - storageclasses
    verbs: [ "get", "list", "watch" ]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: elastic-agent
  # Should be the namespace where elastic-agent is running
  namespace: default
  labels:
    app: eck
rules:
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs: ["get", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: elastic-agent-kubeadm-config
  namespace: default
  labels:
    app: eck
rules:
  - apiGroups: [""]
    resources:
      - configmaps
    resourceNames:
      - kubeadm-config
    verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-agent
  namespace: default
  labels:
    app: eck
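
As a sanity check that the RBAC above grants what the kube-state streams and leader election need, kubectl auth can-i can impersonate the service account (illustrative; names and namespace match the manifests above):

kubectl auth can-i list pods --as=system:serviceaccount:default:elastic-agent
kubectl auth can-i watch replicasets.apps --as=system:serviceaccount:default:elastic-agent
kubectl auth can-i create leases.coordination.k8s.io -n default --as=system:serviceaccount:default:elastic-agent
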
lduvnjak commented 8 months ago

The fix was to manually delete the lease, because it had gotten stuck. (The kube-state streams above all run with leaderelection: true, so they are only collected by whichever agent currently holds this lease; if the lease is stuck, those metrics stop cluster-wide.)

# kubectl get lease
NAME                           HOLDER                                                      AGE
elastic-agent-cluster-leader   elastic-agent-leader-ca07e9f2-c874-437e-afd2-861aee7fec9d   40m
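
For reference, a sketch of the inspection and cleanup (assuming the lease lives in the namespace the agents run in, default per the manifests above):

# a stale renewTime under spec indicates the holder stopped renewing
kubectl get lease elastic-agent-cluster-leader -n default -o yaml

# delete the stuck lease; the agents re-run leader election and recreate it
kubectl delete lease elastic-agent-cluster-leader -n default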

Not sure what exactly made it hang, but at least it's a quick fix. Be careful: the lease doesn't get removed even if you completely delete the agents, including via the CRD with kubectl delete agent elastic-agent.