We are working on the synthetics project and using the Elastic Agent to gather all the metrics we need from the service and the cluster. We are on a GKE-managed cluster, and we are using this configuration for the Elastic Agent:
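Rather than paste the whole manifest, here is a minimal sketch of the parts relevant to leader election and cluster metrics, in the style of the reference standalone manifest (the lease name is the agent default that also shows up in the errors below; the dataset, host, and the rest are placeholders, not our exact values):

```yaml
# Sketch of the leader-election and cluster-metrics pieces of the standalone
# agent config (reference-manifest style); node-level inputs and the output
# section are omitted, and the kube-state-metrics host is a placeholder.
providers.kubernetes_leaderelection:
  enabled: true
  # Lease used for the election; this is the name that appears in the errors below.
  leader_lease: elastic-agent-cluster-leader

inputs:
  - id: kubernetes/metrics-kube-state-metrics
    type: kubernetes/metrics
    use_output: default
    data_stream:
      namespace: default
    streams:
      - data_stream:
          dataset: kubernetes.state_pod
          type: metrics
        metricsets:
          - state_pod
        hosts:
          - "kube-state-metrics:8080"
        period: 30s
        # Only the current leader collects cluster-scope metrics.
        condition: ${kubernetes_leaderelection.leader} == true
```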
The Elastic Agent works fine, reporting cluster metrics every 30s, until we hit an error communicating with the Kubernetes API server:
Apr 24, 2022 @ 06:11:43.159 | E0424 06:11:43.159693 8 leaderelection.go:325] error retrieving resource lock kube-system/elastic-agent-cluster-leader: Get "https://10.253.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/elastic-agent-cluster-leader": context deadline exceeded
Apr 24, 2022 @ 06:11:43.159 | I0424 06:11:43.159726 8 leaderelection.go:278] failed to renew lease kube-system/elastic-agent-cluster-leader: timed out waiting for the condition
Apr 24, 2022 @ 06:11:43.159 | E0424 06:11:43.159762 8 leaderelection.go:301] Failed to release lock: resource name may not be empty
After that, the lease stays expired until I restart the former leader pod.
The API server problem hits the cert-manager deployment we run in the same cluster at the same time, but cert-manager recovers its leader lease automatically. That is the behavior we expected from the Elastic Agent as well.
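For anyone debugging the same thing: the lock is just a coordination.k8s.io/v1 Lease object, and when the agent gets stuck, spec.renewTime simply stops advancing while spec.holderIdentity still records the pod that last won the election. An illustrative sketch of that object (holder name and timestamps are made up, not copied from our cluster):

```yaml
# Illustrative shape of the lease (kubectl get lease -n kube-system
# elastic-agent-cluster-leader -o yaml); holder and timestamps are placeholders.
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: elastic-agent-cluster-leader
  namespace: kube-system
spec:
  holderIdentity: elastic-agent-xxxxx          # pod that last won the election (placeholder)
  leaseDurationSeconds: 15                     # placeholder duration
  acquireTime: "2022-04-24T05:40:00.000000Z"   # placeholder
  renewTime: "2022-04-24T06:11:28.000000Z"     # stops advancing once the agent is stuck
```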
As a workaround, we are trying to upgrade the agent and set up a separate deployment for the cluster metrics, as mentioned here.
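Concretely, the idea is to keep the DaemonSet for node-level data and move the cluster-scope (state_*) datasets into a single-replica Deployment, so collecting cluster metrics no longer depends on winning leader election. A rough sketch of what we are trying (image tag, names, and the ConfigMap are placeholders, not a tested manifest):

```yaml
# Sketch of the workaround: one dedicated agent pod for cluster-scope metrics.
# Everything below is illustrative; names, image tag, and the ConfigMap are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: elastic-agent-clusterwide
  namespace: kube-system
spec:
  replicas: 1                                  # single collector, no leader election needed
  selector:
    matchLabels:
      app: elastic-agent-clusterwide
  template:
    metadata:
      labels:
        app: elastic-agent-clusterwide
    spec:
      serviceAccountName: elastic-agent        # reuse the agent's existing RBAC
      containers:
        - name: elastic-agent
          image: docker.elastic.co/beats/elastic-agent:8.1.3   # placeholder version
          args: ["-c", "/etc/elastic-agent/agent.yml", "-e"]
          volumeMounts:
            - name: config
              mountPath: /etc/elastic-agent/agent.yml
              subPath: agent.yml
      volumes:
        - name: config
          configMap:
            # ConfigMap carrying only the state_* datasets, with no
            # leader-election condition (placeholder name).
            name: elastic-agent-clusterwide-config
```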