Add state_lease data_stream for kubernetes integration

Background context

One basic SLI/SLO for Kubernetes clusters should cover leadership switches.

Kubernetes native components, including cluster-autoscaler, kube-controller-manager, and kube-scheduler, use the lease-based leader election from client-go.

As Kubernetes operators we would like to monitor:

(SLI_a) Leaderless: the time during which there is no leader (the SLI still needs to be defined; a violation would be a critical error)
(SLI_b) Time of leadership switch (the SLI still needs to be defined; a violation would be a warning)

Running without a leader for any period of time is critical for a production Kubernetes cluster. Hence we need to define proper SLIs/SLOs based on these observations.

This information can be retrieved from the kube-state-metrics service and looks like the following:

# HELP kube_lease_owner Information about the Lease's owner.
# TYPE kube_lease_owner gauge
kube_lease_owner{lease="kind-control-plane",owner_kind="Node",owner_name="kind-control-plane"} 1
# HELP kube_lease_renew_time Kube lease renew time.
# TYPE kube_lease_renew_time gauge
kube_lease_renew_time{lease="kind-control-plane"} 1.676268601e+09
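
As a rough illustration of what a collector has to extract from this output, the following sketch parses the two lease series with the Python standard library only. The function name `parse_lease_metrics` and the regular expressions are assumptions made for this example, not part of kube-state-metrics or of the integration; a real data_stream would rely on the existing Prometheus parsing in the Kubernetes module.

```python
import re

# Matches sample lines such as: metric_name{label="value",...} 1.5e+09
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
# Matches a single label pair such as: lease="kind-control-plane"
# (escape sequences inside values are kept as-is, not unescaped)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_lease_metrics(text):
    """Return a list of (name, labels, value) tuples for kube_lease_* samples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None or not match.group("name").startswith("kube_lease_"):
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


EXAMPLE = '''\
# HELP kube_lease_owner Information about the Lease's owner.
# TYPE kube_lease_owner gauge
kube_lease_owner{lease="kind-control-plane",owner_kind="Node",owner_name="kind-control-plane"} 1
# HELP kube_lease_renew_time Kube lease renew time.
# TYPE kube_lease_renew_time gauge
kube_lease_renew_time{lease="kind-control-plane"} 1.676268601e+09
'''

samples = parse_lease_metrics(EXAMPLE)
for name, labels, value in samples:
    print(name, labels["lease"], value)
```

Each tuple carries the lease name in its labels, so the owner and renew-time samples for the same lease can be joined on `labels["lease"]`.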

SLO_a: kube_lease_owner should not be equal to zero for more than 30 seconds. That should indicate a CRITICAL error.
SLO_b: avg(kube_lease_renew_time) should not be greater than 0.5s over the last 10 minutes. That should indicate a WARNING.
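
To make the SLO_a check concrete, here is a minimal sketch of how an alert rule could evaluate it over a series of scrapes. All names (`OwnerSample`, `longest_leaderless_gap`, `slo_a_violated`, `lease_age`) are hypothetical. The sketch assumes that the caller maps a missing kube_lease_owner series to the value 0, and that kube_lease_renew_time is a Unix timestamp in seconds, as the sample value above suggests. SLO_b is not covered here.

```python
from dataclasses import dataclass

LEADERLESS_LIMIT_S = 30  # threshold taken from SLO_a


@dataclass
class OwnerSample:
    ts: float     # scrape time, Unix seconds
    value: float  # kube_lease_owner value; 0 is treated as "no leader"


def longest_leaderless_gap(samples):
    """Longest continuous span, in seconds, in which every scrape saw no owner."""
    longest = 0.0
    gap_start = None
    for sample in sorted(samples, key=lambda s: s.ts):
        if sample.value == 0:
            if gap_start is None:
                gap_start = sample.ts  # first leaderless scrape of this gap
            longest = max(longest, sample.ts - gap_start)
        else:
            gap_start = None  # a leader is present again, so the gap ends
    return longest


def slo_a_violated(samples, limit_s=LEADERLESS_LIMIT_S):
    """True if the lease had no owner for more than limit_s seconds."""
    return longest_leaderless_gap(samples) > limit_s


def lease_age(renew_time, now):
    """Seconds elapsed since the lease was last renewed."""
    return now - renew_time


# Scrapes every 15 seconds; the lease has no owner from t=15 to t=60.
history = [
    OwnerSample(0, 1), OwnerSample(15, 0), OwnerSample(30, 0),
    OwnerSample(45, 0), OwnerSample(60, 0), OwnerSample(75, 1),
]
print(slo_a_violated(history))
print(lease_age(1676268601.0, 1676268611.0))
```

The gap is measured between the first and last leaderless scrape, so the result is only as precise as the scrape interval; an actual Watcher would express the same condition as a query over the collected documents.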

At the moment we don't have a metricset/data_stream that collects this information from kube-state-metrics. Hence the goal of this issue is the following:

TODOs

Create the metricset/data_stream that collects the lease information from kube-state-metrics
Provide some basic Watchers/Alerts, similar to https://github.com/elastic/integrations/issues/4997

FYI @gizas @rameshelastic @mlunadia

Add state_lease data_stream for kubernetes integration #5363