gardener / etcd-backup-restore

Collection of components to backup and restore the etcd of a Kubernetes cluster.
Apache License 2.0

[Enhancement] Add capability for operators to monitor etcd data #597

Open unmarshall opened 1 year ago

unmarshall commented 1 year ago

Enhancement (What you would like to be added): We need insight into the data that etcd stores in its database (bbolt DB). For example, it is valuable to know which resource types account for the most keys and the largest total size. @istvanballok recently ran the following commands to extract that data from etcd:

apk add jq util-linux
etcdctl --insecure-skip-tls-verify \
  --cert /var/etcd/ssl/client/server/tls.crt \
  --key /var/etcd/ssl/client/server/tls.key \
  --cacert /var/etcd/ssl/client/ca/bundle.crt \
  get --prefix / -w json |
  jq -r '.kvs[] | {key: .key | @base64d, valueLength: .value | length} | "\(.key | sub("/[^/]+/((?<type>[^/.]+)/.*|[^/]+/(?<customtype>[^/]+)/.*)";"\(.type // .customtype)")) \(.valueLength)"' |
  awk '{sum[$1]+=$2; count[$1]++} END{for (key in sum) {printf "%s %s %s\n", sum[key], count[key], key}}' |
  sort -rn |
  column -t

Example output (columns: total length of the base64-encoded values in bytes, number of keys, resource type):

34156612  291   shootstates
17002464  7271  meteringreports
9932816   2592  secrets
5786756   476   shoots
3438780   38    cloudprofiles

It would be beneficial for operators and developers to have easy access to this data, either on demand or as custom metrics exposed to Prometheus.

NOTE: The above is just one example. Over time, we should identify additional information and custom metrics that etcd does not provide out of the box.

Motivation (Why is this needed?): Use cases:

Approach/Hint to implement the solution (optional):

unmarshall commented 1 year ago

Additional requirements, beyond the metrics mentioned above, came up in a discussion with @istvanballok.

shreyas-s-rao commented 1 year ago

/assign @abdasgupta