Kong / kong-plugin-prometheus

Prometheus plugin for Kong - this plugin has been moved into https://github.com/Kong/kong, please open issues and PRs in that repo
Apache License 2.0

feat(handler) expose dataplane status on control plane #98

Closed fffonion closed 3 years ago

fffonion commented 4 years ago

This PR adds a series of metrics that expose the status of connected Data Planes on the Control Plane side.

Sample output:

curl -s localhost:8001/metrics|grep data_plane
# HELP kong_data_plane_config_hash Config hash value of the data plane
# TYPE kong_data_plane_config_hash gauge
kong_data_plane_config_hash{node_id="d4e7584e-b2f2-415b-bb68-3b0936f1fde3",hostname="ubuntu-bionic",ip="127.0.0.1"} 1.7158931820287e+38
# HELP kong_data_plane_last_seen Last time data plane contacted control plane
# TYPE kong_data_plane_last_seen gauge
kong_data_plane_last_seen{node_id="d4e7584e-b2f2-415b-bb68-3b0936f1fde3",hostname="ubuntu-bionic",ip="127.0.0.1"} 1600190275
# HELP kong_data_plane_version_compatible Version compatible status of the data plane, 0 is incompatible
# TYPE kong_data_plane_version_compatible gauge
kong_data_plane_version_compatible{node_id="d4e7584e-b2f2-415b-bb68-3b0936f1fde3",hostname="ubuntu-bionic",ip="127.0.0.1",kong_version="2.4.1"} 1

fffonion commented 4 years ago

The config_hash metric is useful for catching cases like "DPs have had inconsistent configs across the cluster for x time".

But this will create a new metric every time the config is flipped, so the time series is not continuous. We need to verify whether that will cause trouble in alerting. For example, I can imagine using count(kong_data_plane_last_seen) to get the expected data plane count.
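As a rough illustration of the "inconsistent configs across the cluster" check, here is a minimal Python sketch that reads exposition text shaped like the sample output and reports whether all data planes share one config hash value. The second node, its value, and the helper names `config_hash_by_node` and `configs_consistent` are made up for the example; a real alert would more likely be written as a Prometheus rule.

```python
import re

# Exposition lines shaped like the sample output above.
# The second node and its value are hypothetical.
SAMPLE = '''\
# HELP kong_data_plane_config_hash Config hash value of the data plane
# TYPE kong_data_plane_config_hash gauge
kong_data_plane_config_hash{node_id="d4e7584e-b2f2-415b-bb68-3b0936f1fde3",hostname="ubuntu-bionic",ip="127.0.0.1"} 1.7158931820287e+38
kong_data_plane_config_hash{node_id="00000000-0000-0000-0000-000000000002",hostname="dp-2",ip="127.0.0.2"} 2.5e+38
'''

LINE = re.compile(r'^kong_data_plane_config_hash\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def config_hash_by_node(text):
    """Map node_id -> config hash value for every matching sample line."""
    result = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # skip HELP/TYPE comments and other metrics
        labels = dict(LABEL.findall(match.group("labels")))
        result[labels["node_id"]] = float(match.group("value"))
    return result


def configs_consistent(text):
    """True when every reported data plane has the same config hash value."""
    return len(set(config_hash_by_node(text).values())) <= 1
```

With the sample above, the two nodes report different values, so `configs_consistent(SAMPLE)` is false.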

hbagdi commented 4 years ago

But this will create a new metric every time the config is flipped, so the time series is not continuous.

I'm not sure I understand. Why would it be so?

fffonion commented 4 years ago

But this will create a new metric every time the config is flipped, so the time series is not continuous.

I'm not sure I understand. Why would it be so?

If we put config_hash into the metrics as a label, you would see, for example, dataplane_last_seen{node_id="UUID", config_hash="hash1"} before the flip and dataplane_last_seen{node_id="UUID", config_hash="hash2"} after it. There would be two differently colored lines in the Prometheus graph, even though both refer to the same data plane node.
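The effect can be sketched in a few lines of Python. This is only a model of the idea that a time series is identified by its metric name plus its full label set; `series_key` is a hypothetical helper, not part of the plugin or of any Prometheus client.

```python
def series_key(name, labels):
    # A series is identified by its metric name plus its full label set.
    return (name, frozenset(labels.items()))


node = "d4e7584e-b2f2-415b-bb68-3b0936f1fde3"

# config_hash as a label: the same node yields a new series after each flip.
as_label = {
    series_key("dataplane_last_seen", {"node_id": node, "config_hash": "hash1"}),
    series_key("dataplane_last_seen", {"node_id": node, "config_hash": "hash2"}),
}

# config_hash as the sample value: the label set stays the same across flips,
# so both scrapes land in a single series.
as_value = {
    series_key("data_plane_config_hash", {"node_id": node}),
    series_key("data_plane_config_hash", {"node_id": node}),
}

print(len(as_label), len(as_value))
```

The first set holds two distinct series for one node, the second holds one.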

So @wyndigo's idea is to turn the config_hash, which is an MD5 hex string, into its numeric value and expose that as the metric value. Then it is no longer a label, and we can still compare it across DPs.
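A minimal Python sketch of that conversion, assuming the hex string is simply read as a 128-bit unsigned integer (the thread does not show how the plugin actually does it, and `config_hash_to_number` is a hypothetical name). The sample value 1.7158931820287e+38 is at least consistent with this reading, since it is below 2^128.

```python
import hashlib


def config_hash_to_number(hex_digest):
    # Read the 32-character hex digest as a 128-bit unsigned integer, then
    # convert to float, because Prometheus sample values are 64-bit floats.
    return float(int(hex_digest, 16))


# Two stand-in config hashes; the inputs are arbitrary.
h1 = hashlib.md5(b"").hexdigest()
h2 = hashlib.md5(b"a").hexdigest()

v1 = config_hash_to_number(h1)
v2 = config_hash_to_number(h2)

# Different digests give different values here, so DPs can be compared
# by value instead of by label.
print(v1 != v2)
```

One caveat of this approach: a 64-bit float keeps only about 53 significant bits, so the value is a lossy fingerprint of the hash. Two digests would have to agree in their leading bits to collide, which is very unlikely but not impossible.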