Aiven-Open / prometheus-exporter-plugin-for-opensearch

Prometheus exporter plugin for OpenSearch & OpenSearch Mixin
Apache License 2.0

Snapshot (management) metrics #165

Open ginkel opened 1 year ago

ginkel commented 1 year ago

Hi there,

we were wondering whether it would make sense to extend the prometheus-exporter-plugin-for-opensearch so that it exports additional metrics about snapshots, such as which snapshots have been created and when the last snapshot was created. The main use case would be to monitor whether backups are created regularly (using Snapshot Management), so that disruptions in snapshot creation can be detected early through alerts.

Do you think that would make a worthwhile addition to the plugin?

Thanks, Thilo

lukas-vlcek commented 1 year ago

Is this metric exposed by OpenSearch itself? If so, adding it to the Prometheus exporter would be an easy task. Or are there at least some relevant metrics already exposed by OpenSearch?

ginkel commented 1 year ago

One could retrieve the registered repositories using a GetRepositoriesRequest and then obtain details about each snapshot using GetSnapshotsRequest. Exposing a time series per snapshot could be tricky (metrics inflation), so one could limit the number of observed snapshots to the n latest. If a snapshot was created by a Snapshot Management policy, this is indicated by the sm_policy metadata attribute, which the metrics could be grouped by (so that only the metrics of the last snapshot created by each policy are exposed).

In the REST API this maps to:

GET _snapshot

GET _snapshot/<repo_name>/_all
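
As a rough illustration of the grouping idea, here is a minimal Python sketch that works on the parsed JSON body of `GET _snapshot/<repo_name>/_all`. It assumes each entry in `snapshots` carries `state`, `start_time_in_millis`, `end_time_in_millis` and an optional `metadata.sm_policy`. The function names and the metric names (`snapshot_last_end_time_seconds`, `snapshot_last_success`) are hypothetical and not part of the plugin.

```python
from typing import Any, Dict, List


def latest_snapshot_per_policy(response: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Return the most recently started snapshot for each sm_policy value."""
    latest: Dict[str, Dict[str, Any]] = {}
    for snap in response.get("snapshots", []):
        # metadata may be missing or null for manually created snapshots
        policy = (snap.get("metadata") or {}).get("sm_policy")
        if policy is None:
            continue  # not created by a Snapshot Management policy
        current = latest.get(policy)
        if current is None or snap["start_time_in_millis"] > current["start_time_in_millis"]:
            latest[policy] = snap
    return latest


def to_metrics(latest: Dict[str, Dict[str, Any]], repository: str) -> List[str]:
    """Render one pair of (hypothetical) metrics per policy in Prometheus text format."""
    lines: List[str] = []
    for policy, snap in sorted(latest.items()):
        labels = f'repository="{repository}",sm_policy="{policy}"'
        end_seconds = snap["end_time_in_millis"] // 1000
        success = 1 if snap["state"] == "SUCCESS" else 0
        lines.append(f"snapshot_last_end_time_seconds{{{labels}}} {end_seconds}")
        lines.append(f"snapshot_last_success{{{labels}}} {success}")
    return lines
```

Because only one snapshot per policy is kept, the number of time series grows with the number of policies rather than with the number of snapshots, which addresses the metrics inflation concern above.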

sandervandegeijn commented 1 year ago

Agreed, would be nice to have :)

patelsmit32123 commented 2 months ago

@lukas-vlcek I would like to take this up. We are implementing something similar in our forked repo, so we can contribute it back. Please let me know if adding snapshot-related metrics is still planned.

lukas-vlcek commented 2 months ago

@patelsmit32123 I would love to take a look at any PR :-)

patelsmit32123 commented 2 months ago

@lukas-vlcek PTAL at #295. I have tested the changes on our staging environment, and they seem to be working fine.