NOAA-GSL / MATS

MATS is a quick & interactive way to view verification statistics
https://gsl.noaa.gov/mats/

Add Metadata update time to the healthcheck #592

Closed ian-noaa closed 7 months ago

ian-noaa commented 3 years ago

Add the time the metadata for each app has been last updated to the health check endpoint so that we can scrape it with Prometheus.

randytpierce commented 3 years ago

From Randy... I was thinking that we could incorporate "metaDataTableUpdates.find({}).fetch()", which has the data about when the last refresh for each metadata table was accomplished. That data only gets updated when the actual metadata tables are newer than the mongo metadata, in other words whenever the metadata needed to be updated, so it would reflect the last time the metadata update successfully ran. If we returned that data the caller routine could compare the update time minus the schedule interval of running the actual metadata updates, or it could just have an upper tolerance time, like a day, and if the last update was older than that set time it could raise an alert of some kind.

The advantage of this is that it takes the processing away from the server, allowing the server to be simpler and not require services like sendmail, which require administrative action. All the processing is on some external client. This is probably better for container operation.

Adding the "metaDataTableUpdates.find({}).fetch()" to the status server-side routes is about a five-minute code effort, so we may as well just include it in the next build. UPDATE... We will need the metadata scripts themselves to update a table with the completion status, and then the middleware for the status check would query that table to get the update times. A couple of questions remain. 1) Should any given app return all the metadata update information, or only information about the metadata that it uses? I sort of think it should return all of it, actually. 2) Should we load this data into the app's mongo data automatically each time the metadata refreshes in an app?
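As a rough illustration of question 1, the health check body could be assembled from the update records as in the sketch below. The record shape (`tableName`, `lastRefreshedEpoch`), the payload layout, and the table names are all hypothetical; the real metaDataTableUpdates documents may look quite different.

```typescript
// Hypothetical shape of one document from metaDataTableUpdates.
// The real collection's field names may differ.
interface MetadataUpdateRecord {
  tableName: string; // metadata table this record describes
  lastRefreshedEpoch: number; // Unix seconds of the last successful refresh
}

// Hypothetical health check response body.
interface HealthPayload {
  status: "ok";
  metadataUpdates: MetadataUpdateRecord[];
}

// Build the health check body. When `usedTables` is given, only the tables
// the app uses are reported; otherwise every record is returned.
function buildHealthPayload(
  updates: MetadataUpdateRecord[],
  usedTables?: string[]
): HealthPayload {
  const selected =
    usedTables === undefined
      ? updates
      : updates.filter((u) => usedTables.indexOf(u.tableName) !== -1);
  return { status: "ok", metadataUpdates: selected };
}

// Example usage with made-up table names and timestamps.
const sampleUpdates: MetadataUpdateRecord[] = [
  { tableName: "surface_metadata", lastRefreshedEpoch: 1700000000 },
  { tableName: "upperair_metadata", lastRefreshedEpoch: 1700003600 },
];
console.log(JSON.stringify(buildHealthPayload(sampleUpdates, ["surface_metadata"])));
```

Returning everything is just the call without the second argument, so the choice between the two options would not change the route much either way.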

ian-noaa commented 3 years ago

> If we returned that data the caller routine could compare the update time minus the schedule interval of running the actual metadata updates, or it could just have an upper tolerance time, like a day, and if the last update was older than that set time it could raise an alert of some kind.

I believe Prometheus can do this if we give it the timestamp the metadata last ran as a metric. For example, if we had a metadata_last_run metric, we could set an alert to trigger when it's been more than 48 hours since that time, with an alert expression like: time() - metadata_last_run > 60 * 60 * 48
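The same 48-hour staleness check could also live in any external client. The sketch below is only an illustration: the function and constant names are made up, and it assumes both timestamps are Unix epoch seconds, which is what Prometheus's time() returns. The check is true once the elapsed time exceeds the threshold.

```typescript
// Maximum tolerated age of the last metadata run: 48 hours, in seconds.
const MAX_METADATA_AGE_SECONDS = 60 * 60 * 48;

// Returns true when the last metadata run is older than the allowed age.
// Both timestamps are Unix epoch seconds.
function isMetadataStale(
  lastRunEpoch: number,
  nowEpoch: number,
  maxAgeSeconds: number = MAX_METADATA_AGE_SECONDS
): boolean {
  return nowEpoch - lastRunEpoch > maxAgeSeconds;
}

// Example: a run that finished 49 hours before "now" counts as stale.
const exampleNow = 1700000000;
console.log(isMetadataStale(exampleNow - 49 * 60 * 60, exampleNow));
```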

In response to your questions:

  1. What sort of information can we surface about the metadata update? I'm certainly not opposed to returning more information on the metadata updates if it's useful for monitoring.
  2. What sort of advantages would we get by loading the monitoring info in the app's mongo db?

Reading up on Prometheus more, I think the recommended way to go about this is to instrument the code itself with the Prometheus client (https://github.com/siimon/prom-client), which is a bit more complicated than I was originally thinking. I do still think exposing Prometheus metrics would be useful, since so many Kubernetes components also expose them.
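To give a sense of what a scrape would see, here is a minimal sketch that renders a metadata_last_run gauge in the Prometheus text exposition format by hand. It does not use prom-client; a client library would normally generate this output. The `app` label, the `MetadataRun` shape, and the function names are assumptions for illustration only.

```typescript
// Hypothetical record: one app and the Unix time (seconds) its metadata last ran.
interface MetadataRun {
  app: string;
  lastRunEpoch: number;
}

// Escape a label value for the text exposition format:
// backslash, double quote, and newline must be escaped.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Render one gauge sample per app, preceded by HELP and TYPE lines.
function renderMetadataLastRun(runs: MetadataRun[]): string {
  const lines: string[] = [
    "# HELP metadata_last_run Unix time of the last successful metadata update.",
    "# TYPE metadata_last_run gauge",
  ];
  for (const run of runs) {
    lines.push(`metadata_last_run{app="${escapeLabelValue(run.app)}"} ${run.lastRunEpoch}`);
  }
  return lines.join("\n") + "\n";
}

// Example usage with a made-up app name and timestamp.
console.log(renderMetadataLastRun([{ app: "upperair", lastRunEpoch: 1700000000 }]));
```

With one sample per app, a single alert rule on metadata_last_run would cover every app that reports the metric.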

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 90 days with no activity.

bonnystrong commented 1 year ago

Reviewed 11/23/22. Want to add a metadata update time to Prometheus.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 90 days with no activity.

mollybsmith-noaa commented 7 months ago

No longer relevant