ian-noaa closed this issue 7 months ago.
From Randy: I was thinking that we could incorporate "metaDataTableUpdates.find({}).fetch()", which holds the time of the last refresh of each metadata table. That data is only updated when the actual metadata tables are newer than the Mongo metadata, in other words whenever the metadata needed to be updated, so it reflects the last time the metadata update ran successfully. If we returned that data, the caller routine could compare the time since the last update against the schedule interval of the actual metadata updates, or it could just use an upper tolerance, like a day, and raise an alert of some kind if the last update is older than that.
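A minimal TypeScript sketch of the caller-side check, assuming the status route returns one entry per metadata table with its last refresh time. The `MetadataUpdate` shape, the grace period, and the function names are hypothetical, not the actual `metaDataTableUpdates` schema. It shows both options: comparing the age of the last refresh against the schedule interval, or against a fixed tolerance such as a day.

```typescript
// Hypothetical shape of one entry returned by the status route; the real
// fields of metaDataTableUpdates may differ.
interface MetadataUpdate {
  table: string;
  lastRefreshed: Date;
}

const HOUR_MS = 60 * 60 * 1000;

// Option 1: stale if more than one schedule interval (plus an assumed grace
// period for the run itself) has passed since the last successful refresh.
function isStaleBySchedule(
  update: MetadataUpdate,
  scheduleIntervalMs: number,
  graceMs: number,
  now: Date = new Date(),
): boolean {
  const ageMs = now.getTime() - update.lastRefreshed.getTime();
  return ageMs > scheduleIntervalMs + graceMs;
}

// Option 2: stale if the last refresh is older than a fixed tolerance
// (one day by default).
function isStaleByTolerance(
  update: MetadataUpdate,
  toleranceMs: number = 24 * HOUR_MS,
  now: Date = new Date(),
): boolean {
  return now.getTime() - update.lastRefreshed.getTime() > toleranceMs;
}

// Names of the tables whose last refresh exceeds the tolerance, i.e. the
// ones that should raise an alert.
function staleTables(
  updates: MetadataUpdate[],
  toleranceMs: number,
  now: Date = new Date(),
): string[] {
  return updates
    .filter((u) => isStaleByTolerance(u, toleranceMs, now))
    .map((u) => u.table);
}
```

For example, with a 24-hour tolerance, a table last refreshed 48 hours ago is reported while one refreshed 6 hours ago is not. All of this runs on the external client, so the server only has to return the timestamps.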
The advantage of this is that it moves the processing off the server, which keeps the server simpler and avoids services like sendmail that require administrative action. All the processing happens on some external client, which is probably better suited to container operation.
Adding "metaDataTableUpdates.find({}).fetch()" to the status server-side routes is about a five-minute code change, so we may as well include it in the next build. UPDATE: We will need the metadata scripts themselves to update a table with their completion status; the middleware for the status check would then query that table to get the update times. A couple of questions remain:
1) Should any given app return all the metadata update information, or only the information about the metadata it uses? I actually think it should return all of it.
2) Should we automatically load this data into the app's Mongo data each time the metadata refreshes in an app?
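The UPDATE above could look roughly like this, assuming each metadata script writes one row per metadata table only after a successful run, and the status middleware returns every row (the "return all of it" option in question 1). The `CompletionStatusTable` class, its method names, and the row fields are hypothetical, and an in-memory `Map` stands in for the real database table.

```typescript
// Hypothetical row in the completion-status table written by the metadata
// scripts; an in-memory Map stands in for the real table in this sketch.
interface CompletionRecord {
  metadataTable: string;
  completedAt: Date;
}

class CompletionStatusTable {
  private rows = new Map<string, CompletionRecord>();

  // Called by a metadata script only after its update finished successfully,
  // so a failed run leaves the previous timestamp in place.
  recordSuccess(metadataTable: string, completedAt: Date = new Date()): void {
    this.rows.set(metadataTable, { metadataTable, completedAt });
  }

  // Every row, regardless of which app is asking.
  all(): CompletionRecord[] {
    return Array.from(this.rows.values());
  }
}

// What the status-check middleware might serialize: one ISO-8601 timestamp
// per metadata table, for all tables rather than only those this app uses.
function statusPayload(table: CompletionStatusTable): Record<string, string> {
  const payload: Record<string, string> = {};
  for (const row of table.all()) {
    payload[row.metadataTable] = row.completedAt.toISOString();
  }
  return payload;
}
```

Because a failed run never calls `recordSuccess`, the stored time simply keeps aging, which is what a caller comparing against a tolerance needs to detect.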
> If we returned that data the caller routine could compare the update time minus the schedule interval of running the actual metadata updates, or it could just have an upper tolerance time, like a day, and if the last update was older than that set time it could raise an alert of some kind.
I believe Prometheus can do this if we give it the timestamp of the last metadata run as a metric. E.g., if we had a `metadata_last_run` metric holding that time in Unix seconds (which is what `time()` returns), we could set an alert that fires once 48 hours have passed since then, with an alert expression like: `time() - metadata_last_run > 60 * 60 * 48`
In response to your questions:
Reading up on Prometheus more, I think the recommended way to go about this is to instrument the code itself with the Prometheus client for Node.js, prom-client (https://github.com/siimon/prom-client), which is a bit more complicated than I was originally thinking. I still think exposing Prometheus metrics would be useful, since so many Kubernetes components expose them too.
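prom-client would generate the scrape output for us, but to show what Prometheus actually reads, here is a dependency-free TypeScript sketch that renders a `metadata_last_run` gauge in the Prometheus text exposition format, one sample per metadata table. This is a hand-rolled illustration, not prom-client's API; the `table` label, the help text, and the function names are assumptions.

```typescript
// Escape a label value as the Prometheus text format requires:
// backslash, double quote and line feed must be escaped.
function escapeLabelValue(value: string): string {
  return value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n");
}

// Render one gauge sample per metadata table. The value is Unix time in
// seconds so that PromQL's time() can be subtracted from it directly.
function renderMetadataLastRun(updates: Map<string, Date>): string {
  const lines: string[] = [
    "# HELP metadata_last_run Unix time (seconds) of the last successful metadata update.",
    "# TYPE metadata_last_run gauge",
  ];
  // Map.prototype.forEach passes (value, key).
  updates.forEach((lastRun, table) => {
    const seconds = lastRun.getTime() / 1000;
    lines.push(`metadata_last_run{table="${escapeLabelValue(table)}"} ${seconds}`);
  });
  // The exposition format is line-oriented; end with a trailing newline.
  return lines.join("\n") + "\n";
}
```

Served from the endpoint Prometheus scrapes, this yields one `metadata_last_run` series per table, and an alert can then fire when `time() - metadata_last_run` exceeds the chosen threshold in seconds.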
This issue is stale because it has been open 90 days with no activity.
Reviewed 11/23/22. Want to add a metadata update time to Prometheus.
This issue is stale because it has been open 90 days with no activity.
No longer relevant
Add the time each app's metadata was last updated to the health check endpoint so that we can scrape it with Prometheus.