geopython / GeoHealthCheck

Service Status and QoS Checker for OGC Web Services
https://geohealthcheck.org
MIT License

monitoring GeoHealthCheck itself #247

Open justb4 opened 5 years ago

justb4 commented 5 years ago

So who monitors the monitor?

In production environments a generic HTTP (Keyword) monitor or even another GHC instance may be used to monitor the uptime and availability of a running GHC (Webapp).
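As a rough illustration of what such a keyword monitor does, here is a minimal Python sketch using only the standard library. The function names (`fetch_body`, `is_up`) and the example URL are made up for illustration and are not part of GHC.

```python
from urllib.request import urlopen


def fetch_body(url, timeout=10.0):
    """Return (status_code, body_text) for a plain GET request."""
    with urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body


def is_up(url, keyword, fetch=fetch_body):
    """True if the page answers with HTTP 200 and contains the keyword.

    `fetch` is injectable so the check can be exercised without a network.
    """
    try:
        status, body = fetch(url)
    except OSError:
        # urllib's URLError/HTTPError and socket timeouts all derive from
        # OSError: connection refused, DNS failure, 4xx/5xx, timeout, ...
        return False
    return status == 200 and keyword in body
```

Usage would be something like `is_up("https://ghc.example.org/", "GeoHealthCheck")`, where both the URL and the keyword are placeholders. Note that this only tells you the Webapp answers, which is exactly the limitation described under Problem below.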

Problem

When GHC runs with a separate GHC Runner instance (which continuously runs the healthchecks via APScheduler), that process may die, or may still be running but stuck somewhere. The GHC Webapp can still be up in either case. There is currently no way to find out whether GHC Runner is actually working other than detailed process inspection (ps, docker ps, Prometheus, etc.). We would like a simple HTTP-based status check, served via the GHC Webapp, that shows the Runner is active.

Possible Solution

Provide a GHC API status service that reports recent activity of GHC Runner, for example the number of runs in the last N minutes. This is easily realized with a query, which is easier than inspecting external processes or Docker containers. GHC or any HTTP uptime checker could then check for a keyword like runs: 0, meaning no runs in the last N minutes. That indicates GHC Runner is not running, or is running but stuck somewhere.
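A minimal sketch of the query side, to make the idea concrete. It uses plain `sqlite3` rather than GHC's own models, and it assumes a table named `run` with a `checked_datetime` column holding UTC timestamps as `YYYY-MM-DD HH:MM:SS` text. Those names, the storage format, and the helper functions are assumptions for illustration, not the actual implementation.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def count_recent_runs(conn, minutes, now=None):
    """Count rows in the (assumed) `run` table newer than `minutes` ago."""
    if now is None:
        # Naive UTC timestamp, matching the assumed storage format.
        now = datetime.now(timezone.utc).replace(tzinfo=None)
    cutoff = now - timedelta(minutes=minutes)
    row = conn.execute(
        "SELECT COUNT(*) FROM run WHERE checked_datetime >= ?",
        (cutoff.isoformat(sep=" ", timespec="seconds"),),
    ).fetchone()
    return row[0]


def status_line(conn, minutes, now=None):
    """Body fragment an uptime checker can match on, e.g. 'runs: 0'."""
    return "runs: {}".format(count_recent_runs(conn, minutes, now))
```

A Webapp route could return `status_line(...)` as plain text, and the external monitor would alert when the body contains `runs: 0`. N should be comfortably larger than the longest scheduling interval of any Resource, otherwise a healthy but idle Runner would be reported as down.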

justb4 commented 5 years ago

Test comment to see if Gitter webhook works...