In production environments a generic HTTP (Keyword) monitor or even another GHC instance may be used to monitor the uptime and availability of a running GHC (Webapp).
Problem
In some deployments GHC runs with a separate GHC Runner instance that continuously runs the healthchecks via APScheduler. That process may die, or it may still be running but stuck somewhere, while the GHC Webapp stays up. There is currently no way to find out whether GHC Runner is actually working other than detailed process inspection (ps, docker ps, Prometheus, etc.).
What we would like is a simple HTTP-based status check, served by the GHC Webapp, that confirms the Runner is active.
Possible Solution
Provide a GHC API status service that reports recent activity of GHC Runner, for example the number of runs in the last N minutes. This is easily realized with a query, which is simpler than inspecting external processes or Docker containers. GHC or any HTTP uptime checker could then check for a keyword like runs: 0, meaning no runs in the last N minutes. That indicates that GHC Runner is not running, or is running but stuck somewhere.
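A minimal sketch of the query behind such a status service, to make the idea concrete. It is not GHC's actual code: the table name `run`, the column `checked_datetime`, and the function names are assumptions for illustration, and plain `sqlite3` stands in for whatever database layer GHC uses. Timestamps are assumed to be stored as UTC ISO 8601 strings, so string comparison matches time order.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def count_recent_runs(conn, minutes, now=None):
    """Count rows in the (hypothetical) run table from the last `minutes` minutes."""
    now = now or datetime.now(timezone.utc)
    # Assumption: checked_datetime holds UTC ISO 8601 strings, which sort
    # lexicographically in chronological order.
    cutoff = (now - timedelta(minutes=minutes)).isoformat()
    row = conn.execute(
        "SELECT COUNT(*) FROM run WHERE checked_datetime >= ?", (cutoff,)
    ).fetchone()
    return row[0]


def runner_status(conn, minutes=10, now=None):
    """Build the plain-text body that a keyword monitor could scan."""
    return f"runs: {count_recent_runs(conn, minutes, now)}"
```

The Webapp could return the string from `runner_status` at a status URL, and an external keyword monitor would raise an alert whenever the body contains `runs: 0`. N should be chosen larger than the longest configured healthcheck interval, otherwise a healthy Runner could legitimately report zero runs.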
So who monitors the monitor?