Closed by loneil 8 months ago
@WadeBarnes if the above is ok we can use these endpoints.
Not sure if there's additional uptime monitoring through Crunchy for the DB that we would want to use, or if that's covered by other monitoring. Maybe a question for @i5okie.
I've added the following checks to our https://ditp.uptime.vonx.io/ and https://ditp.sla.vonx.io/ dashboards:
The proxies and agents are covered by using the proxy endpoint to check the ready endpoint for the agent. We only had enough licenses to add the checks for the sandbox and prod UIs after that. I've requested licenses for more checks.
For now the UI checks just test whether the UI responds with a 200 status code. If we start running into specific types of UI availability issues we can start getting more fancy.
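A minimal sketch of that kind of 200-only check, using just the Python standard library. The function name `ui_is_up` and the 10 second timeout are illustrative assumptions; this is not how the uptime dashboards themselves are configured:

```python
import urllib.error
import urllib.request


def ui_is_up(url: str, timeout: float = 10.0) -> bool:
    """Return True only if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # HTTP error responses (4xx/5xx), DNS failures, refused
        # connections and timeouts are all treated as "down".
        return False
```

Anything other than a final 200 response counts as down, which matches the simple up/down signal described above; checks for specific kinds of UI failure would need more than this.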
Up/Down notifications go to me and the ditp-uptime and ditp-uptime-prod RC channels.
I'm calling this done for now.
Great, thanks @WadeBarnes !
As a Traction operator, I want to know which endpoints to query for things like uptime monitors, so that I can be alerted if a piece of the architecture is not reachable from outside traffic.
Three parts to monitor IMO: ACA-Py, the Traction NGINX proxy, and the Tenant UI.
ACA-Py admin API
Live
Ready
Traction Proxy
The Traction proxy just forwards all requests to Traction, adding the hidden API key where appropriate (not relevant to the open live and ready endpoints). So I think we can really just check the same ACA-Py endpoints at the proxy URL to see if the proxy is up.
Live
Ready
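As a rough sketch of how a monitor could run the same live and ready checks against either base URL (the agent admin URL or the proxy URL), in standard-library Python. It assumes the ACA-Py admin API serves `/status/live` and `/status/ready` and answers `{"alive": true}` and `{"ready": true}`; the names `CHECKS`, `parse_status` and `check` are hypothetical:

```python
import json
import urllib.error
import urllib.request

# Assumption: ACA-Py exposes liveness at /status/live and readiness at
# /status/ready, with JSON bodies {"alive": true} and {"ready": true}.
CHECKS = {"live": ("/status/live", "alive"), "ready": ("/status/ready", "ready")}


def parse_status(body: bytes, key: str) -> bool:
    """Return True if the JSON body is an object with `key` set to true."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Not valid JSON (or not valid UTF-8): treat as unhealthy.
        return False
    return isinstance(payload, dict) and payload.get(key) is True


def check(base_url: str, name: str, timeout: float = 10.0) -> bool:
    """Run the named check ("live" or "ready") against `base_url`."""
    path, key = CHECKS[name]
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200 and parse_status(response.read(), key)
    except (urllib.error.URLError, OSError):
        # Error responses, unreachable hosts and timeouts all count as down.
        return False
```

Calling `check(base, "ready")` with the proxy's base URL exercises both the proxy and the agent behind it, so a failure there with a passing direct agent check would point at the proxy.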
Tenant UI
The live and ready checks on OCP are just at / here right now. We could add a separate /ready or something in the Node app, but for now I would just use the below, as they are live and don't need a change and promotion through envs.