Open vmadman opened 6 months ago
I set up Slack notifications for endpoint status. When an endpoint goes down, a message is posted to the #gcp-monitoring channel. When the endpoint comes back online, a second message is posted stating that the endpoint is healthy again.
This is all done through a custom Slack application named PV-Upptime-Notifications. The application provides a webhook that the GitHub workflow uses to send the notifications. The webhook URL is stored in a GitHub secret.
I still need to do testing on this functionality.
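For the testing mentioned above, one option is to exercise the webhook by hand before relying on the workflow. The following is only a minimal sketch, assuming the webhook is a standard Slack incoming webhook that accepts a JSON body with a `text` field. The `SLACK_WEBHOOK_URL` environment variable, the function names, and the message wording are hypothetical and are not taken from the Upptime setup.

```python
import json
import os
import urllib.request


def build_status_message(endpoint_name: str, is_up: bool) -> dict:
    """Build a Slack incoming-webhook payload for an endpoint status change.

    The wording is illustrative; Upptime formats its own messages.
    """
    state = "healthy again" if is_up else "down"
    return {"text": f"{endpoint_name} is {state}"}


def post_to_slack(webhook_url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the payload to the webhook as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # SLACK_WEBHOOK_URL is a hypothetical variable name; keep the real URL secret.
    url = os.environ.get("SLACK_WEBHOOK_URL")
    if url:
        print(post_to_slack(url, build_status_message("Example endpoint", False)))
        print(post_to_slack(url, build_status_message("Example endpoint", True)))
    else:
        print("SLACK_WEBHOOK_URL is not set; skipping the live test.")
```

Running it with the variable set should produce one "down" and one "healthy again" message in the channel the webhook is bound to.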
After some testing with themes, I believe the 'light' theme matches the main site most closely. This can be changed later if needed.
I have updated the logo on the page, so that should be reflected now.
Added a custom domain for the status page. It can be accessed at: https://status.puravidabitcoin.io/
If we want to report out to Twitter, we will need to build a custom solution, as Upptime does not currently support Twitter.
Notes after initial setup:
Looks great 🙂 Awesome work. I recognize it's a first test and you probably already know to do some of this, but, just in case, here's the todo list from my point of view:
These items are internal resources:
I'm OK exposing internal status information to the public, kinda, if we obscure the piss out of it, e.g. "ArgoCD" -> "Infrastructure Controller" (with zero discoverable info about what, exactly, that is querying). For now, though, I can't really imagine the value of exposing those resources to the public, so let's just drop them from here.
We probably need to create a second status website with more fine-grained details, but it's not obvious to me how we'd make that private.
I also think we could add more sophisticated checks. Just for example, we could create a job that sends a very small amount of money from one internal account to another, and/or back, over our internal transaction network (Galoy, via GraphQL queries) and then verifies that the transaction shows up on both sides. Those sorts of smoke/canary tests would tell us so much about our status, but they're hard, I know, and they would increase the need for us to "close-source" the Upptime config.
We might also build in Google Cloud Logs or Google Cloud Analytics queries, Honeycomb error rates, etc. The sky is the limit, really.
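To make the canary idea a bit more concrete, here is a rough sketch of the round-trip check, not a working integration. It assumes a hypothetical `Ledger` interface with `balance` and `transfer` methods; a real version would implement those with Galoy GraphQL queries, which are not shown here. Fees and settlement delays are ignored, and the account names are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol


class Ledger(Protocol):
    """Hypothetical interface; a real one would wrap the Galoy GraphQL API."""

    def balance(self, account: str) -> int: ...

    def transfer(self, sender: str, receiver: str, amount: int) -> None: ...


@dataclass
class InMemoryLedger:
    """Stand-in ledger used only to demonstrate the check."""

    balances: Dict[str, int] = field(default_factory=dict)

    def balance(self, account: str) -> int:
        return self.balances.get(account, 0)

    def transfer(self, sender: str, receiver: str, amount: int) -> None:
        if amount <= 0 or self.balance(sender) < amount:
            raise ValueError("invalid amount or insufficient funds")
        self.balances[sender] = self.balance(sender) - amount
        self.balances[receiver] = self.balance(receiver) + amount


def run_canary(ledger: Ledger, account_a: str, account_b: str, amount: int = 1) -> bool:
    """Send a tiny amount from A to B and back, verifying both sides each time."""
    before_a = ledger.balance(account_a)
    before_b = ledger.balance(account_b)
    try:
        ledger.transfer(account_a, account_b, amount)
        # The transfer must be visible on both the sending and receiving side.
        forward_ok = (
            ledger.balance(account_a) == before_a - amount
            and ledger.balance(account_b) == before_b + amount
        )
        ledger.transfer(account_b, account_a, amount)
    except Exception:
        return False
    # After the return leg, both balances should match their starting values.
    restored = (
        ledger.balance(account_a) == before_a
        and ledger.balance(account_b) == before_b
    )
    return forward_ok and restored
```

A scheduled job could call `run_canary` and report the boolean as up or down. How that result would be fed into Upptime is left open here.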
As for the names of the things we should keep:
Other stuff:
Again, though, this is excellent stuff. Thank you for the work so far.