Blockstream / greenlight

Build apps using self-custodial lightning nodes in the cloud
https://blockstream.github.io/greenlight/getting-started/
MIT License

Implement a health check API to track the status of Greenlight for better error handling. #477

Open · Nodirbek75 opened this issue 2 months ago

Nodirbek75 commented 2 months ago

We are currently using Breez-SDK in our app, which includes a health check API to track the status of the Breez server. However, it doesn't include a Greenlight health check. It would be great to add a health check API for tracking the Greenlight status so that we can implement better error handling in our app.

@cdecker: https://discord.com/channels/899980449231814676/900323634512551946/1193584731900612789

cdecker commented 2 months ago

Thanks for reporting this and giving it a place to be discussed. Several users have requested this, and we'd love to provide a status API; however, it is not easy to boil the system status down to a single red or green bubble. Take, for example, the dimension along which we chose to scale: nodes. If a single node is having issues, does the system as a whole count as (partially) unavailable, or is it operating as expected? I'd argue it is still within normal operations, but for the user whose node is having issues (e.g., a missing signer, a failing payment, or a failure to reconnect to a peer), the system looks unavailable and unstable.

We will of course add statuses for the core services, such as the tower and the scheduler, but those already have 99.9+% uptime. Most issues come from individual user nodes in the fleet, which we fix as soon as possible. Declaring the entire system unavailable because a small subset of users is having a sub-optimal time doesn't help anyone else, since experiences are fairly subjective given the separation between tenants on our system.

TL;DR: we need to define the semantics of available / unavailable for the nodes before we can provide a useful status indication for GL as a whole.
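One possible way to define those semantics is to report two separate things: a system-wide status derived only from the shared core services (the tower and the scheduler mentioned above), and a per-node status derived from the node-level failures listed earlier (missing signer, failing payments, peers that fail to reconnect). The sketch below is hypothetical: `Status`, `NodeHealth`, `node_status`, `system_status` and `health_report` are illustrative names, not part of Greenlight or Breez SDK, and the thresholds (a missing signer means "unavailable", any core service down means "degraded") are assumptions chosen only to show the shape of the idea.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OPERATIONAL = "operational"
    DEGRADED = "degraded"
    UNAVAILABLE = "unavailable"


@dataclass
class NodeHealth:
    """Hypothetical per-node signals, based on the failure modes in the thread."""
    signer_connected: bool
    payment_failures: int = 0   # recent failed payments (assumed signal)
    unreachable_peers: int = 0  # peers the node failed to reconnect to (assumed signal)


def node_status(node: NodeHealth) -> Status:
    # Assumption: without a signer the node cannot sign, so treat it as unavailable.
    if not node.signer_connected:
        return Status.UNAVAILABLE
    # Assumption: payment or reconnect failures mean the node works but is impaired.
    if node.payment_failures > 0 or node.unreachable_peers > 0:
        return Status.DEGRADED
    return Status.OPERATIONAL


def system_status(core_services: dict[str, bool]) -> Status:
    """System-wide status from shared core services only (e.g. tower, scheduler).

    Individual node problems are deliberately excluded, so one user's node
    having a bad time does not mark the whole system as unavailable.
    """
    if not core_services:
        raise ValueError("at least one core service must be reported")
    down = [name for name, up in core_services.items() if not up]
    if not down:
        return Status.OPERATIONAL
    if len(down) == len(core_services):
        return Status.UNAVAILABLE
    return Status.DEGRADED


@dataclass
class HealthReport:
    system: Status  # shared infrastructure, the same for every tenant
    node: Status    # this tenant's own node


def health_report(core_services: dict[str, bool], node: NodeHealth) -> HealthReport:
    return HealthReport(system=system_status(core_services), node=node_status(node))


if __name__ == "__main__":
    report = health_report(
        {"tower": True, "scheduler": True},
        NodeHealth(signer_connected=False),
    )
    print(report.system.value, report.node.value)
```

Running the usage example prints `operational unavailable`: the core services are up, but this particular user's node has no signer. Keeping the two statuses apart lets an app show "your node needs attention" instead of "Greenlight is down", which is the kind of error handling the original request asks for.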