Summary:
We should add endpoints for health probes. See also the Kubernetes documentation on liveness and readiness probes.
Why do we need this?
We should indicate to load balancers or orchestrators whether a component is ready to receive traffic, and whether it's in a good state.
What is already there? What do you see now?
Not much. The current gRPC and HTTP servers start as soon as the component starts, even though the actual services may not yet be ready to serve traffic. There is also no way to detect bugs such as deadlocks.
What is missing? What do you want to see?
We should at least have an endpoint that indicates readiness, so that a load balancer knows when it can start directing traffic to an instance. A liveness endpoint would be nice to have if we could reliably detect deadlocks, but it is not essential.
How do you propose to implement this?
I think we should use gocloud.dev/health for this. It's not a lot of code, and we already vendor that module, so it's already there.
What can you do yourself and what do you need help with?
I can do this, but if someone else wants to take this, that's fine too.
Original issue: https://github.com/TheThingsIndustries/lorawan-stack/issues/1232 by @htdvisser