apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Server HealthCheck StatusCallback should separate instance and table/segment health check #9023

Closed walterddr closed 1 year ago

walterddr commented 2 years ago

Currently, BaseServerStarter registers a status health check callback for every enabled table on the server, and the health check only returns OK once all of those tables/segments are loaded.

This causes problems for servers with a large number of tables or a large amount of data to load when running in a container environment.
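
For context, a minimal sketch of the aggregation semantics described above (illustrative only; the class and method names here are not the actual BaseServerStarter code):

```java
import java.util.List;

// Illustrative sketch: the server health check aggregates one status
// callback per hosted table and reports GOOD only once every
// table/segment callback is GOOD.
enum Status { STARTING, GOOD, BAD }

interface StatusCallback {
  Status getStatus();
}

class AggregateStatusCallback implements StatusCallback {
  private final List<StatusCallback> _tableCallbacks;

  AggregateStatusCallback(List<StatusCallback> tableCallbacks) {
    _tableCallbacks = tableCallbacks;
  }

  @Override
  public Status getStatus() {
    for (StatusCallback callback : _tableCallbacks) {
      Status status = callback.getStatus();
      if (status != Status.GOOD) {
        // A single slow table keeps the whole server "unhealthy",
        // which is what makes the container orchestrator restart it.
        return status;
      }
    }
    return Status.GOOD;
  }
}
```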

Proposal

Reference

[OLD PROPOSAL]

[OLDER PROPOSAL] Strip the table/segment status callback from the server health check endpoint so it only checks the server's own health, and create a /health/services/ endpoint that checks server + table resource health and only returns OK if everything is healthy.

mcvsubbu commented 2 years ago

We need this check in our environment. Please do not remove it.

Instead, you can use this config variable (set it to 0) to disable it altogether.

https://github.com/apache/pinot/blob/master/pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java#L370

walterddr commented 2 years ago

OK, yes, that would change the semantics of an existing REST API.

Counter proposal

walterddr commented 2 years ago

@mcvsubbu @mayankshriv @npawar ^

mcvsubbu commented 2 years ago

The existing behavior of loading all segments was put in place to allow for rolling restarts and upgrades. With this behavior, we can use the health check after an upgrade or restart and rest assured that we don't compromise availability. @walterddr, how do you propose to solve this problem in the container environment?

walterddr commented 2 years ago

I think the issue I am trying to solve here is orthogonal to the rolling restart problem, which involves intentional restarts. I am happy to create another issue to address that for the containerized environment.

This issue mainly discusses unintentional restarts triggered by the containerized environment. Here I propose creating additional endpoints to retrieve "instance availability" and "data availability" separately. This shouldn't affect how the rolling restart issue is addressed, containerized or not.

jadami10 commented 1 year ago

We have the same problem where servers take too long to become healthy and the ASG recycles them forever. This happens for intentional restarts as well as unintentional ones. Rewording what you said to see if we agree; there does seem to be an issue here where the /health endpoint is doing too much:

I'm +1 for your counter proposal. We can use /health/instance for whatever instance/container manager is being used and leave /health as is for backwards compatibility.

mcvsubbu commented 1 year ago

+1 for new endpoint, yes

Jackie-Jiang commented 1 year ago

I'd suggest adding a new API with more information in the response:

It is much more flexible this way, and different clients can choose how to parse the response. E.g. a liveness check can pass when the server is not BAD, while a readiness check can pass only when the server is GOOD.
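
As a sketch, such an API might return the status enum in the body and let each caller decide what counts as healthy (hypothetical JAX-RS resource; the path and response shape here are assumptions, not the actual proposal):

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// Hypothetical endpoint that exposes the full status enum in the
// response body instead of collapsing it into 200/503.
@Path("/health/status")
public class ServiceStatusResource {

  enum Status { STARTING, GOOD, BAD }

  @GET
  @Produces(MediaType.APPLICATION_JSON)
  public String getStatus() {
    // A liveness client treats anything but BAD as passing;
    // a readiness client requires GOOD.
    return String.format("{\"status\":\"%s\"}", currentStatus());
  }

  private Status currentStatus() {
    return Status.STARTING; // placeholder for the real status lookup
  }
}
```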

jadami10 commented 1 year ago

One concern I have with a more complex API is that most load balancers only allow configuring the path and then look at the status code of the response.

So having both /health/instance and /health/services like in the proposal makes the most sense to me.

Jackie-Jiang commented 1 year ago

> One concern I have with a more complex API is that most load balancers only allow configuring the path and then look at the status code of the response.
>
> So having both /health/instance and /health/services like in the proposal makes the most sense to me.

@jadami10 What if we return a different status code for each status? The current API only supports 200 and 503, which cannot represent all the available statuses. One problem is that different clients might have different logic for parsing the Pinot service status, and fixing the return type reduces that flexibility. We don't want to add an extra new API for each different status check logic.
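
As a sketch, that idea might map each status onto a distinct code (status names and code choices here are illustrative, not a settled mapping):

```java
// Illustrative mapping of service status onto distinct HTTP codes,
// so a single endpoint could serve clients of different strictness.
class HealthHttpCodes {
  enum Status { STARTING, GOOD, BAD }

  static int toHttpCode(Status status) {
    switch (status) {
      case GOOD:
        return 200; // fully ready to serve queries
      case STARTING:
        return 503; // alive, but still loading segments / catching up
      case BAD:
      default:
        return 500; // the instance itself is unhealthy
    }
  }
}
```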

jadami10 commented 1 year ago

I don't think your typical ELB parses responses that way. Any non-200 code is considered failing. And in this case we want a server that's up but still downloading segments/catching up to return 200 on one endpoint and non-200 on another.

@walterddr, while you're in here, do you plan to add healthcheck metrics for the server

```java
HEALTHCHECK_OK_CALLS("healthcheck", true),
HEALTHCHECK_BAD_CALLS("healthcheck", true),
```

similar to what the broker and controller have?
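
A sketch of what the server-side entries might look like, assuming the server's meter enum follows the same (meterName, global) constructor pattern quoted above (the enum shown here is hypothetical, not existing Pinot code):

```java
// Hypothetical server-side counterparts to the broker/controller
// healthcheck meters, following the same (meterName, global) pattern.
public enum ServerHealthcheckMeter {
  HEALTHCHECK_OK_CALLS("healthcheck", true),
  HEALTHCHECK_BAD_CALLS("healthcheck", true);

  private final String _meterName;
  private final boolean _global;

  ServerHealthcheckMeter(String meterName, boolean global) {
    _meterName = meterName;
    _global = global;
  }
}
```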

walterddr commented 1 year ago

Enum type in response vs. different endpoints

IMO it is best to use different endpoints instead of the same endpoint with varying types of response

Some Thoughts

It seems like this is causing some confusion, and addressing both issues at the same time also distracts from the discussion. So I would say let's focus on the main purpose of this discussion: finding a way for containerization software to know whether a Pinot server is alive vs. ready to serve queries.

If there's a way to easily configure k8s or an ELB to differentiate the two server states, I am OK with using just one endpoint.
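
For reference, with two endpoints the Kubernetes side stays simple, because httpGet probes only inspect the response status code (a sketch using the endpoint names from the proposal above; the port and thresholds are illustrative):

```yaml
# Sketch: separate probes against separate endpoints.
livenessProbe:
  httpGet:
    path: /health/instance   # process is up; don't kill the pod
    port: 8097               # illustrative admin port
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health/services   # tables/segments loaded; route traffic
    port: 8097
  periodSeconds: 10
  failureThreshold: 60       # tolerate long segment-loading windows
```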

Jackie-Jiang commented 1 year ago

I checked other systems, and it seems the health check is mostly used for liveness rather than readiness. Since we already use it as the readiness check, we can leave it as is for backward compatibility.

Based on the requirement, I'd suggest making the API more explicit about its purpose:

I feel using "instance" and "service" is not as clear.
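
Given the liveness-vs-readiness framing above, the explicit version might look like the following (a sketch only, not the merged implementation; the paths and status plumbing are assumptions):

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.core.Response;

// Sketch of purpose-explicit endpoints: liveness answers "is the
// process up?", readiness answers "can it serve all its segments?".
@Path("/health")
public class HealthCheckResource {

  @GET
  @Path("/liveness")
  public Response checkLiveness() {
    // Passes as soon as the server process can answer HTTP at all.
    return Response.ok("OK").build();
  }

  @GET
  @Path("/readiness")
  public Response checkReadiness() {
    // Passes only once all table/segment callbacks report GOOD.
    return allSegmentsLoaded()
        ? Response.ok("OK").build()
        : Response.status(Response.Status.SERVICE_UNAVAILABLE).build();
  }

  private boolean allSegmentsLoaded() {
    return false; // placeholder for the aggregate service-status check
  }
}
```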

walterddr commented 1 year ago

Sounds good. I will create these. ^ @jadami10 any follow-up?

jadami10 commented 1 year ago

nope, those look great