gajus / lightship

Abstracts readiness, liveness and startup checks and graceful shutdown of Node.js services running in Kubernetes.

signalNotReady() not having any effect in shutdownHandler #25

Closed cjroebuck closed 3 years ago

cjroebuck commented 3 years ago

Hey there, I'm having a hard time understanding why, when I call lightship.signalNotReady() from within a shutdown handler, I get the log "server is already shutting down". That might be correct, but the problem is that requests to the /ready endpoint still return 200 OK, meaning my container is still accepting requests when it should be shutting down.

Here's some sample code:

lightship.registerShutdownHandler(async () => {
  logger.info("in lightship shutdown handler..");
  lightship.signalNotReady();
  // This is when I see the "server is already shutting down" log,
  // yet the server is still "ready" when it shouldn't be.
});

If this is normal behaviour, then my question becomes: how do I tell lightship to signal not ready after I receive a SIGTERM, using lightship's shutdown handler callback?
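
In the meantime, the only workaround I can think of is to opt out of lightship's built-in signal handling and sequence things myself. Untested sketch below; it assumes that passing an empty array to the signals option disables lightship's own SIGTERM handler, and the 5-second delay is an arbitrary illustration value:

const { createLightship } = require("lightship");

const lightship = createLightship({
  // Assumption: an empty array means lightship registers no signal
  // handlers of its own, so we can handle SIGTERM manually.
  signals: [],
});

process.on("SIGTERM", () => {
  // Fail the readiness probe first, before shutdown begins, so the
  // ingress stops routing new traffic to this pod.
  lightship.signalNotReady();

  // Illustrative grace period to let the failing probe propagate,
  // then trigger lightship's normal shutdown sequence.
  setTimeout(() => {
    lightship.shutdown();
  }, 5000);
});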

gajus commented 3 years ago

Hm. I cannot think of a reason this is the case, but it sure seems deliberate because there is even a test for it.

cjroebuck commented 3 years ago

Yeah, strange. I need to set my container not ready as soon as I get a SIGTERM so that the NGINX ingress doesn't keep sending traffic to it. Would you accept a PR for this?


gajus commented 3 years ago

See:

https://github.com/gajus/lightship/issues/12

Also:

Googling for solutions to this problem makes it seem that adding a readiness probe to your pod solves this problem. Supposedly, all you need to do is make the readiness probe start failing as soon as the pod receives the SIGTERM. This is supposed to cause the pod to be removed as the endpoint of the service. The removal would happen only after the readiness probe fails for a few consecutive times (this is configurable in the readiness probe spec). And, obviously, the removal then still needs to reach the Kube-Proxy before the pod is removed from iptables rules.

In reality, the readiness probe has absolutely no bearing on the whole process at all. The Endpoints controller removes the pod from the service endpoints as soon as it receives notice of the pod being deleted (when the deletionTimestamp field in the pod’s spec is no longer null). From that point on, the result of the readiness probe is irrelevant.

https://freecontent.manning.com/handling-client-requests-properly-with-kubernetes/
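
So if the goal is to avoid dropping requests that are still being routed to the pod, the pattern the article implies is to keep serving for a short grace period after SIGTERM rather than to flip readiness. Rough sketch (server stands in for your own HTTP server instance, and the delay is illustrative):

lightship.registerShutdownHandler(async () => {
  // Keep serving in-flight and late-routed requests for a moment,
  // giving the endpoint removal time to reach every kube-proxy.
  await new Promise((resolve) => {
    setTimeout(resolve, 5000);
  });

  // Now stop accepting new connections and let existing ones drain.
  await new Promise((resolve) => {
    server.close(resolve);
  });
});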

cjroebuck commented 3 years ago

Hi Gajus,

Yeah, I understand that; however, I believe the NGINX ingress doesn't use Endpoints and sends traffic directly to pod IPs, so the readiness probe might still be relevant for that.

Will see if your fix makes any difference; if not, it could be something to do with Linkerd, which I'm also using as a service mesh.

Thanks


gajus commented 3 years ago

Now that I have better context, I am still not 100% sure what the correct way to handle this is.

However, here is where I see possible confusion. The article is talking about receiving an external SIGTERM. It makes sense that readiness does not need to change in that case, since the Kubernetes controller already knows to remove the pod.

However, we are talking about the application itself initiating shutdown(). It makes sense to set not ready in this case, because the controller is not aware of it.
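
For example (sketch; databaseConnection is a made-up stand-in for whatever dependency failure makes the app decide to exit on its own):

databaseConnection.on("error", () => {
  // Nothing told Kubernetes to delete this pod, so deletionTimestamp
  // is still null and the Endpoints controller will not act. The
  // readiness probe is the only signal we control here.
  lightship.signalNotReady();
  lightship.shutdown();
});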

cjroebuck commented 3 years ago

Well, either way, it would be handy if lightship allowed us to set not ready even when the server is already shutting down.

gajus commented 3 years ago

:tada: This issue has been resolved in version 6.1.1 :tada:

The release is available on:

Your semantic-release bot :package::rocket: