Apply Service Unavailable globally if healthcheck fails

dotnet-architecture / HealthChecks

Experimental Health Checks for building services, such as with ASP.NET Core

Other

453 stars 124 forks source link

Apply Service Unavailable globally if healthcheck fails #60

Open herecydev opened 7 years ago

herecydev commented 7 years ago

If I understand this line correctly, the server will return a HTTP 503 if the path/port matches what has been registered. However if it doesn't then it will invoke the next middleware regardless of health. This seems odd, I would expect the server to return 503s if the healthchecks have failed. In my case, despite the "unhealthy" status, the pipeline continues and the controller inevitably fails for the same reasons as the health checks.

I think that the middleware should issue HTTP 503 in this case, or at least let this be configurable. More than happy to give a PR, if this feels sensible?

ycrumeyrolle commented 7 years ago

The responsibility of this middleware is to expose in a very succinct form the Health if the application. This can be used by load balancer in order to add or remove a node. This middleware do not have the responsibility to expose the full details of the application health or control the application behaviour. For example, you may have a check that may fail, but as long as the load balancer decide to transmit all the requests to other nodes, the application on the current node may be still be able to respond to most of the requests.

This is similar to the canary in mines. The dead canary does not means "you are dead" but means "it is time to go away, or you will die"

herecydev commented 7 years ago

Sure, that's pretty centered around having a loadbalancer, or another proxy "up-front" that can interpret health and smartly route to other nodes. There are scenarios where this isn't applicable and it makes sense to block all requests early instead of failing, where it knows that the health checks have failed.

ycrumeyrolle commented 7 years ago

This may be based on a Boolean option (CatchAllRequestsOnFailure) or based on a policy, like only for critical tagged checks.

herecydev commented 7 years ago

Yer I think that's appropriate. A bool to turn on/off globally, perhaps a delegate that can be registered and invoked for specific routes or whatever the user decides is permissible.