eclipse / microprofile-health

microprofile-health
Apache License 2.0
106 stars 61 forks source link

Idea: Add `optional` flag to health check response #317

Closed zivkovic closed 2 years ago

zivkovic commented 2 years ago

Description Add a flag optional to HealthCheckResponse. Default value should be false. If value is true, display the health check, but do not count it's status towards the main service status. Meaning, if any optional health check status is DOWN, the main status would still be UP, unless some mandatory health check is DOWN.

Use cases This would only be used for health checks on services that are not mandatory for the service to run, ex: optional connection to microservice that handles statistics. If statistics goes down (or network issues prevent our service to connect to statistics), the service can continue to work, but we get info about it being down and can debug/fix it easier and faster.

If required, I can provide a pull request with this functionality.

xstefank commented 2 years ago

Hi, interesting idea. I am a little skeptical if this is really a use case for health check invocations. You would like to have a periodical check in service A that requires for it to run properly that it can connect to the statistics service so assuming that you would implement a Readiness probe to prohibit traffic to your service A when the connection to statistics service cannot be established, why would you continue to process requests? In other words, do you need this health check that checks the statistics server at all if your service A does not depend on it being present?

zivkovic commented 2 years ago

Well, the idea is that the service can still work (with stale data or a fallback call). If this case occurs, we would be able to detect it in a health check and then try to restore normal working order. Even if service B is offline, service A can still function, but it would not serve the "real" data. Both ways would still serve valid data. That's why this case would be useful to us, and maybe to someone else?

The reasoning why we would continue to process requests even if statistics is down, is that we would rather have a half working environment than a fully dead one. Statistics are not required for proper working service, it just gathers stats. Yes, the stats would be skewed, but still. Users would not notice a difference in their use cases, the user experience remains the same.

xstefank commented 2 years ago

So health check types have precisely defined use cases in https://github.com/eclipse/microprofile-health/blob/master/spec/src/main/asciidoc/java-api.asciidoc#different-kinds-of-health-checks. For instance, readiness is defined as a Health Check that allows third party services to know if the application is ready to process requests or not. So in this sense, I don't see how optional checks can be included in such use cases. You shouldn't include health check procedures in a type that is not part of its use case because that would be just prolonging required health invocations that are fired repeatedly by the third party (k8s).

However, I see that you would like to execute some operations with health invocations that would not have an impact on the defined groups of health checks (readiness, liveness, startup) and thus not have an impact on the running application. In our implementation, we have an experimental feature Health groups which is I think exactly what you need. If you can check it and provide feedback, we would gladly include it in the specification.

zivkovic commented 2 years ago

Yeah, you are correct. If I would define a health group for this use case it would cover it. The only downside to this is that the main health check call "/health" would always return "DOWN", because it always lists all available health checks. I guess this would work, since the "/health/readiness" and "/health/liveness" would work as they do currently.

Thank you.