Closed nderwin closed 3 years ago
@nderwin so sorry I didn't get to smallrye/smallrye-health#161 sooner. I will check it asap. But the specification is correct. 500 is reserved for unexpected processing errors that are not able to be handled in the health implementation (e.g., OOME) -- that is for responses that can't put the JSON response together. I don't think we have TCK test that throws RE. Can you point me to what you mean?
If you agree, can we close this issue and continue in sr-health please as this is specific to the implementation.
@xstefank Maybe I was confusing the TCK test code between this project and smallrye-health, as I can't find it now in the TCK tests. I did find one in the smallrye test code (https://github.com/smallrye/smallrye-health/blob/main/implementation/src/test/java/io/smallrye/health/SmallRyeHealthReporterTest.java#L46), but the tests there are looking for a 503 response code. If I'm understanding correctly, the above test should verify a 500 response code.
As for this project, does the TCK need to be updated to add tests for this situation? If not, I guess we can close this issue.
@nderwin yes, that test is specific to SR. Please see my response in the SR issue. I believe we shouldn't kill the application if health checks throw exceptions. But @pgunapal how do you handle it in OL please?
@xstefank In OL, whenever we can't construct the correct JSON response to return a 200 or a 503 or unable to process the health checks in the application, for example, an exception is thrown (RuntimeException), we catch the exception and return a 500 response code. I just read the specification and it seems to be confusing. In Appendix B of the protocol-wireformat, it says the Status should be Undetermined
for HTTP Status 500, but in the follow on example for Status 500, we see that the status is DOWN for a Response code of 500. The comment also says Request process failed (i.e. error in procedure)
, which would mean if we can not process the health checks (regardless, of any occurred exceptions (including RuntimeExceptions) or unable to return a correct JSON response with code 200/503), it should return 500, with an Undetermined status. I think the example should be fixed to reflect the Appendix B table to clarify it, WDYT?
@pgunapal yes, that will be only a typo in example. But I don't think that caught exception was ever meant to return 500. We should be aligned and always return 200/503 if we can construct JSON response. Please check also my response in the linked SR issue which was accepted by the user.
Yes, so I guess that example was written by the same person that implemented this in SR. So I think we should be returning 503 if we can catch the exception to stay aligned with our 200/503 response. 500 will be reserved for generic errors that cannot be processed by the implementation, e.g. OOME. I believe that in such cases HTTP server generally responds with 500 (internal server error).
@xstefank Alright, I'll go ahead and close this one, however, the spec might need a little more clarification to note that the documented 500 response is not intended to be produced by an MP implementation, rather it's a consequence of the application being health checked being not available to serve any meaningful response.
@nderwin in every case, thank you for bringing this up in the spec. This is really unfortunate overlook from our side which needs clarification.
@xstefank I agree, the spec would need clarification on the 500.
While investigating https://github.com/smallrye/smallrye-health/issues/161, and comparing to the spec (https://github.com/eclipse/microprofile-health/blob/master/spec/src/main/asciidoc/protocol-wireformat.asciidoc#status-codes), it looks like if a (runtime) exception is thrown from a health check method, eventually the response should be a status of 500 instead of 503. In order to facilitate that, it also looks like a new enum value is needed for the status to represent the "undetermined" state.
There are already TCK tests throwing exceptions out of the health checks, but they are validating a response status of 503.