CircuitBreaker HALF_CLOSED state doesn't behave per documentation.

mbomeara commented 10 years ago

The HALF_CLOSED documentation states "... the CircuitBreaker will transition to a HALF_CLOSED state, where one call is allowed to go through as a test. If that call succeeds, the CircuitBreaker moves back to the CLOSED state; if it fails, it moves back to the OPEN state for another cooldown period". The implementation actually closes the CircuitBreaker if the attempt fails in the HALF_CLOSED state, thereby forcing the FailureInterpreter to start from scratch. This violates the one call promise if the CircuitBreaker is configured to have a limit > 0.

This code from CircuitBreaker appears to be at fault:

private void handleFailure(Throwable cause) throws Exception {
    if (failureInterpreter == null ||
        failureInterpreter.shouldTrip(cause)) {
            this.tripException = cause;
            trip();
    }

    if (isAttemptLive) {
        close(); // <--- shouldn't this be trip(); ?
    }

    if (cause instanceof Exception) {
        throw (Exception)cause;
    } else if (cause instanceof Error) {
        throw (Error)cause;
    } else {
        throw (RuntimeException)cause;
    }
}

joercampbell commented 10 years ago

An un-ignored exception if thrown is still a 'valid' request, which would cause the breaker to close. Do you have a unit test or an example of what you are mentioning here?

mbomeara commented 10 years ago

Looks like I mis-interpreted the documentation then. I was expecting a single request to be allowed through when in the HALF_CLOSED state.

Take the example of using a CircuitBreaker to wrap a client of a remote service that has failed completely. Once the FailureInterpreter decides that the CB should OPEN, I expected only one request after the "cool down" period to be allowed through. I had assumed that if that single request failed with an Exception, the CircuitBreaker would immediately move back to the OPEN state. In effect I was expecting it to probe if the service had recovered with a single request rather than going through another round of "failure interpretation" only to decide that service is indeed still down.

jonm commented 10 years ago

I think this is a fair point -- the documentation is written with respect to the default failure interpreter, where every failure trips the breaker. Per @joercampbell though, if there is an alternative failure interpreter in play, then if it ignores the exception it counts as a 'success' and not a 'failure' and the breaker closes. I don't see how it could behave differently without being even more confusing.

I suspect an update to the documentation to clarify this behavior is the right solution here. @mbomeara : would you have a suggestion for a doc update that would clarify this?

joercampbell commented 10 years ago

mbomeara: Was the description of the existing feature here helpful? Would you still like to see this operate differently and if so how so?

joercampbell commented 9 years ago

I may have changed my mind a little about this issue - Can I send you a patch that I just worked up as a way to see if I understood the difference between what you are saying and what it 'was' doing? Wanted to get your take on it and then maybe Jonm's take on it before I push the change into master.

mbomeara commented 9 years ago

Sure. I'm a bit rusty on this as it's been a while, but I'd be happy to have a look.

On Thursday, September 4, 2014 6:52 AM, Joe Campbell notifications@github.com wrote:

I may have changed my mind a little about this issue - Can I send you a patch that I just worked up as a way to see if I understood the difference between what you are saying and what it 'was' doing? Wanted to get your take on it and then maybe Jonm's take on it before I push the change into master. — Reply to this email directly or view it on GitHub.

joercampbell commented 9 years ago

Cool - I put the change on a branch in the repo: https://github.com/Comcast/jrugged/tree/half-closed-change check it out and let me know if this is what you were talking about.

joercampbell commented 9 years ago

Merged.

Comcast / jrugged

CircuitBreaker HALF_CLOSED state doesn't behave per documentation. #21