Comcast / jrugged

A Java libary of robustness design patterns
Apache License 2.0
268 stars 94 forks source link

Don't update isAttemptLive on getStatus() calls #40

Closed christianapel closed 9 years ago

christianapel commented 9 years ago

After the reset timeout, a getStatus() call should change the circuit breaker state to DEGRADED, but not set the isAttemptLive flag to true. This would lead to a HALF_CLOSED state which doesn't allow to execute any further attempts and therefore block all subsequent successful calls.

christianapel commented 9 years ago

This issue was introduced with pull request https://github.com/Comcast/jrugged/pull/37

joercampbell commented 9 years ago

So I fixed this by removing the or state in the check, which i think was wrong. I additionally removed a unit test that I think was actually a bunk case. I am going to push the change here in a sec if you want to take a look - I think I got the same thing with a much smaller change.

christianapel commented 9 years ago

This has not fixed the actual issue. The problem just appears if the getServiceStatus() method is called after the breaker has tripped. In HALF_CLOSED state, the first line of this method calls allowRequest(), which then calls canAttempt(). This method always updates isAttemptLive to true, even if no request was performed against the wrapped service.

In this state, it's not possible to perform any further probe requests and therefore never changing back the breaker state to CLOSED.

Please copy the unit test of this pull request, it's still failing on your latest master revision.

joercampbell commented 9 years ago

Ahh - ok, I agree. I will merge this and get at least one additional change put up because the test needed some small timing tweaks to prevent it from failing in the travis build environment.