Closed marto closed 5 years ago
Yeah - I think someone else was having a very similar problem with the codebase. What version are you using in your environment?
Also - can you pull the recent master and see if the recent changes made around this solve the problem you are describing? We made some minor changes as a result of someone elses commenting about a similar problem where the isAttemptLive was being manipulated incorrectly when it didn't need to be causing inconsistent behavior.
Joe
On Wed, Mar 30, 2016 at 10:51 AM Martin Petrovsky notifications@github.com wrote:
We have been using the library in production and found a few issues with the CircuitBreaker class implementation not being thread safe.
Looking at the source, I believe there are a number of concurrency issues with the class around the manipulation of state, isAttemptLive, isHardTrip, byPass fields. From my experience it looks like it's not thread safe, although hard to describe why. Here is an attempt to describe what we observed in a production scenario:
On occasion when the caller is highly concurrent (i.e. runs on many threads) the CircuitBreaker get's stuck in a HALF_CLOSED state. It's stuck in a HALF_CLOSED state because the isAttemptLive field is not atomic and is flipped in both the close, trip methods (when called from a successful or a unsuccessful invoke method) while at the same time (or out of order) the synchronized canAttempt also manipulates the isAttemptLive state.
It's possible for the close or trip methods to be called out of order to the synchronized canAttempt method. This can cause a situation where sate=HALF_CLOSED, isAttemptLive=true but the actual request has completed. Thus the isAttemptLive=true will never be flipped back as there is not attempt being made. The result is that the CB remains forever in a HALF_CLOSED state. I have verified existence of this situation of CircuitBreaker being in this state once in a debugging situation.
— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/Comcast/jrugged/issues/45
It seems that the changes in class CircuitBreaker
in revision #37 solve the problem. Do you have a plan to release a version? Because having this issue, the library cannot be used in production; every time isAttemptLive
is incorrectly set to true, the software enters an incorrect final state and needs a restart, which is not acceptable in production environment.
Releasing a new version here shortly.
We have been using the library in production and found a few issues with the
CircuitBreaker
class implementation not being thread safe.Looking at the source, I believe there are a number of concurrency issues with the class around the manipulation of
state
,isAttemptLive
,isHardTrip
,byPass
fields. From my experience it looks like it's not thread safe, although hard to describe why. Here is an attempt to describe what we observed in a production scenario:On occasion when the caller is highly concurrent (i.e. runs on many threads) the
CircuitBreaker
get's stuck in aHALF_CLOSED
state. It's stuck in aHALF_CLOSED
state because theisAttemptLive
field is not atomic and is flipped in both theclose
,trip
methods (when called from a successful or a unsuccessfulinvoke
method) while at the same time (or out of order) thesynchronized
canAttempt
also manipulates theisAttemptLive
state.It's possible for the
close
ortrip
methods to be called out of order to thesynchronized canAttempt
method. This can cause a situation wheresate=HALF_CLOSED
,isAttemptLive=true
but the actual request has completed. Thus theisAttemptLive=true
will never be flipped back as there is not attempt being made. The result is that the CB remains forever in aHALF_CLOSED
state. I have verified existence of this situation ofCircuitBreaker
being in this state once in a debugging situation (jrugged-core version = 3.2.2).