Circuit erroneously closed when using a shared storage

jcwilson commented 6 years ago

Due to these lines: https://github.com/danielfm/pybreaker/blob/c84055a61ea846baae786e85bc3ed191a55ba326/src/pybreaker.py#L62 https://github.com/danielfm/pybreaker/blob/c84055a61ea846baae786e85bc3ed191a55ba326/src/pybreaker.py#L728

Also, this will cause the fail counter to be reset anytime a new CircuitBreaker is instantiated using the same shared storage (like the provided CircuitRedisStorage).

This is major problem for usage within web app servers, where you may have many processes coming in and out of existence at any time due to scaling behavior or processes sporadically dying and being resurrected.

It seems like a possible solution might involve rethinking what the notify argument is being used for. It appears to be set to True in all cases except for this initial default value being set on line 62. Perhaps move the state change effect into the if notify: block as well?

A better solution might be to not assume that a CircuitBreaker should start in the Closed state, but instead ask the self._state_storage object for the current state and initialize the CircuitBreaker without affecting the state storage.

danielfm commented 6 years ago

Thanks for reporting this.

This is major problem for usage within web app servers, where you may have many processes coming in and out of existence at any time due to scaling behavior or processes sporadically dying and being resurrected.

As far as I understand - please @phillbaker correct me if I'm wrong - the Redis storage is intended to be used whenever you want to share state between breaker instances, making those act like a single unit due to the shared state between them. So, unless you create different CircuitRedisStorage (and providing different values for the namespace argument), this is expected behavior.

Maybe for the mentioned use case (circuit breakers within web servers, each breaker with its own state), you should use the default in-memory storage.

phillbaker commented 6 years ago

Hm, I think some more information would help. If the following is happening, then this does seem like a bug:

this will cause the fail counter to be reset anytime a new CircuitBreaker is instantiated using the same shared storage (like the provided CircuitRedisStorage).

I would have thought these two lines prevented overriding the shared counter and state in redis:

https://github.com/danielfm/pybreaker/blob/c84055a61ea846baae786e85bc3ed191a55ba326/src/pybreaker.py#L450-L451

@jcwilson - what impacts of this are you seeing in your environment?

phillbaker commented 6 years ago

Sorry for the double post, but I just saw https://github.com/danielfm/pybreaker/pull/25. Would that also help address this issue?

alukach commented 6 years ago

@jcwilson Thanks for posting this, I am just getting started with pybreaker and ran into the same issue that you described.

@danielfm

the Redis storage is intended to be used whenever you want to share state between breaker instances, making those act like a single unit due to the shared state between them. So, unless you create different CircuitRedisStorage (and providing different values for the namespace argument), this is expected behavior.

Are you suggesting that the way that pybreaker is currently written is correct? My expectation would be that the library wouldn't reset the counter when you spin up a new instance of the breaker.

Maybe for the mentioned use case (circuit breakers within web servers, each breaker with its own state), you should use the default in-memory storage.

I think this is the opposite of the described use case. @jcwilson seems to be suggesting that he wants to share state between web servers but this is not really possible as each circuit breaker resets the global state of the breaker.

@phillbaker

I would have thought these two lines prevented overriding the shared counter and state in redis:

Those lines are indeed written to respect pre-existing state, however it's trampled over when initiating a circuit breaker due to the line posted in the issue's body:

https://github.com/danielfm/pybreaker/blob/c84055a61ea846baae786e85bc3ed191a55ba326/src/pybreaker.py#L41-L62

https://github.com/danielfm/pybreaker/blob/c84055a61ea846baae786e85bc3ed191a55ba326/src/pybreaker.py#L713-L728

alukach commented 6 years ago

Please see #27 for my suggestion on how to resolve this issue. Comments/critique welcome!

jcwilson commented 6 years ago

what impacts of this are you seeing in your environment?

Thanks a lot for taking a look. I neglected to mention that we started on version ~0.3.1 and have written our own redis backend that stores things in a slightly different manner, so it doesn't make use of the setnx() code in CircuitRedisStorage. That being said, I believe the current implementation as shipped with this library will still produce the same undesirable behavior. I believe the issue is best resolved at the CircuitBreaker level, rather than at the CircuitBreakerStorage level.

I think @alukach has expressed the problem pretty well. To keep things simple, let's speak in terms of a single circuit breaker instantiation in the code, but the code might be run in several processes (ie. uwsgi worker processes). We're using redis storage to provide a central shared counter for each circuit breaker's context so that an error on any box/process will increment the counter for all the rest, too. All of our processes will use the same namespace. I think if we were going to assign each process its own namespace, we'd just stick with the in-memory storage option and forego the overhead of the redis storage.

danielfm commented 6 years ago

Closed by #27, feel free to re-open this if you have problems.

danielfm / pybreaker

Circuit erroneously closed when using a shared storage #26