cloudfoundry-community / memcache-release

Bosh releases for the Memcache service.
Apache License 2.0

More fully test node loss under heavy load #7

Closed youngm closed 8 years ago

youngm commented 8 years ago

We recently experienced an outage similar to the one mentioned here: https://groups.google.com/d/msg/hazelcast/V5F_uJCWYJA/Rgy6jICFCgAJ

Once Hazelcast is upgraded, I'd like to do more testing of bringing down nodes under heavy load. Was this really a Hazelcast bug that is fixed in 3.6? Or do I need to change the server so that it slows things down under heavy load, especially when the cluster is not healthy? There is room for some flow control here. For example, I could stop accepting new requests or return blank responses when a connection's queue reaches a certain size. The ultimate goal is to save the cluster from going down completely.
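
To make the flow-control idea concrete, here is a rough sketch of a per-connection queue that sheds load once it reaches a size limit. It is only an illustration in Python, not code from this release: `ConnectionQueue`, `max_pending`, and `handle_get` are hypothetical names, and a blank response is modeled as `None` (a cache miss).

```python
from collections import deque


class ConnectionQueue:
    """Hypothetical per-connection request queue with a hard size limit."""

    def __init__(self, max_pending):
        self.max_pending = max_pending  # assumed tunable threshold
        self.pending = deque()
        self.shed_count = 0             # how many requests were refused

    def offer(self, request):
        """Queue the request, or refuse it if the connection is backed up.

        Returns True if the request was accepted, False if it was shed.
        """
        if len(self.pending) >= self.max_pending:
            self.shed_count += 1
            return False
        self.pending.append(request)
        return True

    def take(self):
        """Remove and return the oldest pending request, or None if empty."""
        return self.pending.popleft() if self.pending else None


def handle_get(queue, key):
    """Answer with a blank response (None) right away when the queue is full.

    Otherwise the request is queued for the backing store and the string
    "queued" is returned as a stand-in for the deferred reply.
    """
    if not queue.offer(("get", key)):
        return None
    return "queued"
```

With `max_pending=2`, the third `handle_get` on a stalled connection returns `None` immediately instead of adding more work for an unhealthy cluster. Returning a miss rather than an error is one possible choice here, since a cache client can usually fall back to its source of truth.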

youngm commented 8 years ago

I've done a lot of testing in this regard. I think Hazelcast 3.7 must fix a bunch of these issues, because I couldn't get the cluster to die.