We recently experienced an outage similar to the one mentioned here: https://groups.google.com/d/msg/hazelcast/V5F_uJCWYJA/Rgy6jICFCgAJ
Once Hazelcast is upgraded, I'd like to do more testing of bringing down nodes under heavy load. Was this really a Hazelcast bug that is fixed in 3.6? Or do I need to change the server so that it slows things down under heavy load, especially when the cluster is not healthy? There is room for some flow control here. For example, I could stop accepting new requests, or return blank responses, when a connection's queue reaches a certain size. The ultimate goal is to keep the cluster from going down completely.
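
To make the flow-control idea concrete, here is a rough sketch of a per-connection queue that refuses new requests once it reaches a fixed size. It uses only the JDK and does not touch any Hazelcast API; the class name BoundedConnectionQueue, the maxPending parameter, and the method names are all made up for illustration, and the right threshold would have to come from load testing.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical helper (not part of Hazelcast): a per-connection request
// queue that sheds load instead of growing without bound.
final class BoundedConnectionQueue<T> {
    private final BlockingQueue<T> pending;

    // maxPending is an assumed tuning knob: the queue size at which
    // the server starts refusing new requests for this connection.
    BoundedConnectionQueue(int maxPending) {
        this.pending = new ArrayBlockingQueue<>(maxPending);
    }

    // Non-blocking: returns true if the request was queued, false if the
    // queue is full. On false the caller can reject the request or send
    // back a blank response.
    boolean tryAccept(T request) {
        return pending.offer(request);
    }

    // Returns the next queued request, or null if nothing is waiting.
    T poll() {
        return pending.poll();
    }

    int size() {
        return pending.size();
    }
}
```

The server would call tryAccept for each incoming request and only hand work to the cluster for requests that were accepted, so a slow or unhealthy cluster shows up as rejected requests rather than as ever-growing queues.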