eclipse-vertx / vert.x

Vert.x is a tool-kit for building reactive applications on the JVM
http://vertx.io

HAManager doesn't see quorum if all nodes have already joined the cluster manager when it initializes #5272

Closed: apanasevich closed this 1 month ago

apanasevich commented 1 month ago

Version

4.5.7

Context

My app is based on Vert.x with an embedded Hazelcast cluster manager. When I run several instances of the app with HA mode enabled, some instances cannot deploy HA verticles, and the logs show that they have not attained quorum: Quorum not attained. Deployment of verticle will be delayed until there's a quorum.

But actually they have.

The problem is that the Hazelcast cluster can be in a state where all nodes have already joined by the time HAManager initialization starts, so HAManager's nodeAdded method is never called. The checkQuorum method, which is called during initialization, does see all nodes from clusterManager, but clusterMap does not yet contain the entries from all of them. Those entries are added a few milliseconds later, but by then checkQuorum has already completed, and it is never called again.
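To make the timing problem concrete, here is a toy, single-threaded Java model of the race as described above. ToyClusterMap, ToyHAManager and the "re-check on map update" listener are hypothetical stand-ins, not Vert.x or Hazelcast APIs, and the re-check is only one possible remedy, not necessarily what the merged PR does. The model assumes the quorum check counts cluster members whose entry in the cluster map names the local HA group.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical, simplified model of the race. None of these classes are
// Vert.x or Hazelcast APIs; all names are illustrative only.
public class QuorumRaceDemo {

  /** Stand-in for the shared cluster map: node id -> HA group name. */
  static final class ToyClusterMap {
    private final Map<String, String> entries = new ConcurrentHashMap<>();
    private final List<Runnable> listeners = new CopyOnWriteArrayList<>();

    void put(String nodeId, String group) {
      entries.put(nodeId, group);
      listeners.forEach(Runnable::run); // notify listeners after every write
    }

    String get(String nodeId) {
      return entries.get(nodeId);
    }

    void onUpdate(Runnable listener) {
      listeners.add(listener);
    }
  }

  /** Stand-in for the HA manager's quorum bookkeeping (hypothetical). */
  static final class ToyHAManager {
    private final List<String> members; // nodes the cluster manager already reports
    private final ToyClusterMap clusterMap;
    private final String group;
    private final int quorumSize;
    private volatile boolean attainedQuorum;

    ToyHAManager(List<String> members, ToyClusterMap clusterMap, String group, int quorumSize) {
      this.members = members;
      this.clusterMap = clusterMap;
      this.group = group;
      this.quorumSize = quorumSize;
    }

    void init(boolean recheckOnMapUpdate) {
      if (recheckOnMapUpdate) {
        clusterMap.onUpdate(this::checkQuorum); // re-evaluate when entries arrive
      }
      // Without further membership events, this is the only check that runs.
      checkQuorum();
    }

    synchronized void checkQuorum() {
      // Only members whose map entry is already present (and in our group) count.
      long count = members.stream().filter(id -> group.equals(clusterMap.get(id))).count();
      attainedQuorum = count >= quorumSize;
    }

    boolean attainedQuorum() {
      return attainedQuorum;
    }
  }

  /** All 3 nodes have joined, but only the local entry is in the map at init time. */
  public static boolean scenario(boolean recheckOnMapUpdate) {
    List<String> members = List.of("node-1", "node-2", "node-3");
    ToyClusterMap map = new ToyClusterMap();
    map.put("node-1", "group-a"); // the local node has written its own entry
    ToyHAManager ha = new ToyHAManager(members, map, "group-a", 3);
    ha.init(recheckOnMapUpdate);
    // The other nodes publish their entries "a few milliseconds later".
    map.put("node-2", "group-a");
    map.put("node-3", "group-a");
    return ha.attainedQuorum();
  }

  public static void main(String[] args) {
    System.out.println("one-shot check:     " + scenario(false)); // false
    System.out.println("re-check on update: " + scenario(true));  // true
  }
}
```

With a one-shot check, the late entries are never counted, so the node stays stuck below quorum even though every member has joined. Re-evaluating whenever the map changes lets the node notice quorum as soon as the missing entries arrive.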

Steps to reproduce

It occurs only in some cases (intermittently).

apanasevich commented 1 month ago

I've created a PR, but I'm not sure the target branch is correct. It would be great to see a fix in one of the 4.5.* releases.

apanasevich commented 1 month ago

Hi @vietj

Thank you for reviewing and merging the PR into the main branch.

Should I create a new PR to fix version 4.5.*? It would be great to get it fixed in production.

vietj commented 1 month ago

Can you provide a PR for the backport?

apanasevich commented 1 month ago

Can you provide a PR for the backport?

Sure, I'll do it a little later.

apanasevich commented 3 weeks ago

Hi @vietj

I've created a PR for the backport.