eclipse-vertx / vert.x

Vert.x is a tool-kit for building reactive applications on the JVM
http://vertx.io

Vertx eventbus not working even after cluster formation #5194

Closed harisht87 closed 5 months ago

harisht87 commented 6 months ago

Questions

We have two services running on an OpenShift/K8s platform, both developed with the Vert.x framework. The cluster.xml is configured to use the Kubernetes API, and the cluster forms correctly (logs below). The Vert.x event bus is used as in the code below:

Service 1 : vertx.eventBus().request("child",s,res-> { /* response handler goes here */ });
Service 2 : vertx.eventBus().consumer("child", (message)  -> { /** message handler goes here **/ });

Version

Hazelcast version : 5.3.6
vert.x version : 4.5.7
vertx-hazelcast : 4.5.7
K8s version : v1.25.11+c43ddea
Red Hat Enterprise Linux 8.9 (Ootpa)
Istio version : 1.17.5-solo
java - jdk 8

Context

Sending a message from one service to the other is not working. We get the errors below:
Not connected to server c7ee11dc-98d4-4217-969c-e2ee2ac85ce9 - starting queuing
Draining the queue for server c7ee11dc-98d4-4217-969c-e2ee2ac85ce9
Cluster connection closed for server c7ee11dc-98d4-4217-969c-e2ee2ac85ce9

Do you have a reproducer?

No

Steps to reproduce

NA

Extra

Service 1 : [After 2nd member in cluster is deployed]
——————— Service 1 logs ————————
INFO: [192.168.54.48]:5701 [demo] [5.3.6] Initialized new cluster connection between /192.168.54.48:5701 and /127.0.0.6:52235
Apr 23, 2024 1:09:26 AM com.hazelcast.internal.cluster.ClusterService
INFO: [192.168.54.48]:5701 [demo] [5.3.6]
Members {size:2, ver:2} [
Member [192.168.54.48]:5701 - 3ee30ad6-4788-4aec-8050-7ed5603abdfa this
Member [192.168.58.91]:5701 - 096a8f32-9212-4323-8fcf-5f505c11587f
]

Service 2 :
———————
INFO: [192.168.58.91]:5701 [demo] [5.3.6] Kubernetes plugin discovered node name: cld-paas-d-eusw1b-3-k659z-worker-ds04-ckrsv
Apr 23, 2024 1:09:20 AM com.hazelcast.kubernetes.KubernetesClient
WARNING: Cannot fetch public IPs of Hazelcast Member PODs, you won't be able to use Hazelcast Smart Client from outside of the Kubernetes network
Apr 23, 2024 1:09:21 AM com.hazelcast.internal.server.tcp.TcpServerConnection
INFO: [192.168.58.91]:5701 [demo] [5.3.6] Initialized new cluster connection between /192.168.58.91:38785 and /192.168.54.48:5701
Apr 23, 2024 1:09:26 AM com.hazelcast.internal.cluster.ClusterService
INFO: [192.168.58.91]:5701 [demo] [5.3.6]
Members {size:2, ver:2} [
Member [192.168.54.48]:5701 - 3ee30ad6-4788-4aec-8050-7ed5603abdfa
Member [192.168.58.91]:5701 - 096a8f32-9212-4323-8fcf-5f505c11587f this
]

Issue 1:
————————————
01:13:42.674 [vert.x-eventloop-thread-2] DEBUG io.vertx.core.eventbus.impl.clustered.ConnectionHolder - tx.id=0e24b487-2840-40a5-9103-4f56d31d74ea Not connected to server c7ee11dc-98d4-4217-969c-e2ee2ac85ce9 - starting queuing
01:13:42.774 [vert.x-eventloop-thread-2] DEBUG io.vertx.core.eventbus.impl.clustered.ConnectionHolder - tx.id=0e24b487-2840-40a5-9103-4f56d31d74ea Draining the queue for server c7ee11dc-98d4-4217-969c-e2ee2ac85ce9
01:13:42.780 [vert.x-eventloop-thread-2] DEBUG io.vertx.core.eventbus.impl.clustered.ConnectionHolder - tx.id=0e24b487-2840-40a5-9103-4f56d31d74ea Cluster connection closed for server c7ee11dc-98d4-4217-969c-e2ee2ac85ce9

(NO_HANDLERS,-1) No handlers for address child

Issue 2 :
—————
When service 2 is deployed and service 1 initializes a cluster connection, the log shows a connection to an unknown IP instead of the cluster IP, even though the cluster is formed with the correct service 2 IP.

Service 1 ip : 192.168.54.48
Service 2 ip : 192.168.58.91

Unknown ip : 127.0.0.6:52235

Log snippet
——————— Service 1 logs ————————
INFO: [192.168.54.48]:5701 [demo] [5.3.6] Initialized new cluster connection between /192.168.54.48:5701 and /127.0.0.6:52235
Apr 23, 2024 1:09:26 AM com.hazelcast.internal.cluster.ClusterService
INFO: [192.168.54.48]:5701 [demo] [5.3.6]
Members {size:2, ver:2} [
Member [192.168.54.48]:5701 - 3ee30ad6-4788-4aec-8050-7ed5603abdfa this
Member [192.168.58.91]:5701 - 096a8f32-9212-4323-8fcf-5f505c11587f
]

Application hazelcast cluster.xml file ————————————————

<?xml version="1.0" encoding="UTF-8"?>
<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-4.0.xsd"
           xmlns="http://www.hazelcast.com/schema/config"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <properties>
        <property name="hazelcast.discovery.enabled">true</property>
        <property name="hazelcast.rest.enabled">true</property>
        <property name="hazelcast.partial.member.disconnection.resolution.heartbeat.count">5</property>
        <property name="hazelcast.partial.member.disconnection.resolution.algorithm.timeout.seconds">10</property>
    </properties>
    <cluster-name>demo</cluster-name>
    <!--<split-brain-protection enabled="true" name="probabilistic-split-brain-protection">
        <minimum-cluster-size>3</minimum-cluster-size>
        <protect-on>READ_WRITE</protect-on>
        <probabilistic-split-brain-protection acceptable-heartbeat-pause-millis="5000"
                                              max-sample-size="500" suspicion-threshold="10" />
    </split-brain-protection>
    <set name="split-brain-protected-set">
        <split-brain-protection-ref>probabilistic-split-brain-protection</split-brain-protection-ref>
    </set>-->
    <network>
        <join>
            <multicast enabled="false"/>
            <kubernetes enabled="true" />
        </join>
        <interfaces enabled="true">
            <interface>192.168.*.*</interface>
        </interfaces>
    </network>
</hazelcast>

Application Vert.x startup code ————————————————

HazelcastClusterManager mgr = new HazelcastClusterManager();
host = InetAddress.getLocalHost().getHostAddress();
options.setClusterManager(mgr);
options.getEventBusOptions().setHost(host);
Vertx.clusteredVertx(options, vertx -> {
    vertx.result().deployVerticle(verticleName, deploymentOptions);
    vertx.result().exceptionHandler(event -> logger.error("Vertx exception ", event));
});

tsegismont commented 5 months ago

Hi @harisht87

This is the GitHub repository of the Vert.x core library. Please send future reports to vertx-hazelcast.

In the Java code, it seems the event bus host address is forced to InetAddress.getLocalHost().getHostAddress().

That is likely why the nodes can't establish a connection to send messages. You should let the cluster manager pick up the right address.
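
One way to check this hypothesis is to print, inside each pod, what InetAddress.getLocalHost().getHostAddress() actually resolves to and compare it with the 192.168.*.* interface filter from cluster.xml. The following is only a minimal diagnostic sketch using the JDK: the class name EventBusHostCheck and the helper matchesPattern are hypothetical (they are not part of Vert.x or Hazelcast), and the matcher handles only the "*" wildcard, which is an assumption that simplifies Hazelcast's own interface matching.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

/** Hypothetical diagnostic helper; not part of Vert.x or Hazelcast. */
class EventBusHostCheck {

    /**
     * Returns true if a dotted IPv4 address matches a pattern such as "192.168.*.*".
     * Only the "*" wildcard is handled (a simplification made for this sketch).
     */
    static boolean matchesPattern(String address, String pattern) {
        String[] a = address.split("\\.");
        String[] p = pattern.split("\\.");
        if (a.length != 4 || p.length != 4) {
            return false;
        }
        for (int i = 0; i < 4; i++) {
            if (!p[i].equals("*") && !p[i].equals(a[i])) {
                return false;
            }
        }
        return true;
    }

    /** Lists the IPv4 addresses of interfaces that are up, skipping loopback interfaces. */
    static List<String> localIpv4Addresses() throws SocketException {
        List<String> result = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return result; // no interfaces could be found
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            if (!nic.isUp() || nic.isLoopback()) {
                continue;
            }
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                if (addr instanceof Inet4Address) {
                    result.add(addr.getHostAddress());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String pattern = "192.168.*.*"; // interface filter taken from cluster.xml
        // Same expression the startup code passes to setHost(...)
        String forced = InetAddress.getLocalHost().getHostAddress();
        System.out.println("getLocalHost() resolves to: " + forced);
        System.out.println("matches " + pattern + ": " + matchesPattern(forced, pattern));
        for (String ip : localIpv4Addresses()) {
            System.out.println("interface address: " + ip
                + " (matches " + pattern + ": " + matchesPattern(ip, pattern) + ")");
        }
    }
}
```

If the forced address turns out to differ from the member address Hazelcast reports in the cluster logs, that would support removing the options.getEventBusOptions().setHost(host) call so the cluster manager can choose the address.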