RocketChat / Rocket.Chat

The communications platform that puts data protection first.
https://rocket.chat/

Clustering not working properly. All instances work, but one has status "waiting" #20682

Closed PhilThurston closed 3 years ago

PhilThurston commented 3 years ago

Description:

Currently, fresh installs of the latest Docker version of Rocket.Chat drop at least one peer when replicated across more than one instance. With just 2 replicas, both servers initially show a "connecting" status in the info > instances section. After some time, one instance shows as connected. The other instances show as "waiting" and eventually drop off the instances list.

This does not appear to be a configuration issue: everything is set correctly, including explicitly setting INSTANCE_IP as an ENV variable. We have confirmed that all instances can talk to each other.

We have confirmed this on multiple environments, even with default settings apart from the replica count. There are no errors in any of the replicas' logs.

Steps to reproduce:

The fastest way to reproduce this is via the official helm chart:

helm upgrade --install test rocketchat/rocketchat -n test --set mongodb.mongodbUsername=rocketchat,mongodb.mongodbPassword=test123,mongodb.mongodbDatabase=rocketchat,mongodb.mongodbRootPassword=root-test123,replicaCount=2
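The --set flags in the helm command above can also be kept in a values file, which makes the overrides easier to read and change (for example when raising the replica count). This is only a sketch: the file name values-test.yaml is an assumption, and the keys are simply the ones from the --set string written as nested YAML.

```shell
# Write the chart overrides to a values file.
# NOTE: the file name is hypothetical; the keys mirror the --set flags above.
cat > values-test.yaml <<'EOF'
replicaCount: 2
mongodb:
  mongodbUsername: rocketchat
  mongodbPassword: test123
  mongodbDatabase: rocketchat
  mongodbRootPassword: root-test123
EOF
```

The file can then be passed to helm with -f values-test.yaml in place of the --set flag.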

You can then attach to the deployment with this:

kubectl port-forward --namespace test $(kubectl get pods --namespace test -l "app.kubernetes.io/name=rocketchat,app.kubernetes.io/instance=test" -o jsonpath='{ .items[0].metadata.name }') 8888:3000

From there, log in and navigate to http://localhost:8888/admin/info

By the time it is set up, you'll see a single instance reported even though both pods are running without any errors. Run kubectl rollout restart deployment/test-rocketchat -n test to restart all the pods, then check again. You'll be able to see the events described above.

You can change the replica count to something higher, such as 3, and you'll then see that only 2 are able to connect and talk to each other.
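To compare the number of running pods against the number of instances shown on the admin page, a small helper along these lines may be useful. It is a sketch, not part of the chart: the function name count_running_pods is made up here, and it assumes the default table output of kubectl get pods for a single namespace, where STATUS is the third column.

```shell
# Count pods whose STATUS column is "Running".
# Reads the default table output of `kubectl get pods` on stdin
# (header line first, STATUS in column 3). Hypothetical helper name.
count_running_pods() {
  awk 'NR > 1 && $3 == "Running" { n++ } END { print n + 0 }'
}

# Example usage against the test release (requires cluster access):
#   kubectl get pods --namespace test \
#     -l "app.kubernetes.io/name=rocketchat,app.kubernetes.io/instance=test" \
#     | count_running_pods
```

If this prints 2 while the admin page lists a single instance, the pods themselves are up and the problem is in how the instances register with each other.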

Expected behavior:

Clustering should be working at a minimum with the default settings.

Actual behavior:

A single instance is always dropped.

Server Setup Information:

Client Setup Information:

close-issue-app[bot] commented 3 years ago

This issue was closed because it does not use our bug report issue template.

Please make sure to use it and fill it as much as you can so we can provide better and faster support.

The following sections must not be removed, or else the BOT will close it immediately again: