RocketChat / Rocket.Chat

The communications platform that puts data protection first.
https://rocket.chat/

Rocket.Chat connection to MongoDB times out #13178

Open liushuang00o opened 5 years ago

liushuang00o commented 5 years ago

Hi,

Description:

We are running multiple instances of Rocket.Chat backed by a MongoDB replica set.

Extract from one Docker Compose file:

rocketchat:
    image: rocketchat/rocket.chat:latest
    environment:
        - PORT=3000
        - ROOT_URL=http://localhost
        - MONGO_URL=mongodb://rocket:password@rocket-1:27017,rocket-2:27017,rocket-3:27017/rocketchat?authSource=admin&replicaSet=testrs&readPreference=nearest&w=majority
        - MONGO_OPLOG_URL=mongodb://oploguser:password@rocket-1:27017,rocket-2:27017,rocket-3:27017/local?authSource=admin&replicaSet=testrs
        - INSTANCE_IP=192.168.3.25
    ports:
        - 3000:3000
    extra_hosts:
        - "rocket-1:192.168.3.70"
        - "rocket-2:192.168.3.71"
        - "rocket-3:192.168.3.72"
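To double-check what the driver is given, the two connection strings can be split into their parts. The following is a minimal Python sketch using only the standard library. `parse_mongo_url` is a hypothetical helper, not part of Rocket.Chat or the MongoDB driver, and it deliberately ignores `mongodb+srv://` URLs, IPv6 literals and percent-decoding of credentials.

```python
from urllib.parse import parse_qs


def parse_mongo_url(url: str) -> dict:
    """Split a mongodb:// URL into user, seed hosts, database and options.

    Hypothetical inspection helper: no mongodb+srv://, no IPv6 literals,
    and credentials are not percent-decoded.
    """
    scheme = "mongodb://"
    if not url.startswith(scheme):
        raise ValueError("expected a mongodb:// URL")
    rest = url[len(scheme):]

    # The authority (credentials + host list) ends at the first '/' or '?'.
    cuts = [i for i in (rest.find("/"), rest.find("?")) if i != -1]
    end = min(cuts) if cuts else len(rest)
    authority, tail = rest[:end], rest[end:]

    # Credentials, if present, come before the last '@' of the authority.
    userinfo, _, host_list = authority.rpartition("@")
    user = userinfo.partition(":")[0] or None

    # Seed list: comma-separated host[:port], default port 27017.
    hosts = []
    for entry in host_list.split(","):
        host, _, port = entry.partition(":")
        hosts.append((host, int(port) if port else 27017))

    database, _, query = tail.lstrip("/").partition("?")
    # Keep the last value of each option as a plain str -> str mapping.
    options = {k: v[-1] for k, v in parse_qs(query).items()}
    return {"user": user, "hosts": hosts, "database": database or None, "options": options}


url = ("mongodb://rocket:password@rocket-1:27017,rocket-2:27017,rocket-3:27017"
       "/rocketchat?authSource=admin&replicaSet=testrs&readPreference=nearest&w=majority")
info = parse_mongo_url(url)
print(info["hosts"])    # [('rocket-1', 27017), ('rocket-2', 27017), ('rocket-3', 27017)]
print(info["options"])  # {'authSource': 'admin', 'replicaSet': 'testrs', 'readPreference': 'nearest', 'w': 'majority'}
```

The seed list only bootstraps the connection: after discovery, the driver talks to the members under the names stored in the replica set configuration (the 192.168.3.7x addresses that rs.status() reports), not under the rocket-1..3 aliases, so those addresses must also be reachable from inside the container.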

However, Rocket.Chat does not handle a failover of the MongoDB replica set properly.

Steps to reproduce:

1) Shut down the primary MongoDB node, 192.168.3.70.

2) Check rs.status().

192.168.3.71 is now PRIMARY, as expected:

{
    "set" : "testrs",
    "date" : ISODate("2019-01-17T12:06:24.041Z"),
    "myState" : 1,
    "term" : NumberLong(10),
    "syncingTo" : "",
    "syncSourceHost" : "",
    "syncSourceId" : -1,
    "heartbeatIntervalMillis" : NumberLong(2000),
    "optimes" : {
        "lastCommittedOpTime" : {
            "ts" : Timestamp(1547726619, 1),
            "t" : NumberLong(9)
        },
        "readConcernMajorityOpTime" : {
            "ts" : Timestamp(1547726619, 1),
            "t" : NumberLong(9)
        },
        "appliedOpTime" : {
            "ts" : Timestamp(1547726779, 1),
            "t" : NumberLong(10)
        },
        "durableOpTime" : {
            "ts" : Timestamp(1547726779, 1),
            "t" : NumberLong(10)
        }
    },
    "lastStableCheckpointTimestamp" : Timestamp(1547726619, 1),
    "members" : [
        {
            "_id" : 0,
            "name" : "192.168.3.70:27017",
            "health" : 0,
            "state" : 8,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : {
                "ts" : Timestamp(0, 0),
                "t" : NumberLong(-1)
            },
            "optimeDurable" : {
                "ts" : Timestamp(0, 0),
                "t" : NumberLong(-1)
            },
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "optimeDurableDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2019-01-17T12:06:21.081Z"),
            "lastHeartbeatRecv" : ISODate("2019-01-17T12:03:45.170Z"),
            "pingMs" : NumberLong(0),
            "lastHeartbeatMessage" : "Error connecting to 192.168.3.70:27017 :: caused by :: No route to host",
            "syncingTo" : "",
            "syncSourceHost" : "",
            "syncSourceId" : -1,
            "infoMessage" : "",
            "configVersion" : -1
        },
        {
            "_id" : 1,
            "name" : "192.168.3.71:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 19015,
            "optime" : {
                "ts" : Timestamp(1547726779, 1),
                "t" : NumberLong(10)
            },
            "optimeDate" : ISODate("2019-01-17T12:06:19Z"),
            "syncingTo" : "",
            "syncSourceHost" : "",
            "syncSourceId" : -1,
            "infoMessage" : "",
            "electionTime" : Timestamp(1547726647, 1),
            "electionDate" : ISODate("2019-01-17T12:04:07Z"),
            "configVersion" : 226639,
            "self" : true,
            "lastHeartbeatMessage" : ""
        },
        {
            "_id" : 2,
            "name" : "192.168.3.72:27017",
            "health" : 1,
            "state" : 7,
            "stateStr" : "ARBITER",
            "uptime" : 19009,
            "lastHeartbeat" : ISODate("2019-01-17T12:06:23.240Z"),
            "lastHeartbeatRecv" : ISODate("2019-01-17T12:06:22.713Z"),
            "pingMs" : NumberLong(0),
            "lastHeartbeatMessage" : "",
            "syncingTo" : "",
            "syncSourceHost" : "",
            "syncSourceId" : -1,
            "infoMessage" : "",
            "configVersion" : 226639
        }
    ],
    "ok" : 1,
    "operationTime" : Timestamp(1547726779, 1),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1547726779, 1),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}
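The status document also allows a quick consistency check. This set has two data-bearing members plus an arbiter, and MONGO_URL requests w=majority; roughly, w: "majority" writes need a majority of the voting members to acknowledge, and arbiters vote but hold no data, so they cannot acknowledge writes (newer MongoDB versions refine this calculation). Below is a minimal Python sketch with a hypothetical helper, `summarize_rs_status`. It assumes the shell output has been converted to plain JSON (Timestamp(s, i) rewritten as [s, i]) and that every member has one vote, since rs.status() does not show votes. Whether this explains the timeouts in this issue is not established in the thread.

```python
PRIMARY, SECONDARY, ARBITER = 1, 2, 7  # rs.status() member state codes


def summarize_rs_status(status: dict) -> dict:
    """Summarize a JSON-converted rs.status() document.

    Assumptions: every member has exactly one vote, and Timestamp(s, i)
    values were converted to [s, i] lists beforehand.
    """
    members = status["members"]
    primary = next((m["name"] for m in members if m["state"] == PRIMARY), None)
    required = len(members) // 2 + 1  # majority of voting members (1 vote each)
    # Only healthy PRIMARY/SECONDARY members hold data and can acknowledge writes.
    able = [m["name"] for m in members
            if m["health"] == 1 and m["state"] in (PRIMARY, SECONDARY)]
    optimes = status["optimes"]
    # How far the majority commit point trails the last applied operation.
    lag = optimes["appliedOpTime"]["ts"][0] - optimes["lastCommittedOpTime"]["ts"][0]
    return {
        "primary": primary,
        "arbiters": [m["name"] for m in members if m["state"] == ARBITER],
        "acks_required_for_majority": required,
        "members_able_to_ack": able,
        "majority_writes_possible": len(able) >= required,
        "commit_lag_seconds": lag,
    }


# Values copied from the rs.status() output above, reduced to the fields used.
status = {
    "optimes": {
        "lastCommittedOpTime": {"ts": [1547726619, 1]},
        "appliedOpTime": {"ts": [1547726779, 1]},
    },
    "members": [
        {"name": "192.168.3.70:27017", "health": 0, "state": 8},
        {"name": "192.168.3.71:27017", "health": 1, "state": 1},
        {"name": "192.168.3.72:27017", "health": 1, "state": 7},
    ],
}
summary = summarize_rs_status(status)
print(summary["majority_writes_possible"], summary["commit_lag_seconds"])  # False 160
```

With these values only one member can acknowledge majority writes while two are required, and lastCommittedOpTime (term 9) still predates the election at 12:04:07, trailing appliedOpTime by 160 seconds.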

3) Check the Rocket.Chat log, which shows the following.

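The connection to 192.168.3.70:27017 is closed, which is expected because that node is down, but the connection to 192.168.3.71:27017, the new primary, times out: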

rocketchat_1  | Exception in setInterval callback: { MongoNetworkError: connection 3 to 192.168.3.70:27017 closed
rocketchat_1  |     at Socket.<anonymous> (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/connection/connection.js:276:9)
rocketchat_1  |     at Object.onceWrapper (events.js:315:30)
rocketchat_1  |     at emitOne (events.js:116:13)
rocketchat_1  |     at Socket.emit (events.js:211:7)
rocketchat_1  |     at TCP._handle.close [as _onclose] (net.js:557:12)
rocketchat_1  |   name: 'MongoNetworkError',
rocketchat_1  |   errorLabels: [ 'TransientTransactionError' ],
rocketchat_1  |   [Symbol(mongoErrorContextSymbol)]: {} }
rocketchat_1  | Exception in setInterval callback: { MongoNetworkError: connection 30 to 192.168.3.71:27017 timed out
rocketchat_1  |     at Socket.<anonymous> (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/connection/connection.js:259:7)
rocketchat_1  |     at Object.onceWrapper (events.js:313:30)
rocketchat_1  |     at emitNone (events.js:106:13)
rocketchat_1  |     at Socket.emit (events.js:208:7)
rocketchat_1  |     at Socket._onTimeout (net.js:410:8)
rocketchat_1  |     at ontimeout (timers.js:498:11)
rocketchat_1  |     at tryOnTimeout (timers.js:323:5)
rocketchat_1  |     at Timer.listOnTimeout (timers.js:290:5)
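When several instances log these errors, it can help to tally them per host and failure type before comparing them with rs.status(). A minimal Python sketch; `tally_network_errors` is a hypothetical helper, and its regular expression only matches the `MongoNetworkError: connection N to HOST closed` / `timed out` form shown above.

```python
import re
from collections import Counter

# Matches e.g. "MongoNetworkError: connection 30 to 192.168.3.71:27017 timed out"
ERROR_RE = re.compile(r"MongoNetworkError: connection \d+ to (\S+) (closed|timed out)")


def tally_network_errors(lines):
    """Count (host, failure kind) pairs found in Rocket.Chat log lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


log = [
    "rocketchat_1  | Exception in setInterval callback: { MongoNetworkError: connection 3 to 192.168.3.70:27017 closed",
    "rocketchat_1  |   name: 'MongoNetworkError',",
    "rocketchat_1  | Exception in setInterval callback: { MongoNetworkError: connection 30 to 192.168.3.71:27017 timed out",
]
for (host, kind), n in tally_network_errors(log).items():
    print(host, kind, n)
```

Lines that mention MongoNetworkError without the "connection N to HOST" pattern, such as the `name:` field, are skipped.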

4) Telnet to 192.168.3.71 on port 27017; the connection succeeds:

MacBook-Pro-2:~ liushuang$ telnet 192.168.3.71 27017
Trying 192.168.3.71...
Connected to bogon.
Escape character is '^]'.
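The telnet check only shows that a TCP connection to port 27017 can be opened from the MacBook; it does not show that the Rocket.Chat container can reach the same address, nor that the server answers MongoDB commands in time. The same check can be scripted so it can be run from other hosts, for example with this minimal Python sketch (`tcp_reachable` is a hypothetical helper and assumes a Python interpreter is available where it runs).

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Like telnet, this only tests the TCP handshake, not the MongoDB protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False


# Example: the new primary from this report.
print(tcp_reachable("192.168.3.71", 27017))
```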

Server Setup Information:

Rocket.Chat Version: 0.73.2
Operating System: CentOS 7.5
Deployment Method: Docker
Number of Running Instances: 2
DB Replicaset Oplog: enabled
NodeJS Version: 8.11.3 - x64
MongoDB Version: 4.0.5

Thanks,

Dou

mddvul22 commented 5 years ago

We also see this in 0.74.3. If a MongoDB replica set member goes down, Rocket.Chat stops working, which seems to defeat the purpose of a redundant replica set.

broderix commented 5 years ago

Confirmed on version 1.0.3: same problem.

StarScream902 commented 5 years ago

Same problem in 1.1.1.

Alex-spotcap commented 5 years ago

1.2.1 still there

vinogradovia commented 4 years ago

2.4.2 still there

ndroo commented 4 years ago

Still an issue in 3.x and beyond. This is a huge issue... can we get some attention on it?

maiconbaum commented 2 years ago

Still an issue on 4.5.0.