empiricaly / meteor-empirica-core

Core Meteor package for the experiment Empirica platform. This is where you should submit issues.
MIT License

MongoDB resilience #246

Open Karakaii opened 3 years ago

Karakaii commented 3 years ago

Hello, I am using empirica:core 1.16.0.

On two occasions (on different days) while running our app, a game did not end when the timer of the last stage reached 0. Players were told the app was waiting on a response from the server, and refreshing did not fix it.

This sort of error appears in the logs on Galaxy:

kgxjk 2021-04-15 16:14:16+01:00 Exception in setTimeout callback: MongoError: not master and slaveOk=false
kgxjk 2021-04-15 16:14:16+01:00     at Connection.<anonymous> (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/pool.js:451:61)
kgxjk 2021-04-15 16:14:16+01:00     at Connection.emit (events.js:314:20)
kgxjk 2021-04-15 16:14:16+01:00     at Connection.EventEmitter.emit (domain.js:483:12)
kgxjk 2021-04-15 16:14:16+01:00     at processMessage (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/connection.js:452:10)
kgxjk 2021-04-15 16:14:16+01:00     at TLSSocket.<anonymous> (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/connection.js:621:15)
kgxjk 2021-04-15 16:14:16+01:00     at TLSSocket.emit (events.js:314:20)
kgxjk 2021-04-15 16:14:16+01:00     at TLSSocket.EventEmitter.emit (domain.js:483:12)
kgxjk 2021-04-15 16:14:16+01:00     at addChunk (_stream_readable.js:297:12)
kgxjk 2021-04-15 16:14:16+01:00     at readableAddChunk (_stream_readable.js:272:9)
kgxjk 2021-04-15 16:14:16+01:00     at TLSSocket.Readable.push (_stream_readable.js:213:10)
kgxjk 2021-04-15 16:14:16+01:00     at TLSWrap.onStreamRead (internal/stream_base_commons.js:188:23)
kgxjk 2021-04-15 16:14:16+01:00     => awaited here:
kgxjk 2021-04-15 16:14:16+01:00     at Function.Promise.await (/app/bundle/programs/server/npm/node_modules/meteor/promise/node_modules/meteor-promise/promise_server.js:56:12)
kgxjk 2021-04-15 16:14:16+01:00     at packages/mongo/mongo_driver.js:1073:14

The important part seems to be: MongoError: not master and slaveOk=false

Every time the error appears, it seems to coincide with the cluster undergoing an election (changing primary nodes).

I had a chat with MongoDB Atlas support. These elections can happen at any time, but our apps should be resilient to them if certain things are put in place. More info here: https://docs.atlas.mongodb.com/resilient-application/
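To make the idea of being "resilient to elections" concrete, here is a minimal sketch of an application-level retry around a write that fails with the error above. It is not what empirica:core or the Meteor mongo package does. The names `isNotPrimaryError` and `retryOnElection` are hypothetical, and the retry count and delay are assumptions rather than recommended values. The operation is synchronous because Meteor server-side collection calls are synchronous in this version, and the `sleep` function is injected so the caller can supply a wait suited to their environment.

```javascript
// Hypothetical helpers for illustration; not part of Empirica, Meteor, or the MongoDB driver.

// Returns true when an error message looks like the "not master" failure from the
// log above, i.e. a write reached a node that is not (or is no longer) the primary.
function isNotPrimaryError(err) {
  const message = String((err && err.message) || "");
  return /not master|not primary/i.test(message);
}

// Runs a synchronous operation and retries it only on "not primary" errors.
// Any other error, or running out of retries, rethrows the last error.
function retryOnElection(operation, { retries = 3, delayMs = 500, sleep = () => {} } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return operation();
    } catch (err) {
      if (!isNotPrimaryError(err) || attempt >= retries) throw err;
      // Linear backoff: wait a little longer on each attempt while the election settles.
      sleep(delayMs * (attempt + 1));
    }
  }
}
```

A call might look like `retryOnElection(() => Stages.update(stageId, modifier), { sleep: wait })`, where `Stages`, `stageId`, `modifier`, and `wait` are placeholders. Only operations that are safe to repeat should be wrapped this way, since a retried write may be applied more than once.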

The two main things I took away from it (and confirmed with the support person) were:

I don't know how the drivers work with Meteor. Should I update the mongo package? The app currently lists: mongo 1.10.1* Adaptor for using MongoDB and Minimongo over DDP

Maybe at some point this should be tested, to see whether the default package versions used when creating an Empirica app should be updated.