Improving MongoDB Oplog Tailing Mode Scalability with minResultFetchIntervalMs

vlasky commented 4 years ago

This is my proposal for an enhancement to redis-oplog to improve the scalability of Meteor apps that use MongoDB in Oplog tailing mode.

This is identical to the Meteor feature request I posted recently, except it would be implemented in redis-oplog instead of Meteor's MongoDB code.

Redis-oplog needs a throttling feature to set a minimum time interval between successive result set fetches for a given reactive MongoDB query. This allows the developer to impose a hard limit on the maximum update rate of a given reactive MongoDB query.

This would improve scalability by greatly reducing the CPU time and memory consumed by reactive queries, as well as the network bandwidth used when those queries are published and subscribed to by clients.

In many cases, it is not necessary to fetch updates at top speed in response to every change. For example, if we are using a Mongo publication to update a user interface component like a table, map or chart on a web page, we gain nothing from updating it more than, say, once per second.

A new option, minResultFetchIntervalMs, would be added to Mongo.Collection.find(). It represents the minimum allowable delay in milliseconds between successive result set fetches for a given reactive query.

For example, a publication that can send reactive updates at a maximum rate of once per second would have a minResultFetchIntervalMs of 1000. A maximum rate of twice per second would be a minResultFetchIntervalMs of 500 and a maximum rate of once every 5 seconds would be a minResultFetchIntervalMs of 5000 and so on.
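The arithmetic above is simply 1000 divided by the maximum number of updates per second. As a minimal sketch, the helper below (the name intervalMsForMaxRate is hypothetical and not part of Meteor or redis-oplog) converts a desired maximum rate into a value for the proposed option:

```javascript
// Hypothetical helper, for illustration only: converts a maximum update
// rate (updates per second) into a minResultFetchIntervalMs value.
function intervalMsForMaxRate(updatesPerSecond) {
  if (!(updatesPerSecond > 0)) {
    throw new RangeError('updatesPerSecond must be a positive number');
  }
  return Math.round(1000 / updatesPerSecond);
}

const oncePerSecond = intervalMsForMaxRate(1);       // 1000
const twicePerSecond = intervalMsForMaxRate(2);      // 500
const onceEveryFiveSec = intervalMsForMaxRate(0.2);  // 5000
```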

Equivalent functionality has existed from the beginning in the mysql-live-select package, the key component of the Meteor MySQL integration. It has been crucial in enabling our reactive Meteor MySQL apps to scale. I expect it to do similar wonders for the scalability of Meteor apps that use redis-oplog.

Example:

Let's imagine our Meteor application displays a map with real-time vehicle locations which are stored in a MongoDB collection published by the server.

    Meteor.publish('vehicleLocations', function() {
        return Locations.find();
    });

Let's imagine that this collection receives 100 separate vehicle position updates in one second. That would result in 100 extra entries inserted into the MongoDB oplog.

In the current Meteor MongoDB code, that would result in the publication potentially being triggered by each oplog entry, sending up to 100 updates to each subscribed client and needlessly consuming a lot of network bandwidth, CPU time and memory.

How this would be improved with minResultFetchIntervalMs:

Instead, let's imagine that we could publish the collection and specify a minResultFetchIntervalMs of 1000ms (1 second):

    Meteor.publish('vehicleLocations', function() {
        return Locations.find({},{minResultFetchIntervalMs: 1000});
    });

At time=0, oplog entry 1 causes the result set to be fetched, but then no further result fetch is allowed to take place until 1000ms (1 second) has elapsed.

Between time=0 and time=1s, Meteor's MongoDB observer code notices oplog entries 2-100 but takes no immediate action. Instead, it schedules the next result set fetch to occur at time=1s.

If updates keep arriving at this rate, the same pattern repeats in each subsequent second.

At the end of the first second, only 2 result set fetches would have been performed instead of 100.
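The timeline above can be sketched as a small throttle around the fetch. This is only an illustration under stated assumptions, not how Meteor or redis-oplog implement anything: createFetchThrottle and notifyChange are hypothetical names, and the clock and scheduler are passed in as parameters so the simulation can run with a fake clock instead of real timers.

```javascript
// Hypothetical sketch, not Meteor or redis-oplog code.
// `now` and `schedule` are injectable so the logic can run against a fake clock.
function createFetchThrottle(fetch, minIntervalMs, now = Date.now, schedule = setTimeout) {
  let lastFetchAt = -Infinity; // time of the most recent fetch
  let pending = false;         // true while a deferred fetch is scheduled

  function runFetch() {
    pending = false;
    lastFetchAt = now();
    fetch();
  }

  // Call this for every oplog entry (or Redis event) that affects the query.
  return function notifyChange() {
    if (pending) return; // the scheduled fetch will pick up this change too
    const wait = lastFetchAt + minIntervalMs - now();
    if (wait <= 0) {
      runFetch();               // interval already elapsed: fetch immediately
    } else {
      pending = true;
      schedule(runFetch, wait); // defer until the interval has elapsed
    }
  };
}

// Simulation: 100 changes arrive 10 ms apart (t = 0 .. 990 ms).
let clock = 0;
const timers = [];
let fetchCount = 0;
const notifyChange = createFetchThrottle(
  () => { fetchCount += 1; },
  1000,
  () => clock,
  (fn, delay) => timers.push({ at: clock + delay, fn })
);

for (let i = 0; i < 100; i++) {
  clock = i * 10;
  notifyChange();
}

// Advance the fake clock to t = 1000 ms and run the timers that are due.
clock = 1000;
timers.filter((t) => t.at <= clock).forEach((t) => t.fn());

console.log(fetchCount); // prints 2
```

The first change triggers a fetch at once, the second schedules a single deferred fetch for t=1000ms, and the remaining 98 changes are absorbed by that scheduled fetch.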

Answers to Expected Questions:

Q: So you are ignoring events in the oplog. How is that good?

A: They are not being ignored - we just don't react to each one of them - kind of like when someone rings your doorbell multiple times - the first ring is enough to set you in motion towards the door.

Q: How is this more efficient than just using poll and diff?

A: This approach avoids needless polling and provides predictable response times to events.

Q: What is the scalability limit with this approach?

A: How quickly the Meteor MongoDB observer code can read the oplog. For best performance, one would store their MongoDB database & oplog on an SSD (preferably NVMe).

evolross commented 4 years ago

This is the very reason we have several publications still set to polling in our app. We need real-time, but there are so many incoming updates per second (thousands, in our case) that it breaks Blaze and stresses out all the underlying pub-sub and observers. So we just poll once a second.

This sounds great. It would help with this issue.

theodorDiaconu commented 4 years ago

RedisEventSmartStore: each Redis event ends up in a queue and is merged with any similar event already queued (an update, for example).

We then consume these merged events through a throttled function, invoked on every event received: https://lodash.com/docs/4.17.15#throttle

Order:

1. Redis event hits RedisPubSubManager
2. Each RedisSubscriber has its own smart-store
3. We call the throttled event consumption
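A rough sketch of the merging step described above, under assumptions of my own: the class name MergingEventQueue, the event shape ({ type, docId, fields }) and the rule of merging only consecutive updates to the same document are all hypothetical and not taken from redis-oplog. In the proposed design, drain() would be called from the throttled consumer (for example one created with lodash's _.throttle).

```javascript
// Hypothetical sketch; names and event shape are illustrative, not redis-oplog's.
class MergingEventQueue {
  constructor() {
    this.queue = [];
  }

  // Queue an event, merging it into the latest queued event for the same
  // document when both are updates.
  push(event) {
    const last = [...this.queue].reverse().find((e) => e.docId === event.docId);
    if (last && last.type === 'update' && event.type === 'update') {
      Object.assign(last.fields, event.fields); // later field values win
    } else {
      // Copy the event so callers cannot mutate queued state.
      this.queue.push({ ...event, fields: { ...event.fields } });
    }
  }

  // Return everything queued so far and reset the queue.
  drain() {
    const events = this.queue;
    this.queue = [];
    return events;
  }
}

const store = new MergingEventQueue();
store.push({ type: 'update', docId: 'vehicle1', fields: { lat: 1, lng: 2 } });
store.push({ type: 'update', docId: 'vehicle2', fields: { lat: 5 } });
store.push({ type: 'update', docId: 'vehicle1', fields: { lat: 3 } });

const merged = store.drain();
console.log(merged.length);    // prints 2
console.log(merged[0].fields); // prints { lat: 3, lng: 2 }
```

Three incoming events collapse into two, so the consumer processes one update per vehicle instead of one per Redis message.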
