Open simonbelanger opened 6 years ago
It seems that I need to make use of:
Package.facts && Package.facts.Facts.incrementServerFact(
"livedata", "sessions", 1);
In order to achieve this. Can you tell me what metrics we need ? Would sessions be enough ? What sessions really represent, because we may re-use some.
I've searched all Package.facts usage inside Meteor and got the following:
Package.facts && Package.facts.Facts.incrementServerFact("mongo-livedata", "observe-multiplexers", 1);
Package.facts && Package.facts.Facts.incrementServerFact("mongo-livedata", "observe-handles", 1);
Package.facts && Package.facts.Facts.incrementServerFact("mongo-livedata", "observe-drivers-oplog", 1);
Package.facts && Package.facts.Facts.incrementServerFact("mongo-livedata", "observe-drivers-polling", -1);
Package.facts && Package.facts.Facts.incrementServerFact("livedata", "sessions", 1);
Package.facts && Package.facts.Facts.incrementServerFact("livedata", "subscriptions", 1);
Oplog notifications is the key metric IMO because this is the one necessary to determine where to do namespaces/channels optimization.
I understand their importance. I'm going to take care of this.
By investigating this: https://github.com/meteorhacks/kadira/tree/master/lib/hijack
I see that the metrics are wrapped and in theory it should work. How do you use Meteor APM ? Is it a package? If yes, is it added after redis-oplog initializes ?
Also can you tell me what metrics you need ? For the numbers, Package.facts can work, but there's more to it.
We're using mdg:meteor-apm-agent
package. It is placed before cultofcoders:redis-oplog
in the packages
file.
Oplog notifications is the only metric we really need regarding Live Queries.
@simonbelanger by oplog notifications, do you mean the redis events (in this case) ? I would love if you could give me an example by what you mean measuring oplog notifications, size, count?
That would be the number of redis event processed
Ok I need to dig into how they measure this, what about timing ? Like event processing cost ?
The key is to process as less events as possible by namespacing collection's mutations/observers, so we need to know how much events are processed on a publication basis to identify where optimization will have the most significant effect.
Got it.
https://github.com/meteor/meteor-apm-agent/blob/master/lib/hijack/instrument.js What if you try to add redis-oplog before it. We recommend adding redis-oplog as the top package.
For self: https://github.com/meteor/meteor-apm-agent/blob/master/lib/hijack/wrap_observers.js#L161 It may be good to override _observeChanges like this so it will work with Kadira
Added issue here: https://github.com/meteor/meteor/issues/9342
@simonbelanger in the meantime before we get to the level where we can integrate the APM, you should think about optimizing in the following way:
You most likely have a organizational structure where Classroom is like a "chief" collection, and you relate users, games, other objects to it. This means that classroom can become a namespace.
return Collection.find({}, {
namespace: `classroom::{classroomId}`
});
To make this nice, you can think about a wrapper
return Collection.find({}, {
namespace: getClassroomNamespace(classroomId)
});
Collection.insert(doc, {
namespace: getClassroomNamespace(classroomId)
});
Collection.update(selector, modifier, {
namespace: getClassroomNamespace(classroomId)
});
Related to this issue, wouldn't it be possible for the APM to accept some custom data without "emulating" oplog functionality. Like a way to hook any kind of package inside it and push custom data that can be aggregated.
About your classroom namespace proposition, this is basically what I've done. So far I did it for a single collection, the one with the highest operation threshold and existing observers. There's 2 or 3 other collections on which I expect to do that.
After that, I'll also try to set overridePublishFunction
to false
and use publishWithRedis
specifically on concerned publications to sees if there's some gain.
@simonbelanger I suggest you don't do it. Because you need to add disable-oplog. If oplog is enabled, your instances will be again, bombarded. When you saw the performance improvements, was it already namespaced ?
I suggest you don't do it. Because you need to add disable-oplog. If oplog is enabled, your instances will be again, bombarded.
So what your saying is that I absolutely need to add disable-oplog
to take advantage of the package, and if I add it I obviously need to use redis for every publication because mongodb oplog no longer work? Then what's the use case for overridePublishFunction
option and publishWithRedis
function?
When you saw the performance improvements, was it already namespaced ?
yes
If you didn't have "disable-oplog" but you had an OPLOG_URL you have been hit by oplog notifications, even if they weren't processed it is a cost of getting data, parsing it, etc. So did you still have MONGO_OPLOG_URL env variable set ? If yes, you should see serious improvements.
The reason for giving the option to not override it was that I wanted to leave the door opened to keep Oplog and RedisOplog working together.
If you have polling cursors, if you return them normally, it will use the original polling driver as expected.
if (options.oplogUrl && !Package['disable-oplog']) {
self._oplogHandle = new OplogHandle(options.oplogUrl, self.db.databaseName);
self._docFetcher = new DocFetcher(self);
}
And inside OplogHandle
constructor:
self._startTailing();
In conclusion if you had that situation, then improvements should be massive.
So did you still have MONGO_OPLOG_URL env variable set ? If yes, you should see serious improvements.
I don't see why still having MONGO_OPLOG_URL env variable set would help me see serious improvements? I'd guest that it would no longer be used anyway since I added the disable_oplog
package... Are you saying that I'm still tailing mongodb oplog even with disable_oplog
?
Sorry, I just want to make sure I understand correctly.
Oh sorry, if you had disable-oplog
then you are fine, you were no longer tailing the oplog.
kk :)
I'm a bit disappointed with the only 50% - 100% improvement to be honest. But I'm guessing we're also limited by the amount of connections an instance can handle.
What is the number of connections/instance before and after?
I'm a bit disappointed with the only 50% - 100% improvement to be honest. But I'm guessing we're also limited by the amount of connections an instance can handle.
Please don't be disappointed. I'm not, it's still a significant improvement.
What is the number of connections/instance before and after?
It's very variable based on the time of day
Very interesting discussion but off-topic from the original issue ;). We should move to another channel.
@simonbelanger yes but an instance can handle properly a certain amount of concurrent users, even if you do just method calls. Would be interesting to know those numbers (Sessions/Host). I know there's a limit. I know that just seeing connections/instance isn't a good metric for redis-oplog, because it depends a lot on your other methods and other stuff you do, how many LFS queries you have (with filters,limit,sort -- these are the heaviest), how many queries by "_id" and so on.
If we reduced your infrastructure by a half or a third. Then it doesn't mean redis-oplog is 2 times faster than oplog tailing, it may be that it's 5 times! My plan is to get to a number that speaks a lot. 1.5 times... meh.
do you have something on Slack for this package development?
Hi @theodorDiaconu, I hope you're doing well! We just ran in a major performance issue that took us a very long time to diagnose (not related to redis-oplog
). We were over-publishing thousands of documents in a specific publication but it's been very hard to pinpoint the source of the problem because of the metrics that are no longer tracked in the APM:
I really miss the "Fetch Documents", "Observer Changes" and "Oplog Notifications" metrics and I think it would be possible to hook the tracking of these specific info somewhere in the redis-oplog
workflow.
I digged down a little in meteor-apm-agent
package and it seems that it is actually exporting a global Kadira
object that would potentially contain everything needed to add the missing tracking in redis-oplog
. It looks like all the magic is happening in https://github.com/meteor/meteor-apm-agent/tree/master/lib/hijack so I guess I can figure out how the metrics are tracked by meteor-apm-agent
but I'd really like to hear your thoughts on how and where to hook it with redis-oplog
.
Thank you!
did u try this branch? https://github.com/cult-of-coders/redis-oplog/pull/283 . It hooks into the core, and you benefit of all metrics.
wow that was fast ⚡️😄 I'll try to test drive this today.
It looks like a major refactoring that has been pending for some time, is it production ready?
so, good news...
...and bad news...
@simonbelanger did you ever have luck with getting this working? If not, have you had a try at hacking it or any progress we could see/work off of?
@etyp well not really... we never found the time to dig deeper, the task is still in our backlog... I still feel real love for these long lost metrics though
It looks like Live Queries metrics are no longer tracked in Meteor APM when using this package. Known issue?