Live Queries metrics in Meteor APM

simonbelanger commented 6 years ago

It looks like Live Queries metrics are no longer tracked in Meteor APM when using this package. Known issue?

theodorDiaconu commented 6 years ago

It seems that I need to make use of:

Package.facts && Package.facts.Facts.incrementServerFact(
    "livedata", "sessions", 1);

In order to achieve this. Can you tell me what metrics we need ? Would sessions be enough ? What sessions really represent, because we may re-use some.

theodorDiaconu commented 6 years ago

I've searched all Package.facts usage inside Meteor and got the following:

Package.facts && Package.facts.Facts.incrementServerFact("mongo-livedata", "observe-multiplexers", 1);
Package.facts && Package.facts.Facts.incrementServerFact("mongo-livedata", "observe-handles", 1);
Package.facts && Package.facts.Facts.incrementServerFact("mongo-livedata", "observe-drivers-oplog", 1);
Package.facts && Package.facts.Facts.incrementServerFact("mongo-livedata", "observe-drivers-polling", -1);
Package.facts && Package.facts.Facts.incrementServerFact("livedata", "sessions", 1);
Package.facts && Package.facts.Facts.incrementServerFact("livedata", "subscriptions", 1);

simonbelanger commented 6 years ago

Oplog notifications is the key metric IMO because this is the one necessary to determine where to do namespaces/channels optimization.

theodorDiaconu commented 6 years ago

I understand their importance. I'm going to take care of this.

theodorDiaconu commented 6 years ago

By investigating this: https://github.com/meteorhacks/kadira/tree/master/lib/hijack

I see that the metrics are wrapped and in theory it should work. How do you use Meteor APM ? Is it a package? If yes, is it added after redis-oplog initializes ?

Also can you tell me what metrics you need ? For the numbers, Package.facts can work, but there's more to it.

simonbelanger commented 6 years ago

We're using mdg:meteor-apm-agent package. It is placed before cultofcoders:redis-oplog in the packages file.

Oplog notifications is the only metric we really need regarding Live Queries.

theodorDiaconu commented 6 years ago

@simonbelanger by oplog notifications, do you mean the redis events (in this case) ? I would love if you could give me an example by what you mean measuring oplog notifications, size, count?

simonbelanger commented 6 years ago

That would be the number of redis event processed

theodorDiaconu commented 6 years ago

Ok I need to dig into how they measure this, what about timing ? Like event processing cost ?

simonbelanger commented 6 years ago

The key is to process as less events as possible by namespacing collection's mutations/observers, so we need to know how much events are processed on a publication basis to identify where optimization will have the most significant effect.

theodorDiaconu commented 6 years ago

Got it.

theodorDiaconu commented 6 years ago

https://github.com/meteor/meteor-apm-agent/blob/master/lib/hijack/instrument.js What if you try to add redis-oplog before it. We recommend adding redis-oplog as the top package.

For self: https://github.com/meteor/meteor-apm-agent/blob/master/lib/hijack/wrap_observers.js#L161 It may be good to override _observeChanges like this so it will work with Kadira

theodorDiaconu commented 6 years ago

Added issue here: https://github.com/meteor/meteor/issues/9342

theodorDiaconu commented 6 years ago

@simonbelanger in the meantime before we get to the level where we can integrate the APM, you should think about optimizing in the following way:

You most likely have a organizational structure where Classroom is like a "chief" collection, and you relate users, games, other objects to it. This means that classroom can become a namespace.

return Collection.find({}, {
    namespace: `classroom::{classroomId}`
});

To make this nice, you can think about a wrapper

return Collection.find({}, {
    namespace: getClassroomNamespace(classroomId)
});

Collection.insert(doc, {
    namespace: getClassroomNamespace(classroomId)
});
Collection.update(selector, modifier, {
    namespace: getClassroomNamespace(classroomId)
});

theodorDiaconu commented 6 years ago

Related to this issue, wouldn't it be possible for the APM to accept some custom data without "emulating" oplog functionality. Like a way to hook any kind of package inside it and push custom data that can be aggregated.

simonbelanger commented 6 years ago

About your classroom namespace proposition, this is basically what I've done. So far I did it for a single collection, the one with the highest operation threshold and existing observers. There's 2 or 3 other collections on which I expect to do that.

After that, I'll also try to set overridePublishFunction to false and use publishWithRedis specifically on concerned publications to sees if there's some gain.

theodorDiaconu commented 6 years ago

@simonbelanger I suggest you don't do it. Because you need to add disable-oplog. If oplog is enabled, your instances will be again, bombarded. When you saw the performance improvements, was it already namespaced ?

simonbelanger commented 6 years ago

I suggest you don't do it. Because you need to add disable-oplog. If oplog is enabled, your instances will be again, bombarded.

So what your saying is that I absolutely need to add disable-oplog to take advantage of the package, and if I add it I obviously need to use redis for every publication because mongodb oplog no longer work? Then what's the use case for overridePublishFunction option and publishWithRedis function?

When you saw the performance improvements, was it already namespaced ?

yes

theodorDiaconu commented 6 years ago

If you didn't have "disable-oplog" but you had an OPLOG_URL you have been hit by oplog notifications, even if they weren't processed it is a cost of getting data, parsing it, etc. So did you still have MONGO_OPLOG_URL env variable set ? If yes, you should see serious improvements.

The reason for giving the option to not override it was that I wanted to leave the door opened to keep Oplog and RedisOplog working together.

If you have polling cursors, if you return them normally, it will use the original polling driver as expected.

theodorDiaconu commented 6 years ago

  if (options.oplogUrl && !Package['disable-oplog']) {
    self._oplogHandle = new OplogHandle(options.oplogUrl, self.db.databaseName);
    self._docFetcher = new DocFetcher(self);
  }

And inside OplogHandle constructor:

  self._startTailing();

In conclusion if you had that situation, then improvements should be massive.

simonbelanger commented 6 years ago

So did you still have MONGO_OPLOG_URL env variable set ? If yes, you should see serious improvements.

I don't see why still having MONGO_OPLOG_URL env variable set would help me see serious improvements? I'd guest that it would no longer be used anyway since I added the disable_oplog package... Are you saying that I'm still tailing mongodb oplog even with disable_oplog?

Sorry, I just want to make sure I understand correctly.

theodorDiaconu commented 6 years ago

Oh sorry, if you had disable-oplog then you are fine, you were no longer tailing the oplog.

simonbelanger commented 6 years ago

kk :)

theodorDiaconu commented 6 years ago

I'm a bit disappointed with the only 50% - 100% improvement to be honest. But I'm guessing we're also limited by the amount of connections an instance can handle.

theodorDiaconu commented 6 years ago

What is the number of connections/instance before and after?

simonbelanger commented 6 years ago

I'm a bit disappointed with the only 50% - 100% improvement to be honest. But I'm guessing we're also limited by the amount of connections an instance can handle.

Please don't be disappointed. I'm not, it's still a significant improvement.

What is the number of connections/instance before and after?

It's very variable based on the time of day

simonbelanger commented 6 years ago

Very interesting discussion but off-topic from the original issue ;). We should move to another channel.

theodorDiaconu commented 6 years ago

@simonbelanger yes but an instance can handle properly a certain amount of concurrent users, even if you do just method calls. Would be interesting to know those numbers (Sessions/Host). I know there's a limit. I know that just seeing connections/instance isn't a good metric for redis-oplog, because it depends a lot on your other methods and other stuff you do, how many LFS queries you have (with filters,limit,sort -- these are the heaviest), how many queries by "_id" and so on.

If we reduced your infrastructure by a half or a third. Then it doesn't mean redis-oplog is 2 times faster than oplog tailing, it may be that it's 5 times! My plan is to get to a number that speaks a lot. 1.5 times... meh.

simonbelanger commented 6 years ago

do you have something on Slack for this package development?

theodorDiaconu commented 6 years ago

https://join.slack.com/t/redis-oplog/shared_invite/enQtMjcxMjM4MjQ0NjQ2LWRlMmFiOTc5OTM4ZmZmNWQyYmJjODE4ZDBhODMyN2RiZDc5ZGRmMzExNTNiMjQzZTEzYzE1M2Y2ZjYxNWExNjE

simonbelanger commented 5 years ago

Hi @theodorDiaconu, I hope you're doing well! We just ran in a major performance issue that took us a very long time to diagnose (not related to redis-oplog). We were over-publishing thousands of documents in a specific publication but it's been very hard to pinpoint the source of the problem because of the metrics that are no longer tracked in the APM:

2019-02-11 at 9 33 am

I really miss the "Fetch Documents", "Observer Changes" and "Oplog Notifications" metrics and I think it would be possible to hook the tracking of these specific info somewhere in the redis-oplog workflow.

I digged down a little in meteor-apm-agent package and it seems that it is actually exporting a global Kadira object that would potentially contain everything needed to add the missing tracking in redis-oplog. It looks like all the magic is happening in https://github.com/meteor/meteor-apm-agent/tree/master/lib/hijack so I guess I can figure out how the metrics are tracked by meteor-apm-agent but I'd really like to hear your thoughts on how and where to hook it with redis-oplog.

Thank you!

theodorDiaconu commented 5 years ago

did u try this branch? https://github.com/cult-of-coders/redis-oplog/pull/283 . It hooks into the core, and you benefit of all metrics.

simonbelanger commented 5 years ago

wow that was fast ⚡️😄 I'll try to test drive this today.

It looks like a major refactoring that has been pending for some time, is it production ready?

simonbelanger commented 5 years ago

so, good news...

All green ✅ in our e2e tests suite with the new branch
I now see metrics in the "Observer Info" tab

...and bad news...

still nothing in the "Documents & Updates" tab

etyp commented 5 years ago

@simonbelanger did you ever have luck with getting this working? If not, have you had a try at hacking it or any progress we could see/work off of?

simonbelanger commented 5 years ago

@etyp well not really... we never found the time to dig deeper, the task is still in our backlog... I still feel real love for these long lost metrics though

cult-of-coders / redis-oplog

Live Queries metrics in Meteor APM #201