PostHog / posthog

🦔 PostHog provides open-source product analytics, session recording, feature flagging and A/B testing that you can self-host.
https://posthog.com
Other
21.64k stars 1.29k forks source link

Metrics on event deliverability #4577

Closed yakkomajuri closed 3 years ago

yakkomajuri commented 3 years ago

Tasks


Requested by a potential user on a call.

If I'm exporting events somewhere else, I'd like some metrics as to what percentage of my events actually get delivered.

Segment provides this:

They infer success from the lack of an error being thrown.

One step towards "UI plugins" that could help here is allow plugin devs to set metrics that would appear on a dashboard. A native API like this, for example:

const eventsDelivered = await storage.get('eventsDelivered')
const eventsProcessed = await storage.get('eventsProcessed')
updateMetric('percentageDelivered', (eventsDelivered/eventsProcessed)*100)

This way we don't have to deal with the problem of rendering arbitrary code yet, but give plugin devs the option to show some more stuff on the UI. Just a quick idea though.

yakkomajuri commented 3 years ago

Simple alternative is to leave this completely up to plugin devs to use events for metrics.

This could be done now and implemented into the existing plugins.

e.g. a job to run every hour and capture an event with some metrics

Twixes commented 3 years ago

This is a great idea. I'm also pretty sure we will take significant inspiration from Segment. I think this should be a core part of PostHog and placed in the Events page (along Events, Actions etc.) somewhere (but open to other options). Maybe also System Status page where ALL of the instance would be considered (instead of only the current project). The data should ideally be stored as a Postgres/ClickHouse table of delivery events. That way we'd have high data integrity plus useful metadata for graphs. I'll move this to posthog as the bulk of the work here is to be done in the main repo (database schema + UI)

macobo commented 3 years ago

Perhaps this fits well with https://github.com/PostHog/posthog/issues/2965? Users who don't use features like this should not see anything :)

yakkomajuri commented 3 years ago

Just realized that for the new exportEvents function we can probably derive metrics without dev work

mariusandra commented 3 years ago

Yup, that's exactly the point of exportEvents: holding the hand of users and telling them what's happening with regards to exports, retries, etc.

As I imagine you'd like to see per plugin how it has been performing, it makes sense to put these graphs on the plugins page itself. We might need a proper "plugin page" (/project/plugins/123) first, as the drawers are already pretty crammed with things.

yakkomajuri commented 3 years ago

Ok, here's the current plan for implementing this:

Phases

Phase 1: Metrics for exportEvents by default Phase 2: General metrics API for any plugin Phase 3: Displaying metrics on the frontend

Approach

While Phase 1 could be tackled in a very non-scalable way, it's worth approaching it in a way that scales to Phase 2. The method I currently have in mind is:

  1. Plugin authors specify an export called metrics which is a string array of metric names they want their plugin to track (for exportEvents this is done automatically)
  2. When inferring plugin capabilities, we also populate a column metric_names in the plugin table based on this export
  3. Plugin authors can do increment and decrement operations on a metric (this can be expanded later, and is also done by default for exportEvents)
  4. Metric operations are aggregated and sent e.g. once an hour via a PostHog event with a reference to the plugin and the metrics
  5. When rendering metrics in the frontend, we consult what metrics a plugin has specified, and render one graph per plugin with all the specified metrics. Every metric will be a sum of property for the appropriate plugin metrics event.

Discussion

The idea for adding the metric_names column is to make things a lot easier when finding what properties to add to a graph. It should be very little added storage (an array with ~3 items per plugin installed).

We could also limit the number of metrics a plugin can handle (2? 5?).

Thoughts, suggestions?

Twixes commented 3 years ago

In the name of poking the idea, could you give an example of metric_names values that plugin rows would have? :D

Specifically interested in what types of metrics could be measured and presented this way. For instance Segment has: delivered on first try, delivered on retry, not delivered due to failure. But also they aggregate all failures (regardless of whether the event ended up being delivered successfully on retry or not) separately to allow diagnosing the causes of failures. Would we be able to do both in this approach, or otherwise on which one would we focus first?

I think it's also worth figuring out how this would be incremented in a way allowing aggregation by time, because a row per increment would be potentially a lot of data, but with other approaches we need to be considerate to not end up with incorrect numbers.

neilkakkar commented 3 years ago

Definitely agree with the idea behind the approach- make things so we can extend to phase 2 reasonably easily. I'd also like to make it easy to aggregate different kind of metrics without changing our architecture / data collection schemas.

Just trying to think this through:


Assuming we want decent capability here, it's worth following a statsd-like interface ( https://github.com/statsd/statsd ) - if not using statsd directly for backend aggregation. So we expose an interface like:

// Timing: sends a timing command with the specified milliseconds
  client.timing('response_time', 42);

  // Increment: Increments a stat by a value (default is 1)
  client.increment('my_counter');

  // Decrement: Decrements a stat by a value (default is -1)
  client.decrement('my_counter');

  // Histogram: send data for histogram stat
  client.histogram('my_histogram', 42);

  // Gauge: Gauge a stat by a specified amount
  client.gauge('my_gauge', 123.45);

which is used inside the plugin to increment whatever metric we want. Hmm, at this point, we actually don't need anything like metric_name to tell us what graphs to show? Since what data we have governs what metrics we have?

We could even use a visualisation library, like graphite - or take inspiration from it, given we'd have the same backend schemas+data thanks to statsd, to create our own visualisations.


In short, (1) I don't yet see a good reason for building this from scratch ourselves. (2) If none of the metric libraries need us to declare the metric names before hand, maybe we don't need to either?

yakkomajuri commented 3 years ago

I think they key point here is that I'd want to use PostHog for this. Maybe I didn't make it clear. The instance/status dashboards that Karl built are what I'd want to get to - no building anything from scratch, not adding another lib, 100% leveraging what we already have.

The new column would be just for knowing what props to access on events. For exportEvents, this would be something like: ['successful-deliveries', 'retry-errors', 'other-errors']. Other plugins may want to implement other similar metrics, to specify things like failures pulling data, or failures parsing data. They could also do processing time metrics. Ultimately, if we want to get to a point where we have a bunch of third-party devs building plugins, they may want to show their users how good their plugin is.

Definitely don't want this to be fully-fledged either. Some minor metrics functionality should be more than enough. The idea here is to just have the structure in place to potentially expose a metrics API for everyone, but we might as well just keep it in only for exportEvents if we feel that's best.

@neilkakkar @Twixes

yakkomajuri commented 3 years ago

And for @neilkakkar's second point, the only reason for declaring them beforehand is because I want to use PostHog events, and I want to easily know what props in them to access and plot in PostHog graphs.

neilkakkar commented 3 years ago

it sounds like we're thinking of different things. Can you give me an example please of how such a plugin would look like?

From the sound of it, we're adding these metrics as properties to the event itself? How does that work with say, retries, and failures?

Oh, and then, the graph is basically just a trend for that event property? Ha, nice!

In this case, (assuming I'm now understanding correctly), the metric_name sounds like a configuration option. I have events which have these properties, and right now I want to see a trend of these specific properties. This sounds more like a config option with some defaults than a new property of the plugin?

yakkomajuri commented 3 years ago

Let me try again.

Imagine I have 2 plugins: "S3 Export Plugin" and "Schema Enforcer Plugin".

S3 Export Plugin wants to tell its users how many events it has seen, and how many of those it has successfully exported. It specifies:

export const metrics = ['events-seen', 'successful-deliveries']

Schema Enforcer wants to tell users how many events it received followed the right schema, and how many did not. It specifies:

export const metrics = ['events-seen', 'events-with-right-schema']

When events come in, the plugins increment the metrics appropriately. What happens "under the hood" is that these are kept in memory by the plugin server, under some PluginMetricsCache class or something.

At a regular interval (e.g. every hour), a scheduled job sends an event per plugin with the current metric values. e.g. :

{
    event: 's3-export-plugin-metrics',
    properties: {
       'events-seen': 560,
       'successful-deliveries': 420
    }
}

We then use PostHog graphs to display a Trend graph with the following specs:

Graph: Line graph, over 30 days, shown daily Graph Series A: SUM of property events-seen on event s3-export-plugin-metrics Graph Series B: SUM of property successful-deliveries on event s3-export-plugin-metrics


For exportEvents the plugin dev would never configure any of this (e.g. S3), but if the Schema Enforcer dev (me!) wanted to build its own metrics, they have an API to do so.

It uses PostHog events, and PostHog graphs, so there's no new stuff here to store or display graphs. We just need a layer to enable metrics capturing. And if you don't have metric_names, deriving the specific metrics to add to a graph would be more difficult.

Hence it's a new property of the plugin, as it isn't configurable by the user and doesn't relate to an "instance" of a plugin (pluginConfig), it only relates to the plugin itself - the metrics will be the same for everyone, just like capabilities.

neilkakkar commented 3 years ago

Okay, this makes a lot more sense now, thanks! (Was it clear to everyone else from the beginning?)

In this case it sounds like we have a default operation: increment, and we're basically aggregating number of events happening in say, an hour, and exposing just the special names of things we can count. This includes, say, events-seen, successful-deliveries, and retries.

Then, there are custom properties that devs might want to expose, which they can increment per event using the updateMetric call?


(Sorry for so many questions about this lol, I'm just trying to get a better understanding of what we're doing here).

So, now, I have an event coming in every hour like:

{
    event: 's3-export-plugin-metrics',
    properties: {
       'events-seen': 560,
       'successful-deliveries': 420,
       'my-custom-s3-metric': 12345,
    }
}

Displaying the plugin graph is multiple trendlines, one for each property (by default)? And they can pick and choose from these properties to decide what to show and what not to show.

Final Question (I hope): Aren't properties exactly what metric_names should be? Assuming we're generating the event name ourselves, given a plugin, given an event name, we don't need any extra data to display the different trend lines?

yakkomajuri commented 3 years ago

No worries at all! Thanks for asking all these questions - forces me to spec this out better and challenge the approach.

As to your question:

Well, consider this - we don't have a list of properties that exist for a given event name. You can see that when you select a math operation, we will suggest to you every numerical property for your team, not the numerical props for the event you selected.

So how could we decide what lines to show on a graph? How do I know what property names to use? If a plugin doesn't specify their metrics in some way, how do we know what to display? We can also derive them from the increment or whatever calls without an export, but would still have to store them somewhere.

Even if it was guaranteed (it isn't) that an event would always only have the same properties (which would be just the metrics), we'd still need a bit of a workaround to infer them, so the easiest is to just go ahead and store their names upfront.

yakkomajuri commented 3 years ago

I do think the point at which we infer and store metrics can be discussed though. We wouldn't need people to export metrics and could just store the names when we get a metrics.increment or metrics.update call.

However, it then becomes messier to limit the number of metrics for example, as we would have to cut a plugin off mid-execution once we encounter the n-th metric. With a deliberate export, we can limit the number when setting up the plugin, and make sure any call with a unspecified metric is dropped.

neilkakkar commented 3 years ago

What's the idea behind restricting number of metrics?

yakkomajuri commented 3 years ago

Doesn't need to happen. But it's good to know we could, just as we have timeouts, maybe we don't want a loop with a million iterations updating a randomly-generated metric.

yakkomajuri commented 3 years ago

With the goal of running arbitrary code on our servers, the more we have the potential to add guards to things, the better

neilkakkar commented 3 years ago

Just making this explicit / ensuring I got this right: This approach restricts us quite a bit in what we can do with metrics. (which may be fine, if we're only looking for basic stats). What might be a bit iffy is how these metrics aren't connected to anything else? Like, I can't know what all failed/succeeded, given I have N number of failures? But then, that's what the plugin logs are for, so it's fine as long as we're logging those failures too.

We probably can't aggregate over all failed events, but that's again something we probably don't want to do anyway. And there's $plugins_failed for more advanced aggregation here.

neilkakkar commented 3 years ago

I like the idea of inferring plugin metrics more: can't we infer these at the final step? Say, you load a bunch of events with the specific name, and then read the properties on the frontend to generate the graphs? (not sure if this is possible, but should be, given we have all the data on the frontend loaded?)

Basically I don't want to get users to export metric names, then, say, forget some, make typos, etc. and see their metrics not populating.

And more broadly, I don't want to store metric names anywhere at all. It seems to me that it constrains things more than it should, and doesn't sit right with the models that we have so far? (Users generate events, filter on props, get their graphs). Maybe what's new / exciting here is something like "smart prefilling", or how we restrict properties to the given event and prepopulate it by default.

Regarding issues with cutting these off: as long as we're in control of storing metrics data ourselves, we can always drop them over a certain limit, if we wanted to? We don't have to cut-off plugin execution, just not input the metric if it's over a certain count. metric_names just for limiting # of metrics doesn't seem right. Valid for the other reasons though!

neilkakkar commented 3 years ago

Again, this might be impossible to do now (I don't think so, but maybe I don't know enough). In which case, perfectly happy to go with the existing approach!

yakkomajuri commented 3 years ago

Doing everything on the frontend is possible, but I don't like how hacky it is. Also don't see what the big deal is looking at metrics like capabilities, since they wouldn't vary much, nor change per installation.

Regarding how this would be done in the frontend, we'd need an event name that can be inferred from the plugin name (easy), but then its properties would also need to be inferrable in general, with disregard for the event name (since we don't know all the props an event has). So we would need to prepend prop names with the plugin identifier, filter numerical props by this identifier, and build all of this out in the frontend, which would be slower, hackier, and less readable. I wouldn't want to set up 20 plugin graphs entirely on the frontend. Also because we'd need to check all installed plugins, whereas with metric_names we know what plugins have metrics and what don't.

If you take a look at how the dashboards on /instance/status are done, you'll see that the heavy lifting is on the backend. The one difference is they have static prop names.

yakkomajuri commented 3 years ago

I'd like to move forward with this unless there's a more neat frontend solution in sight.

If you're worried about typos or something, one thing that could be done is build a jobs-like API:

metrics.myMetric.increment
metrics.myMetric.update
yakkomajuri commented 3 years ago

Another thing I guess is plugin versions, we'd probably want to segregate metrics by version somehow?

yakkomajuri commented 3 years ago

And to be fair, typos will be picked up during development with an error, so not sure that's a big deal

yakkomajuri commented 3 years ago

Another solution is to have them defined in plugin.json. Might be more intuitive, and they would operate just like config fields (you also need to keep those names in sync). Then we don't need metric_names. I guess then making exportEvents do this automatically becomes harder though

yakkomajuri commented 3 years ago

I think I've come to a decision. Will implement and check back.

neilkakkar commented 3 years ago

The new(ish) direction makes a lot of sense, and sounds promising to me! It makes sense that we'll need to store the default aggregation operation for metrics, and that goes hand-in-hand with the metric name. Good work! :)

About versioning, soon(ish?), we'll have something similar for archive/sources with the install step, and I think it's safe to defer this for metrics to that point in time as well. (would affect capabilities, and metrics are closely related to that, data-flow wise)

yakkomajuri commented 3 years ago

Oh versioning is essentially handled too (to some extent) - I'm passing the plugin tag along with the event so that should allow graphs to filter events by the tag the installed plugin is on. Doesn't work for plugins without a tag though.

Twixes commented 3 years ago

I think this can be considered done