PostHog / plugin-server

Service to process and save PostHog events, supporting JS/TS plugins at that
https://posthog.com
8 stars 5 forks source link

Remove support for $increment #603

Closed yakkomajuri closed 2 years ago

yakkomajuri commented 2 years ago

Changes

$increment makes #371 a bit harder.

We discussed this at the platform standup and decided to just get rid of this feature for now. Some analysis on events appears to indicate that this is rarely used if at all. This is expected as it was never documented.

This was a feature I originally built as a way of getting familiar with the plugin server. It's nice, but we can revisit when we get a request for it.

Closes #602

Checklist

neilkakkar commented 2 years ago

(Not blocking). For the rest of us, would be nice to hear a short summary of the discussion!

Specifically, about how $increment makes ingesting backwards harder, and are we deciding to remove it because 371 gets harder, or because it's not being used - despite some people having asked for it? :)

yakkomajuri commented 2 years ago

Yes, thanks for asking. It's important to give visibility.

So $increment was something that I wanted to build, so when I joined the Extensibility team I used some of my off time to build it as a way of familiarizing myself with the codebase. However, I didn't take it forward, never documenting it and taking a long time to add support for it in posthog-js.

As a result, it seems nobody uses it. I am not 100% certain of this, but: We don't have mat_$increment on the events table, and I've timed out queries looking for an event with $increment. I couldn't find one by sampling either.

That alone could be a reason for dropping it, but it also makes property updates more complicated. Essentially, increment should aways run independently of timestamp, whereas set and set_once will become timestamp dependent. The other problem is that the current implementation assumes the user will only increment a given property, and mixing set/set_once with increment is undefined behavior.

To account for timestamps + mixing it with set/set_once we'd need to add some more complex handling that currently isn't justified by its usage. However, the idea here is to revisit this as soon as someone asks for it.

The good news is that the work I did on increment is the seed of #371 , as you can see by looking at https://github.com/PostHog/posthog/pull/6475 :D

tiina303 commented 2 years ago

Just to expand on the explanation for why increment is complicated:

if we allow it to be mixed with set/set_once

TLDR: this is way too complex and inc use cases don't require mixing.

The way this in theory is logical to work is we look at increment timestamps and after set/set_once apply the increments from that point forward. Now imagine

inc(a: 1, ts = 1)
inc(a: 20, ts = 2
set(a: 300, ts = 3)
inc(a: 4000, ts = 4)
inc(a: 50000, ts = 5)

we'd want to get a = 54300 (ignoring the increments before the initial set) but if we process it in this order (lots of reasons why we can run into this, e.g. network routing differences/resending):

inc(a: 20, ts = 2)
inc(a: 4000, ts = 4)
set(a: 300, ts = 3)
inc(a: 1, ts = 1)
inc(a: 50000, ts = 5)

When we see set in order to know that we need to apply 4000 inc and not apply 20 inc we need to know all previous increments and their timestamps (we can optimize this in a way that we keep inc sums from the beginning of time, which means we don't need to look through all of them and sum up, but that makes inc ingestions a bit more work and still requires us to keep all timestamps). For the inc after set we can just compare to the timestamp of the set call, so that's not that tricky. It's even more complex with set_once as that can move backwards in time (if we later see a set_once with earlier timestamp we need to use that instead).

However if you think about the use case for why someone would want to use increment we don't really need it to be mixed with set/set_once. The initial value is 0 and then inc can be used to set it to whatever we need initially.

if we don't allow it to be mixed

Updates become simple as we'd just ignore timestamps. But ...