PostHog / posthog

🦔 PostHog provides open-source product analytics, session recording, feature flagging and A/B testing that you can self-host.
https://posthog.com
Other
20.56k stars 1.22k forks source link

Expose "person" properties and person_id in plugins #10192

Open macobo opened 2 years ago

macobo commented 2 years ago

Is your feature request related to a problem?

Currently, person (and group) information is not exposed to plugins. There are several plugins which work around this by consistently hitting the API.

Describe the solution you'd like

Expose this information in the plugin events payload. This involves reworking the event pipeline (IMO in a good way).

Current state of event pipeline

The current event pipeline looks like this:

  1. Run plugin processEvent
  2. "Prepare" events for ingestion (We fetch/update/create persons/groups here)
  3. Potentially buffer events
  4. Emit events to kafka clickhouse queue
  5. Run async handlers (onEvent, onAction, ...)

Note that the step 2 is the "fattest" currently and is intermingled with event handling logic. Cleaning this up will clean up our logic somewhat.

We also do lookups in step 3 though we create persons in step 2, creating possibility for simplifications

Proposed event pipeline

  1. Create/update persons, groups and properties.
  2. Determine whether to buffer events
  3. Run plugin "processEvent"
  4. Determine person/group property updates from step (3), update accordingly.
  5. "prepare" events for ingestion
  6. Emit events to kafka clickhouse
  7. Run async handlers

We might not want to expose group information to plugins themselves in the first stage and only expose person information. This is a product decision which might simplify refactoring in the future.

Note that we pass forward person objects from one step to the next here.

Problems and workarounds

Additional context

cc @yakkomajuri @tiina303 @Twixes

I think this is both a win for our users as well as cleans up the plugin-server/src/worker/ingestion/process-event.ts file which is doing perhaps too much.

It also is something that's easier to implement before the buffer is live - changing the payloads afterwards will cause trouble and extra debugging.

Thank you for your feature request – we love each and every one!

Twixes commented 2 years ago

This makes sense to me, my question though is, don't we plan to include persons on events as part of… persons-on-events anyway? In which case the rearranging of the pipeline still makes sense, but it would come more naturally.

macobo commented 2 years ago

The two projects are independent as it's the "2. "Prepare" events for ingestion (We fetch/update/create persons/groups here)" that is enriching events with person information currently. Note this happens after we run the plugins.

tiina303 commented 2 years ago

Yakko and I have discussed the buffer part a bit and here's the related comment: https://github.com/PostHog/posthog/issues/9182#issuecomment-1134516822 so combining that with the proposal here the pipeline in the future could be:

  1. Determine whether to buffer events (& send to buffer / get from buffer or ingest directly)
  2. Create/update persons, groups and properties.
  3. Run plugin "processEvent"
  4. Determine person/group property updates from step (3), update accordingly.
  5. "prepare" events for ingestion
  6. Emit events to kafka clickhouse
  7. Run async handlers
macobo commented 2 years ago

Once https://github.com/PostHog/posthog/pull/10360 lands there's not much left to do:

  1. Revert this commit: Undo radical change to event pipeline - will re-add it later!.
  2. Figure out a data format for exposing data to plugins with
  3. Documentation, consumer.io plugin, etc....

Note we should wait for https://github.com/PostHog/posthog/pull/10355/files before that change.

camerondeleone commented 1 year ago

Note that this becomes higher priority once we start rate limiting the api per #12015 because currently several teams use calls to /persons to sync person data to a warehouse/other destination. Once we have this info in the plugin server, the export plugins can be updated to pass person info along. There are at least two teams I know of (discussed here ) that this will affect, but probably more.