PostHog / posthog

🦔 PostHog provides open-source product analytics, session recording, feature flagging and A/B testing that you can self-host.
https://posthog.com
Other
21.54k stars 1.29k forks source link

Implementing Person-on-Events #9802

Open neilkakkar opened 2 years ago

neilkakkar commented 2 years ago

This issue is managing implementation work around person-on-events.

Broad things to keep in mind:

  1. We use an instance setting to decide when to use person-on-event queries.
  2. We're going piecemeal: Get a vertical slice on prod, then continue on the rest. Trends is first.
  3. All tests using journeys_for, or flush_person_and_events() now automatically get person properties on events. But, this doesn't matter unless we override config to use the new queries we write.
  4. Also note: The automated CI for person-on-events disregards snapshots, since these would obviously be different. Make sure there's at least one new test to confirm that the new snapshots are correct.
  5. Aliasing can be tricky, since there's no pdi.person_id anymore, but e.person_id instead. Remember to handle these.

... and the relevant tasks to attack: (Put an @ next to the task you pick up - they're in priority order :) )

  1. [x] Setup CI to run all tests using the new instance setting: This ensures all old tests work with our changes. @neilkakkar
  2. [x] Trends person-on-events querying: @neilkakkar
  3. [x] Funnels querying - @EDsCODE
  4. [x] Retention querying - @neilkakkar
  5. [x] Lifecycle querying - @neilkakkar
  6. [x] Correlation Analysis querying (depends on funnels^) - @neilkakkar
  7. [x] Paths querying (funnel connection depends on funnels^)
  8. [x] Stickyness querying
  9. [x] Cohorts querying
  10. [x] ???

The migration plan

Team ingestion set up a test team (team_id 7964) with only person-on-events. We use this for all initial testing, ensuring things work, and then move on to enabling it for team 2 when the backfill is done, which will take a bit longer.

Each piece of work should be self-contained, such that enabling this setting doesn't bork other things which haven't yet been switched on. The new CI tests should automatically ensure this.


Clean up

Once this migration is over, there's a lot of things in flux that we should clean up (and think about self-hosted users upgrade path & ramifications). List of things to remove:

  1. [ ] Aggregation by distinct IDs
  2. [ ] Person property pushdowns
  3. [ ] Person and distinctID joins
  4. [ ] Group joins
  5. [ ] Remove tests marked with TODO: Delete after
  6. [ ] Switch the default CI tests over to person-on-events
  7. [ ] (maybe?) remove using_person_on_events parameter

Problems to address

This section is for things that came up during implementation that we need a broader discussion for

  1. [x] Funnel Property Filters and Breakdowns can get borked when using point-in-time person properties for funnels where the first (or last) event doesn't have that property (ex: pre signup)
  2. [x] Lifecycle query depends on person.created_at column, which isn't yet exposed on the events table. If this query ever supports groups, the same would apply to $group_X.created_at
Twixes commented 1 year ago

Anything remaining from this list? 👀

neilkakkar commented 1 year ago

Yes, all the cleanup