PostHog / meta

This is a place to discuss non-product issues in public.

Understanding Persons On Events (PoE) #173

Closed mariusandra closed 4 months ago

mariusandra commented 10 months ago

This should move to docs, putting my notes here now for quick access.

PostHog has two operating modes for queries that use person properties, for example "filter by users whose email ends with @gmail.com".

  1. Mode "PoE disabled": Person and event data are kept in separate tables, and JOIN-ed when queried. This is slow, as we need to read and compare a lot of data. We always use the latest properties of a person when querying in this mode.

  2. Mode "PoE enabled": A cached snapshot of the person's properties is stored on the event. When querying, we read the data on the event without making a costly JOIN. The query matches the person's properties at the time of the event, not as they are now.
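
To make the difference concrete, here is a minimal Python sketch of the two modes. It is a toy in-memory model, not PostHog's actual storage or query code; the `persons` dict, the `Event` class and both `matches_*` helpers are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    distinct_id: str
    name: str
    # Snapshot of the person's properties taken when the event was ingested.
    person_properties: dict = field(default_factory=dict)


# Toy "persons table" (hypothetical): always holds the latest properties.
persons = {"user_1": {"email": "ann@gmail.com"}}
events = []


def ingest(distinct_id, name):
    # Copy the properties so later changes to the person do not alter the event.
    events.append(Event(distinct_id, name, dict(persons[distinct_id])))


ingest("user_1", "signup")
persons["user_1"]["email"] = "ann@example.com"  # the person changes email later
ingest("user_1", "upgrade")


def matches_poe_disabled(event, suffix):
    # "JOIN" against the persons table: the latest properties win.
    return persons[event.distinct_id]["email"].endswith(suffix)


def matches_poe_enabled(event, suffix):
    # Read the snapshot stored on the event itself, no join needed.
    return event.person_properties["email"].endswith(suffix)


gmail_disabled = [e.name for e in events if matches_poe_disabled(e, "@gmail.com")]
gmail_enabled = [e.name for e in events if matches_poe_enabled(e, "@gmail.com")]
print(gmail_disabled)  # []
print(gmail_enabled)   # ['signup']
```

With the join, neither event matches because the person's current email is no longer a Gmail address. With the snapshot, the signup event still matches because it was sent while the email was a Gmail address.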

You can toggle between these modes under project settings.

Turning "PoE on" yields a 3x to 10x improvement in query time, with larger datasets seeing the biggest wins.

However, you might need to update your code to be compatible with "PoE".

How to send events with PoE on.

Problems arise if you have two types of users: anonymous (logged out) and signed in. You must make sure that the first event made by the signed in user contains a reference to the anonymous user.

If you're only sending events from the frontend, everything is handled for you, provided you call posthog.identify() as soon as you have the new ID of the user.

case = 'AD_web'

// posthog-js
posthog.capture(`${case}_anon`, '$pageview', {"lib": "web"})
posthog.capture(`${case}_anon`, 'other event', {"lib": "web"})
posthog.capture(`${case}_anon`, 'signup page', {"lib": "web"})

// frontend signup happens here, we get the new ID

// posthog-js
posthog.identify(`${case}_id`, {"lib": "web"})
posthog.capture(`${case}_id`, 'frontend signup', {"lib": "web"})
posthog.capture(`${case}_id`, '$pageview', {"lib": "web"})

If your flow requires a backend signup event, the flow above will fail, as in this example:

case = 'AD_not'

// posthog-js
posthog.capture(`${case}_anon`, '$pageview', {"lib": "web"})
posthog.capture(`${case}_anon`, 'other event', {"lib": "web"})
posthog.capture(`${case}_anon`, 'signup page', {"lib": "web"})

# in the python backend library
posthog.capture(f'{case}_id', 'backend signup', {"lib": "backend"})

// posthog-js
posthog.identify(`${case}_id`, {"lib": "web"})
posthog.capture(`${case}_id`, '$pageview', {"lib": "web"})

The first event sent by {case}_id did not contain the anonymous user's ID, so we could not link the users. By the time we got the ID with the frontend identify event, the users were already created and could not be linked.

To get around this, pass the user's anonymous ID to your backend, and send a backend $identify event.

case = 'AD_ok'

// posthog-js
posthog.capture(`${case}_anon`, '$pageview', {"lib": "web"})
posthog.capture(`${case}_anon`, 'other event', {"lib": "web"})
posthog.capture(`${case}_anon`, 'signup page', {"lib": "web"})

# in the python backend library
posthog.capture(f'{case}_id', '$identify', {"$anon_distinct_id": f"{case}_anon", "lib": "backend"})
posthog.capture(f'{case}_id', 'backend signup', {"lib": "backend"})

// posthog-js
posthog.identify(`${case}_id`, {"lib": "web"})
posthog.capture(`${case}_id`, '$pageview', {"lib": "web"})

To get the anonymous user's ID on the frontend, call:

const anonDistinctId = posthog.get_distinct_id()

Then send this value to your backend, and submit an $identify event with it as the $anon_distinct_id property.
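
As a rough sketch of the backend side, the helper below sends the $identify event before the signup event, both under the new user ID. The function name `send_backend_signup` and the injected `capture` callable are assumptions for illustration; `capture` is expected to take `(distinct_id, event, properties)`, the call shape used in the examples above, so check it against the version of the library you have installed.

```python
def send_backend_signup(capture, user_id, anon_distinct_id):
    """Send $identify first, then the signup event, both under the new user ID.

    `capture` is any callable taking (distinct_id, event, properties).
    This is a hypothetical helper, not part of any PostHog library.
    """
    # The first event for user_id carries the anonymous ID, so the two can be linked.
    capture(
        user_id,
        "$identify",
        {"$anon_distinct_id": anon_distinct_id, "lib": "backend"},
    )
    capture(user_id, "backend signup", {"lib": "backend"})


# Usage with a stand-in that records the calls instead of sending them.
sent = []
send_backend_signup(lambda *args: sent.append(args), "AD_ok_id", "AD_ok_anon")
for call in sent:
    print(call)
```

Passing `capture` in as an argument keeps the ordering logic testable without a network call; in a real handler you would pass your configured client's capture function instead of the recording lambda.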

Note about querying PoE special fields

The following table might be helpful when debugging your events. These are all fields you can select on the events table:

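| Data              | Project setting       | From event         | Via join                  |
|-------------------|-----------------------|--------------------|---------------------------|
| Distinct ID       | distinct_id           | distinct_id        | distinct_id               |
| Person ID         | person_id             | poe.id             | pdi.person_id             |
| Person properties | person.properties.foo | poe.properties.foo | pdi.person.properties.foo |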
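
One way to use the table when debugging is to select all variants side by side and compare them per event. The sketch below only builds the query text; the field names are copied from the table, `foo` is the placeholder property name used there, and the helper `debug_query` is hypothetical. It assumes the fields can be selected together in one query, which the notes above do not state explicitly.

```python
# Field names copied from the table above; `foo` is a placeholder property name.
FIELDS = {
    "project_setting": ["distinct_id", "person_id", "person.properties.foo"],
    "from_event": ["distinct_id", "poe.id", "poe.properties.foo"],
    "via_join": ["distinct_id", "pdi.person_id", "pdi.person.properties.foo"],
}


def debug_query(limit=10):
    # De-duplicate while keeping order (distinct_id appears in every column).
    columns = list(dict.fromkeys(f for group in FIELDS.values() for f in group))
    return f"select {', '.join(columns)} from events limit {limit}"


query = debug_query()
print(query)
```

If the "From event" and "Via join" person IDs differ for an event, that event was stamped with a person that differs from the one the join resolves to.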

What does not work with PoE enabled?

Currently the only thing that really doesn't work is tracking the anonymous part of returning signed up visitors.

User's visit 1: 5 anonymous pages + signup + signed in pages
User's visit 2: 2 anonymous pages + login page + signed in pages

In this case, the 2 anonymous pages from visit 2 wouldn't be associated with the user. They'd remain "stuck" on the anonymous user.
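
The behaviour can be imitated with a small toy model. This is only a sketch of the rule described in these notes (the person ID is written onto each event at ingestion, and a new ID can be linked to an anonymous one only on its first event); it is not PostHog's ingestion code, and the `ToyIngestion` class and the IDs used are hypothetical.

```python
class ToyIngestion:
    """Toy model: stamps a person ID on every event at ingestion time."""

    def __init__(self):
        self.persons = {}  # distinct_id -> person_id
        self.events = []   # (distinct_id, event, person_id)
        self._next_person_id = 0

    def ingest(self, distinct_id, event, anon_distinct_id=None):
        if distinct_id not in self.persons:
            if anon_distinct_id in self.persons:
                # First event for this ID references a known anonymous ID: link them.
                self.persons[distinct_id] = self.persons[anon_distinct_id]
            else:
                self._next_person_id += 1
                self.persons[distinct_id] = self._next_person_id
        # The stamp is written once and never rewritten afterwards.
        self.events.append((distinct_id, event, self.persons[distinct_id]))


toy = ToyIngestion()

# Visit 1: anonymous pages, then signup links anon_1 to user_1.
toy.ingest("anon_1", "$pageview")
toy.ingest("user_1", "$identify", anon_distinct_id="anon_1")
toy.ingest("user_1", "$pageview")

# Visit 2: a new anonymous ID, then login as the already known user_1.
toy.ingest("anon_2", "$pageview")
toy.ingest("user_1", "$identify", anon_distinct_id="anon_2")
toy.ingest("user_1", "$pageview")

for distinct_id, event, person_id in toy.events:
    print(person_id, distinct_id, event)
```

In this model every event from visit 1 and all signed in events carry person 1, while the anonymous pageview from visit 2 keeps person 2, which is the "stuck" anonymous user described above.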

Note about the future of PoE

We're working hard on removing the required workaround with passing the person's details to your backend, and also adding the ability to track the anonymous part of each recurring visit. Stay tuned!

tiina303 commented 10 months ago

This is great. Thanks for writing it up ❤️ Just a nit:

The first event sent by {case}_id did not contain the anonymous user's ID, so we could not link the users. By the time we got the ID with the frontend identify event, the users were already created and could not be linked.

the "could not be linked" could be a bit confusing ... not exactly sure what wording is best, but we can maybe just say from PoE perspective there are now two different users and e.g. funnels wouldn't combine them.

posthog.capture(f'{case}_id', '$identify', {"$anon_distinct_id": f"{case}_anon", "lib": "backend"})

Then send this value to your backend, and submit an $identify event with it as the $anon_distinct_id property.

Optional: in the backend we suggest folks use $create_alias events instead. What's important here is that the ID is the new backend ID (with either alias or identify), so that future events in the same session go to the same Kafka bucket and hence can't be processed before the alias event. Something like this:

posthog.capture(f'{case}_id', '$create_alias', {"alias": f"{case}_anon", "lib": "backend"})

asteinlein commented 10 months ago

Will this Just Work™ when using the Segment integration? We're using their JS SDK in the frontend, and their Python lib in the backend (where we send user ID with every event track call).

mariusandra commented 10 months ago

@asteinlein I really don't know, depends on what you're sending over and if it matches what's written above or not 🤷

corywatilo commented 9 months ago

Who'd like to write this up? This would be a great addition to the docs. =]

cc @PostHog/marketing (did I tag this right?)

ivanagas commented 9 months ago

I can do it 😄

joshforbes commented 9 months ago

Will this Just Work™ when using the Segment integration? We're using their JS SDK in the frontend, and their Python lib in the backend (where we send user ID with every event track call).

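If your flow is similar to mine, I don't think it will. Our flow is:

  • run segment js and posthog js on our marketing site
  • anon track all events before sign up
  • on sign up, submit a form to a backend running segment on the server
  • backend creates a user plus organization objects and calls segment.identify with the database user id
  • backend returns the database user id to the marketing site in the form response payload
  • marketing site uses segment js to call identify with the backend id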

If I understand correctly, this is precisely the flow that will break PoE. To fix it, we would have to send the anon ID to the backend as part of the signup form and then use that in the segment server identify call. Though tbh... I have no idea how I would include "$anon_distinct_id" in the segment identify call in a way that posthog would use. 🤷‍♂️

asteinlein commented 9 months ago

Will this Just Work™ when using the Segment integration? We're using their JS SDK in the frontend, and their Python lib in the backend (where we send user ID with every event track call).

If your flow is similar to mine, I don't think it will. Our flow is:

  • run segment js and posthog js on our marketing site
  • anon track all events before sign up
  • on sign up, submit a form to a backend running segment on the server
  • backend creates a user plus organization objects and calls segment.identify with the database user id
  • backend returns the database user id to the marketing site in the form response payload
  • marketing site uses segment js to call identify with the backend id

If I understand correctly, this is precisely the flow that will break PoE.

Indeed, that is exactly our use-case as well. And I would think that is a pretty common flow for users of PostHog + Segment?

To fix it, we would have to send the anon ID to the backend as part of the signup form and then use that in the segment server identify call. Though tbh... I have no idea how I would include "$anon_distinct_id" in the segment identify call in a way that posthog would use. 🤷‍♂️

I haven't been following along here in detail to be honest, but from afar it seems strange that this couldn't work. When you have a cookie/anon ID and subsequently identify it with a person-identified user ID, couldn't this be made to work? What makes this so special for PostHog compared to how Segment associates events with identified users in general?

joshforbes commented 9 months ago

Indeed, that is exactly our use-case as well. And I would think that is a pretty common flow for users of PostHog + Segment?

Yeah agreed. I'm fairly certain that this is the flow that Segment recommends.

This is just conjecture but my reading of the final part of the post makes me think that they aren't going to force the switch to PoE until they have this fixed:

Note about the future of PoE

We're working hard on removing the required workaround with passing the person's details to your backend, and also adding the ability to track the anonymous part of each recurring visit. Stay tuned!

tiina303 commented 9 months ago

Just FYI, we're actively working on improving the way this works in https://github.com/PostHog/posthog/issues/20460, which should ship by the end of Q1.

The primary goal of this issue is that PoE query mode (in terms of unique users) will return exactly the same results as joins with the person & distinct_id tables.

jclusso commented 8 months ago

@tiina303 is this still on track for end of Q1? Also, will past fired events be fixed?

MarconLP commented 8 months ago

@tiina303 is this still on track for end of Q1? Also, will past fired events be fixed?

The release has been delayed. The current plan is to ship this change in the next couple of weeks.

jclusso commented 7 months ago

@tiina303 is this still on track for end of Q1? Also, will past fired events be fixed?

The release has been delayed. The current plan is to ship this change in the next couple of weeks.

What about past fired events? Will events fired in the past be queryable by person attributes, like location, that are set on initial identify?

tiina303 commented 7 months ago

What about past fired events? Will events fired in the past be queryable by person attributes, like location, that are set on initial identify?

Yes, we have been writing person properties to events for a while and backfilled the time before. Just to clarify also this is for properties at the time of the event.

jclusso commented 7 months ago

Just to clarify also this is for properties at the time of the event.

So if an anonymous user enters your site and then they get identified, you'll be able to filter the identified events by that data from the anonymous user. (ex: initial country)

tiina303 commented 7 months ago

So if an anonymous user enters your site and then they get identified, you'll be able to filter the identified events by that data from the anonymous user. (ex: initial country)

Yes, assuming you have geoIP enabled (and using events with person processing - the default and only option until now), then we'd write the associated location data to the event. The note is more about the fact that if the user did that session in Germany and a later session in Austria, then the filtering would use Germany (i.e. at the time of the event), not Austria (i.e. current location value on the person object).

jclusso commented 7 months ago

@tiina303 is this still on track for end of Q1? Also, will past fired events be fixed?

The release has been delayed. The current plan is to ship this change in the next couple of weeks.

Is there any update on when this should be expected?

amacneil commented 7 months ago

The beta project setting has disappeared, does that mean POE is enabled by default now?

jclusso commented 7 months ago

The beta project setting has disappeared, does that mean POE is enabled by default now?

I hope not, because if so, it's not working.

tkaemming commented 7 months ago

  1. For Cloud, the setting was removed from the settings panel for now as we determine how to move forward with the rollout. Teams that were previously opted in via the user-controlled setting should have remained in the same state that they were prior to that change. If that didn't happen (or something else looks incorrect), reach out to us via support and we'll get things sorted out.
  2. For self-hosted, … yeah, this got broken accidentally. We'll fix that.

mariusandra commented 6 months ago

Update May 2024

We have now enabled the following section under "Project Settings" -> "Product Analytics"

image

This lets you choose whether you want person properties to be ingestion-time (faster) or current (slower), and whether you care about merged users (anon -> identified) being distinct or not.

jclusso commented 6 months ago

This lets you choose whether you want person properties to be ingestion-time (faster) or current (slower), and whether you care about merged users (anon -> identified) being distinct or not.

Any future plans to allow users to choose this option at the Insight level?

mariusandra commented 6 months ago

In some way you already can. Click "..." and "view source" from the top, then click the little "debug" link. The page that opens lets you specify this PoE setting on the insight level. Notice how changing it also changes the query.

Now you can just copy that back into the "view source" view and have the setting be applied per insight.

There's one caveat: we have a bug that prevents the view source dialog from saving. Once this is fixed, you should be able to set this per insight this way.

Whether we want to expose this in the UI or not is a different question 🤔

andyvan-ph commented 4 months ago

@ivanagas feel free to re-open, but I'm assuming this is stale for now.

NorfeldtKnowit commented 4 months ago

Update May 2024

We have now enabled the following section under "Project Settings" -> "Product Analytics"

image

This lets you choose whether you want person properties to be ingestion-time (faster) or current (slower), and whether you care about merged users (anon -> identified) being distinct or not.

@mariusandra That does not appear in our project settings?

MarconLP commented 4 months ago

Hey @NorfeldtKnowit, that option is unavailable for organizations created since June 2024. You can still access the PoE special fields from the post above.