Closed mariusandra closed 4 months ago
This is great. Thanks for writing it up ❤️ Just a nit:
The first event sent by {case}_id did not contain the anonymous user's ID, so we could not link the users. By the time we got the ID with the frontend identify event, the users were already created and could not be linked.
the "could not be linked" could be a bit confusing ... not exactly sure what wording is best, but we can maybe just say from PoE perspective there are now two different users and e.g. funnels wouldn't combine them.
posthog.capture(f'{case}_id', '$identify', {"$anon_distinct_id": f"{case}_anon", "lib": "backend"})
Then send this value to your backend, and submit an $identify event with it as the $anon_distinct_id property.
Optional: in the backend we suggest folks use $create_alias events instead. What matters here is that the distinct ID on the event is the new backend ID (with either the alias or the identify approach), so that future events in the same session go to the same Kafka bucket and hence can't be processed before the alias event. Something like this:
posthog.capture(f'{case}_id', '$create_alias', {"alias": f"{case}_anon", "lib": "backend"})
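For anyone comparing the two backend variants, here is a minimal sketch that only builds the two payloads side by side. The helper names (identify_event, create_alias_event) and the example IDs are hypothetical; the event names and property keys are the ones from the snippets in this thread. It deliberately does not call the SDK.

```python
def identify_event(user_id: str, anon_id: str) -> dict:
    """Payload for a backend $identify that references the anonymous ID."""
    return {
        "distinct_id": user_id,  # the new backend ID, as recommended above
        "event": "$identify",
        "properties": {"$anon_distinct_id": anon_id, "lib": "backend"},
    }


def create_alias_event(user_id: str, anon_id: str) -> dict:
    """Payload for the $create_alias variant of the same link."""
    return {
        "distinct_id": user_id,  # same rule: the event is sent as the backend ID
        "event": "$create_alias",
        "properties": {"alias": anon_id, "lib": "backend"},
    }


# Hypothetical IDs, for illustration only.
ident = identify_event("user_42", "anon_abc")
alias = create_alias_event("user_42", "anon_abc")
print(ident["event"], ident["properties"]["$anon_distinct_id"])
print(alias["event"], alias["properties"]["alias"])
```

In both variants the anonymous ID travels in the properties and the event itself is sent under the backend ID. How the fields are passed to your SDK's capture call depends on the SDK and its version, so check that signature before wiring this in.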
Will this Just Work™ when using the Segment integration? We're using their JS SDK in the frontend, and their Python lib in the backend (where we send user ID with every event track call).
@asteinlein I really don't know, depends on what you're sending over and if it matches what's written above or not 🤷
Who'd like to write this up? This would be a great addition to the docs. =]
cc @PostHog/marketing (did I tag this right?)
I can do it 😄
Will this Just Work™ when using the Segment integration? We're using their JS SDK in the frontend, and their Python lib in the backend (where we send user ID with every event track call).
If your flow is similar to mine, I don't think it will. Our flow is:
- run segment js and posthog js on our marketing site
- anon track all events before sign up
- on sign up, submit a form to a backend running segment on the server
- backend creates a user plus organization objects and calls segment.identify with the database user id
- backend returns the database user id to the marketing site in the form response payload
- marketing site uses segment js to call identify with the backend id
If I understand correctly, this is precisely the flow that will break PoE.
Indeed, that is exactly our use-case as well. And I would think that is a pretty common flow for users of PostHog + Segment?
To fix it, we would have to send the anon ID to the backend as part of the signup form and then use that in the segment server identify call. Though tbh... I have no idea how I would include "$anon_distinct_id" in the segment identify call in a way that posthog would use. 🤷‍♂️
I haven't been following along here in detail to be honest, but from afar it seems strange that this couldn't work. If there is a cookie/anon ID that is subsequently identified with a person-identified user ID, couldn't this be made to work? What makes this so special for PostHog compared to how Segment associates events with identified users in general?
Indeed, that is exactly our use-case as well. And I would think that is a pretty common flow for users of PostHog + Segment?
Yeah agreed. I'm fairly certain that this is the flow that Segment recommends.
This is just conjecture but my reading of the final part of the post makes me think that they aren't going to force the switch to PoE until they have this fixed:
Note about the future of PoE We're working hard on removing the required workaround with passing the person's details to your backend, and also adding the ability to track the anonymous part of each recurring visit. Stay tuned!
Just FYI, we're actively working on improving the way this works: https://github.com/PostHog/posthog/issues/20460 which should ship by the end of Q1.
The primary goal of this issue is that PoE query mode (in terms of unique users) will return exactly the same results as joins with the person & distinct_id tables
@tiina303 is this still on track for end of Q1? Also, will past fired events be fixed?
The release has been delayed. The current plan is to ship this change in the next couple of weeks.
What about past fired events? Will events fired in the past be queryable by person attributes, like location, that are set on the initial identify?
Yes, we have been writing person properties to events for a while and have backfilled the period before that. Just to clarify, this is for properties at the time of the event.
So if an anonymous user enters your site and then they get identified, you'll be able to filter the identified events by that data from the anonymous user. (ex: initial country)
Yes, assuming you have geoIP enabled (and using events with person processing - the default and only option until now), then we'd write the associated location data to the event. The note is more about the fact that if the user did that session in Germany and a later session in Austria, then the filtering would use Germany (i.e. at the time of the event), not Austria (i.e. current location value on the person object).
@tiina303 is this still on track for end of Q1? Also, will past fired events be fixed?
The release has been delayed. The current plan is to ship this change in the next couple of weeks.
Is there any update on when this should be expected?
The beta project setting has disappeared. Does that mean PoE is enabled by default now?
I hope not, because if so, it's not working.
Update May 2024
We have now enabled the following section under "Project Settings" -> "Product Analytics"
This lets you choose whether you want person properties to be ingestion-time (faster) or current (slower), and whether you care about merged users (anon -> identified) being distinct or not.
Any future plans to allow users to choose this option at the Insight level?
In some way you already can. Click "..." and "view source" from the top, then click the little "debug" link. The page that opens lets you specify this PoE setting on the insight level. Notice how changing it also changes the query.
Now you can just copy that back into the "view source" view and have the setting be applied per insight.
There's one caveat: we have a bug that prevents the view source dialog from saving. Once this is fixed, you should be able to set this per insight this way.
Whether we want to expose this in the UI or not is a different question 🤔
@ivanagas feel free to re-open, but I'm assuming this is stale for now.
Update May 2024
We have now enabled the following section under "Project Settings" -> "Product Analytics"
This lets you choose whether you want person properties to be ingestion-time (faster) or current (slower), and whether you care about merged users (anon -> identified) being distinct or not.
@mariusandra That does not appear in our project settings?
Hey @NorfeldtKnowit, that option is unavailable for organizations created since June 2024. You can still access the PoE special fields from the post above.
This should move to docs, putting my notes here now for quick access.
PostHog has two operating modes when you use person properties in your queries, such as when asking things like "filter by users whose email ends with @gmail.com".
Mode "PoE disabled": Person and event data are kept in separate tables, and JOIN-ed when queried. This is slow, as we need to read and compare a lot of data. We always use the latest properties of a person when querying in this mode.
Mode "PoE enabled": A cached snapshot of the person's properties is stored on the event. When querying, we read the data on the event without making a costly JOIN. The query matches the person's properties at the time of the event, not as they are now.
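To make the difference between the two modes concrete, here is a minimal in-memory sketch. It is not how PostHog stores or queries data; the persons dict, the Event class and both query functions are hypothetical stand-ins for the person table, the events table, and the two query strategies described above.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the person table: always holds the latest properties.
persons = {"p1": {"email": "ann@gmail.com"}}


@dataclass
class Event:
    name: str
    person_id: str
    person_properties: dict  # snapshot copied when the event is captured


events = []


def capture(name, person_id):
    # Store a copy of the person's properties on the event at capture time.
    events.append(Event(name, person_id, dict(persons[person_id])))


def query_poe_disabled(suffix):
    """Look up each event's person at query time (the JOIN): latest properties."""
    return [e.name for e in events if persons[e.person_id]["email"].endswith(suffix)]


def query_poe_enabled(suffix):
    """Read the snapshot stored on the event: properties at the time of the event."""
    return [e.name for e in events if e.person_properties["email"].endswith(suffix)]


capture("signup", "p1")
persons["p1"]["email"] = "ann@example.com"  # the person later changes their email
capture("upgrade", "p1")

print(query_poe_disabled("@gmail.com"))  # [] (the latest email no longer matches)
print(query_poe_enabled("@gmail.com"))   # ['signup'] (matched at event time)
```

The same filter returns different events in each mode, which is the trade-off to keep in mind when switching.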
You can toggle between these modes under project settings:
Turning "PoE on" yields anywhere between 3x-10x improvements in query time, with larger datasets seeing the biggest wins.
However, you might need to update your code to be compatible with "PoE".
How to send events with PoE on.
Problems arise if you have two types of users: anonymous (logged out) and signed in. You must make sure that the first event made by the signed in user contains a reference to the anonymous user.
If you're only sending events from the frontend, everything is handled for you, provided you call posthog.identify() as soon as you have the new ID of the user.
If your flow demands a backend signup event, the flow above will fail.
The first event sent by {case}_id did not contain the anonymous user's ID, so we could not link the users. By the time we got the ID with the frontend identify event, the users were already created and could not be linked.
To get around this, pass the user's anonymous ID to your backend, and send a backend $identify event.
To get the anonymous user's ID on the frontend, read the current distinct ID before calling posthog.identify() (posthog-js exposes it as posthog.get_distinct_id()).
Then send this value to your backend, and submit an $identify event with it as the $anon_distinct_id property.
Note about querying PoE special fields
The following table might be helpful when debugging your events. These are all fields you can select on the events table:
distinct_id | distinct_id | distinct_id
person_id | poe.id | pdi.person_id
person.properties.foo | poe.properties.foo | pdi.person.properties.foo
What does not work with PoE enabled?
Currently the only thing that really doesn't work is tracking the anonymous part of returning signed up visitors.
User's visit 1: 5 anonymous pages + signup + signed in pages
User's visit 2: 2 anonymous pages + login page + signed in pages
In this case, the 2 anonymous pages from visit 2 wouldn't be associated with the user. They'd remain "stuck" on the anonymous user.
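The two visits above can be walked through with a toy model. This is a simplified sketch under a stated assumption, namely that every event is stamped with a person ID when it arrives and is never rewritten afterwards; it is not PostHog's actual ingestion code, and the Ingestion class and all IDs are hypothetical.

```python
import itertools


class Ingestion:
    """Toy model: each event gets a person id stamped on it at arrival."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.person_of = {}  # distinct_id -> person id
        self.events = []     # (event name, distinct_id, stamped person id)

    def capture(self, distinct_id, event, anon_id=None):
        # First time the signed-in id is seen: reuse the anonymous person.
        if anon_id is not None and distinct_id not in self.person_of and anon_id in self.person_of:
            self.person_of[distinct_id] = self.person_of[anon_id]
        if distinct_id not in self.person_of:
            self.person_of[distinct_id] = next(self._ids)
        if anon_id is not None:
            # Later events from anon_id resolve to the signed-in person,
            # but events that were already stamped are left untouched.
            self.person_of[anon_id] = self.person_of[distinct_id]
        self.events.append((event, distinct_id, self.person_of[distinct_id]))


ing = Ingestion()
# Visit 1: anonymous page, signup (identify), signed in page.
ing.capture("anon-1", "pageview")
ing.capture("user-42", "$identify", anon_id="anon-1")
ing.capture("user-42", "pageview")
# Visit 2: anonymous page under a fresh anonymous id, then login.
ing.capture("anon-2", "pageview")
ing.capture("user-42", "$identify", anon_id="anon-2")

user_person = ing.person_of["user-42"]
stuck = [e for e in ing.events if e[2] != user_person]
print(stuck)  # [('pageview', 'anon-2', 2)]
```

In this model the visit 1 anonymous page shares the user's person ID because the signed-in ID was new when it was linked, while the visit 2 anonymous page was stamped before the login and stays with the second anonymous person.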
Note about the future of PoE
We're working hard on removing the required workaround with passing the person's details to your backend, and also adding the ability to track the anonymous part of each recurring visit. Stay tuned!