PostHog / first-time-event-tracker

Track if a given event is the first event of this type for a user.
MIT License

Should this be built into core posthog instead? #1

Open kpthatsme opened 3 years ago

kpthatsme commented 3 years ago

I don't know whether this functionality should live in a plugin; I think it belongs in core PostHog.

yakkomajuri commented 3 years ago
  1. It might be useful for things like funnels and webhooks, as well as for easier filtering.
  2. I think so; as a start, we could at least suggest enabling this during onboarding?

@kpthatsme Also, sorry for not replying earlier.

Twixes commented 3 years ago

I think this wouldn't actually be more efficient as part of PostHog core (it would probably end up as a field on the Team model, making that model even larger), so it makes sense as a plugin to me. It could definitely be a good recommended plugin at the same time, shown during onboarding and possibly even enabled by default. Showing plugins during onboarding is something we plan to do this sprint. WDYT @kpthatsme?

yakkomajuri commented 3 years ago

I definitely think we should at least push for this during onboarding, because, as the README says:

This plugin will only work on events ingested after the plugin was enabled. This means it will register events as being the first even if events of that type occurred before it was enabled. To mitigate this, you could consider renaming the relevant events and creating an action that matches both the old event name and the new one.
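The README's limitation follows from how such a plugin has to work: it can only remember events that have passed through it. Below is a minimal sketch, assuming a simple key-value store keyed by distinct ID and event name. The `PluginEvent` shape, the `FirstTimeTracker` class and the `is_first_event` property are hypothetical, not the plugin's actual code, and a real plugin would need persistent, shared storage rather than a process-local `Map`.

```typescript
// Hypothetical event shape; real PostHog events carry many more fields.
interface PluginEvent {
  event: string;
  distinct_id: string;
  timestamp: string; // ISO 8601
  properties: Record<string, unknown>;
}

// Hypothetical first-time tracker. The Map stands in for persistent storage.
// It starts empty, so events ingested before the tracker existed are
// invisible to it, which is exactly the limitation the README describes.
class FirstTimeTracker {
  // key: [distinct_id, event name] serialized as JSON -> first timestamp seen
  private seen = new Map<string, string>();

  processEvent(event: PluginEvent): PluginEvent {
    const key = JSON.stringify([event.distinct_id, event.event]);
    const isFirst = !this.seen.has(key);
    if (isFirst) {
      this.seen.set(key, event.timestamp);
    }
    // `is_first_event` is a hypothetical property name.
    return {
      ...event,
      properties: { ...event.properties, is_first_event: isFirst },
    };
  }
}

const tracker = new FirstTimeTracker();
const signup = (ts: string): PluginEvent => ({
  event: "signed_up",
  distinct_id: "user-1",
  timestamp: ts,
  properties: {},
});

// Suppose "user-1" also signed up before the tracker was enabled. That event
// never passed through processEvent, so the next one is still flagged as first.
const a = tracker.processEvent(signup("2021-03-02T10:00:00Z"));
const b = tracker.processEvent(signup("2021-03-03T10:00:00Z"));
console.log(a.properties.is_first_event, b.properties.is_first_event); // true false
```

Renaming the event, as the README suggests, starts a fresh key whose whole history passes through the tracker, so the flag can be trusted for the new name.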

kpthatsme commented 3 years ago

I appreciate your thoughts @yakkomajuri and @Twixes!

"I think this wouldn't actually be more efficient as part of PostHog core (it would probably end up as a field on the Team model, making that model even larger), so it makes sense as a plugin to me. It could definitely be a good recommended plugin at the same time, shown during onboarding and possibly even enabled by default. Showing plugins during onboarding is something we plan to do this sprint. WDYT @kpthatsme?"

(I should clarify I was approaching this from a user's perspective and not from an implementation perspective.)

So I think enabling this plugin by default, and adding this property to every event by default (i.e. a blacklist instead of a whitelist), would solve the problem.

The main things guiding this for me:

  1. First-time analysis is critical. You want it more often than not, and people aren't going to know beforehand to go and update the config before their event ships; at some point it's going to cause issues.
  2. Any data trust issues kinda ruin the whole set. You could create a custom event or use the suggested workaround, but that just adds more messy data (in the form of Actions that only exist to do this workaround), which I think makes PH harder to use generally.

So the more we can avoid things like that, and make this as low-level and as much a part of the system as possible (whether in plugins or in core), the better the experience I think we create.

If this can be done in a reliable way (i.e. there's no chance the plugin server overwrites first-time dates), I actually love this idea as a first plugin for users, as it's a great way to show the power of plugins with some powerful and easy-to-understand functionality.
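The blacklist-instead-of-whitelist default proposed in this comment only changes which events are eligible for the first-time check. A sketch, assuming a hypothetical config with a mode and a comma-separated list of event names; `TrackerConfig`, `mode` and `events` are illustrative names, not the plugin's real settings.

```typescript
// Hypothetical plugin config: either track only the listed events
// (allowlist, the "whitelist" in the discussion) or track everything
// except the listed events (blocklist, the "blacklist").
interface TrackerConfig {
  mode: "allowlist" | "blocklist";
  events: string; // comma-separated event names, e.g. "signed_up,$pageview"
}

// Split the comma-separated setting into a set of trimmed, non-empty names.
function parseEventList(raw: string): Set<string> {
  return new Set(
    raw
      .split(",")
      .map((name) => name.trim())
      .filter((name) => name.length > 0),
  );
}

// Decide whether an event should get the first-time check at all.
function shouldTrack(eventName: string, config: TrackerConfig): boolean {
  const listed = parseEventList(config.events).has(eventName);
  return config.mode === "allowlist" ? listed : !listed;
}

// With an empty blocklist every event is tracked, so a newly shipped event
// gets first-time data without anyone touching the config.
const byDefault: TrackerConfig = { mode: "blocklist", events: "" };
console.log(shouldTrack("new_feature_used", byDefault)); // true

// With an allowlist, the same new event is silently skipped until someone
// remembers to add it.
const allowOnly: TrackerConfig = { mode: "allowlist", events: "signed_up" };
console.log(shouldTrack("new_feature_used", allowOnly)); // false
```

The second case is the failure mode kpthatsme describes: any occurrences before the config is updated are never checked.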

mariusandra commented 3 years ago

I think we can add a few things to the plugin server to solve the "first time tracked only from this moment" problem. We can expose a fetchUser(distinct_id) function to plugins. If no user is found, it's the first event.

After the plugins run, more often than not we need to fetch this user from Postgres anyway, so doing it a few steps earlier (and caching it for later) might not even affect performance all that much.
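The suggestion above can be sketched as a cached person lookup shared by the plugin check and the later ingestion step. `fetchUser` is the function proposed in this comment, not an existing API; its signature, the `Person` shape and the `Map`-backed fake table are assumptions for illustration. Note that "no person found" answers "is this the first event of any kind for this distinct ID?", so per-event-type tracking would still need its own bookkeeping.

```typescript
interface Person {
  id: number;
  distinct_ids: string[];
}

// Hypothetical signature for the proposed fetchUser(distinct_id) helper.
type FetchUser = (distinctId: string) => Person | null;

// Memoize lookups so the plugin step and the later ingestion step share one
// database query per distinct ID. Misses (null) are cached as well.
function withCache(fetchUser: FetchUser): { lookup: FetchUser; queryCount: () => number } {
  const cache = new Map<string, Person | null>();
  let queries = 0;
  const lookup: FetchUser = (distinctId) => {
    if (!cache.has(distinctId)) {
      queries += 1;
      cache.set(distinctId, fetchUser(distinctId));
    }
    return cache.get(distinctId) ?? null;
  };
  return { lookup, queryCount: () => queries };
}

// No person yet means this is the first event seen for this distinct ID,
// regardless of when the plugin was enabled.
function isFirstEventForUser(distinctId: string, lookup: FetchUser): boolean {
  return lookup(distinctId) === null;
}

// Fake person table standing in for Postgres.
const persons = new Map<string, Person>([["user-1", { id: 1, distinct_ids: ["user-1"] }]]);
const { lookup, queryCount } = withCache((id) => persons.get(id) ?? null);

console.log(isFirstEventForUser("user-1", lookup)); // false
console.log(isFirstEventForUser("user-2", lookup)); // true
lookup("user-1"); // later ingestion step reuses the cached row
console.log(queryCount()); // 2
```

A cached miss goes stale as soon as ingestion creates the person, so a cache like this would have to be scoped to a single event or invalidated on person creation.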

Twixes commented 3 years ago

That'd be pretty flexible @mariusandra, but an arbitrary fetchUser() could be pretty taxing (it would be an easy way for plugin devs to query the users table a ton). Perhaps this would just be best as part of ingestion core? Most importantly though, to inform the design: how would we use, in analytics, the data on which particular event was the first of its type for a given user? I'm kind of struggling to get a feel for this.

mariusandra commented 3 years ago

I think we'll eventually add meta.api.fetchUser anyway, alongside other API methods. Not explicitly for use during event ingestion, but mainly for other scheduled/background tasks. Ideally the API would be autogenerated from a schema and also used by the main React app. Nobody has asked for it yet, so let's punt for as long as we can :).

In the ingestion step, we can really easily add a new "first_event_for_this_distinct_id" property to an event at the moment we create a new person in the database. However, past events can't be altered, and if users merge a few distinct_ids, they will have multiple events with this property. For example, every time you open incognito and log in to your site, for a brief moment you'll be a new user before you get merged into your old user. @kpthatsme Would multiple "first event for user" tags per user cause a lot of problems? Currently we just can't avoid them.
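The ingestion-time approach, and why a merge leaves several tagged events behind, can be sketched as follows. `first_event_for_this_distinct_id` is the property name from this comment; the `ingest` function, the in-memory set of known distinct IDs and the append-only array are hypothetical simplifications of the real pipeline.

```typescript
interface StoredEvent {
  distinct_id: string;
  event: string;
  timestamp: string;
  properties: Record<string, unknown>;
}

const knownDistinctIds = new Set<string>(); // stands in for the persons table
const eventLog: StoredEvent[] = []; // append-only, like the events table

// Tag the event only when a new person would be created for this distinct ID.
function ingest(distinctId: string, event: string, timestamp: string): void {
  const createsPerson = !knownDistinctIds.has(distinctId);
  if (createsPerson) knownDistinctIds.add(distinctId);
  eventLog.push({
    distinct_id: distinctId,
    event,
    timestamp,
    properties: createsPerson ? { first_event_for_this_distinct_id: true } : {},
  });
}

// Incognito scenario: the logged-in ID and a fresh anonymous ID each get a
// "first" tag before the two are merged into one person.
ingest("user-1", "$pageview", "2021-01-01T09:00:00Z");
ingest("anon-abc", "$pageview", "2021-03-01T09:00:00Z");
ingest("user-1", "$pageview", "2021-03-01T09:01:00Z"); // after logging in

const tagged = eventLog.filter(
  (e) => e.properties.first_event_for_this_distinct_id === true,
);
// After the merge, one person owns both tagged rows, and the stored rows
// cannot be rewritten to drop the later tag.
console.log(tagged.length); // 2
```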

kpthatsme commented 3 years ago

@mariusandra I think it's acceptable for anonymous events, or events we can't merge into a user, but when we merge users we should pick the earliest. So in your case, if I open two incognito windows (anon ID A and anon ID B) and perform the same event under both A and B, then when the users are subsequently merged, we should take the earliest timestamp across IDs A and B as the first time the event occurred.
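This merge rule boils down to taking, for each event type, the minimum first-seen timestamp across all distinct IDs that end up in one person. A sketch, assuming first-seen timestamps are kept per distinct ID and event name; the `FirstSeen` layout and `mergeFirstSeen` are hypothetical.

```typescript
// First-seen timestamps for one distinct ID, keyed by event name.
type FirstSeen = Map<string, string>;

// Combine the first-seen maps of merged distinct IDs, keeping the earliest
// timestamp per event name. ISO 8601 UTC strings in the same format sort
// chronologically, so plain string comparison is enough here.
function mergeFirstSeen(perDistinctId: FirstSeen[]): FirstSeen {
  const merged: FirstSeen = new Map();
  perDistinctId.forEach((firstSeen) => {
    firstSeen.forEach((ts, eventName) => {
      const current = merged.get(eventName);
      if (current === undefined || ts < current) {
        merged.set(eventName, ts);
      }
    });
  });
  return merged;
}

// Two incognito windows: the same event under anonymous IDs A and B.
const anonA: FirstSeen = new Map([["insight_created", "2021-03-01T10:05:00Z"]]);
const anonB: FirstSeen = new Map([
  ["insight_created", "2021-03-01T10:02:00Z"],
  ["signed_up", "2021-03-01T10:00:00Z"],
]);

const person = mergeFirstSeen([anonA, anonB]);
console.log(person.get("insight_created")); // 2021-03-01T10:02:00Z
```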

@Twixes to answer your question – this kind of data is particularly helpful in digging into key conversion areas. For example, in our product, we'd expect someone running their 20th insight analysis to have a pretty different experience than someone who had just run their first. Or from a marketing perspective, maybe you want to look at behaviors of repeat content consumers versus newcomers.

(I want to make clear, though, that my initial ticket here wasn't a suggestion that this needs to be prioritized now, but a general question about whether this belongs in our ingestion pipeline instead of a plugin.)

Twixes commented 3 years ago

Hm, I wasn't super clear, sorry. I definitely see why this feature is useful for analytics, but for this ticket specifically I wonder what a potential UX for using this data would look like in PostHog. As in, does one just add is_first_event_ever = true to filters, or something else?

mariusandra commented 3 years ago

@kpthatsme "Subsequently the users get merged, we should take the first timestamp, across ID A and B as the first time the event occurred." Who is "we" in this sentence? In the backend, the user will in this case have 2 or 3 different events with this property set, and there will be no way to avoid that, as ClickHouse is pretty much append-only. Will that work on the analytics side, or will it mess up the numbers?

What is a query you would be doing that would include this property?
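Because stored events can't be rewritten, one way to keep the numbers honest would be to resolve duplicate flags at query time: group flagged events by person and keep only the earliest. A sketch of that aggregation over an in-memory list, assuming each row already carries the merged `person_id` (in SQL this would be a GROUP BY person with a minimum over the timestamp); the `EventRow` shape and `is_first` flag are hypothetical.

```typescript
interface EventRow {
  person_id: number; // resolved after merges
  event: string;
  timestamp: string; // ISO 8601 UTC
  is_first: boolean; // written at ingestion, possibly more than once per person
}

// For each person, return the earliest timestamp among rows flagged as first
// for the given event name. Extra flags left behind by merges are ignored.
function firstOccurrencePerPerson(rows: EventRow[], eventName: string): Map<number, string> {
  const result = new Map<number, string>();
  rows.forEach((row) => {
    if (!row.is_first || row.event !== eventName) return;
    const current = result.get(row.person_id);
    if (current === undefined || row.timestamp < current) {
      result.set(row.person_id, row.timestamp);
    }
  });
  return result;
}

const rows: EventRow[] = [
  // Person 7 was two anonymous IDs before merging, so two rows carry the flag.
  { person_id: 7, event: "insight_created", timestamp: "2021-03-01T10:05:00Z", is_first: true },
  { person_id: 7, event: "insight_created", timestamp: "2021-03-01T10:02:00Z", is_first: true },
  { person_id: 7, event: "insight_created", timestamp: "2021-03-02T08:00:00Z", is_first: false },
  { person_id: 8, event: "insight_created", timestamp: "2021-03-01T11:00:00Z", is_first: true },
];

const firsts = firstOccurrencePerPerson(rows, "insight_created");
console.log(firsts.size); // 2
```

Counting flagged rows directly would report three first-time events here; counting per person reports two. Arguably, once a query takes a per-person minimum it could derive the first occurrence from the raw events alone, so the stored flag mainly buys cheap filtering.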