PostHog / meta

This is a place to discuss non-product issues in public.

Messaging: Persons on Events #251

Open joethreepwood opened 1 month ago

joethreepwood commented 1 month ago

Here we go again.

Short version: We want to make some ClickHouse changes which will speed things up, but which will also change some data. We tried this before and got some complaints, which stalled it. Now, we're trying again.

The new approach is slightly different to the old one. Namely: we're not changing the way we merge users. However, all filters which are based on person properties will be affected.

We don't want to give users a way to opt out of this, but we do need to give the biggest customers some warning, because the impact will be most visible to the largest teams. Ultimately, this is a necessary step for performance improvements.

The basic gist of the comm is:

Here's what we need to do...

And here's what I need...

In the meantime, I'll work on changelog and email copy here for a 👍 and final decision from @timgl - previews soon

joethreepwood commented 1 month ago

Draft email copy.

[Image: Screenshot 2024-10-21 at 15 02 09]

joethreepwood commented 1 month ago

If I can get a 👍 on the above, I can then put the changelog copy together, drawing on and expanding that.

joethreepwood commented 1 month ago

Some changelog copy

In a slight deviation from our usual changelog format, we’re posting today to let you know that _next_ week, on XXth XXX, we’re going to combine our ClickHouse tables for Persons and Events into one. That may not sound exciting, but it’s actually a pretty substantial backend change, even though it won’t unlock any new features.

What it does unlock, however, is massive speed increases for any queries involving person properties. These queries will now be **up to 4x faster** than they were before because we no longer have to `JOIN` multiple tables to get an answer: all the info we need is in one place.

Combining these tables also means that person properties will now be stored on events and will reflect the value at the time of the event, rather than the latest value. If you need to reliably access the latest value, we recommend using a dynamic cohort instead.
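
As a rough illustration of that difference, here is a minimal Python sketch. The data, field names, and helper functions are all hypothetical and are not the actual ClickHouse schema or query logic; the sketch only shows how a filter on a person property can return different counts depending on whether it reads the latest value or the value stored with each event.

```python
# Hypothetical data: names and shapes are illustrative only,
# not the real Persons or Events tables.

# Latest known properties per person (what a JOIN would read).
persons = {"user_1": {"plan": "paid"}}

# Each event carries a copy of the person properties as they were at the time.
events = [
    {"event": "signup", "person_id": "user_1", "person_properties": {"plan": "free"}},
    {"event": "upgrade", "person_id": "user_1", "person_properties": {"plan": "paid"}},
]


def count_with_join(events, persons, key, value):
    """Filter on the person's latest value, looked up per event (the JOIN)."""
    return sum(1 for e in events if persons[e["person_id"]].get(key) == value)


def count_on_events(events, key, value):
    """Filter on the value stored with each event (no lookup needed)."""
    return sum(1 for e in events if e["person_properties"].get(key) == value)


join_count = count_with_join(events, persons, "plan", "paid")
on_events_count = count_on_events(events, "plan", "paid")
print(join_count, on_events_count)  # prints "2 1"
```

In this toy case the signup event matches a `plan = paid` filter only when the latest value is used, since the user was still on the free plan when they signed up.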

We expect this to be an invisible change, aside from the fact that we’re adding some go-faster stripes to our ClickHouse tables. But if you do spot something odd, you can get in touch as always.

@timgl can I get a final check from you in case I understand ClickHouse even less well than I think?

If this looks correct, all I need is a list of targets for the email from @simfish85 and a date we want to launch it on.