PostHog / plugin-server

Service to process and save PostHog events, supporting JS/TS plugins at that
https://posthog.com
8 stars 5 forks source link

Event to have original passed set and set-once properties #615

Closed tiina303 closed 2 years ago

tiina303 commented 2 years ago

Changes

Currently we store the user properties we changed only on the events in ClickHouse.

If we store all the properties that were sent (set) with the event we'll have a much easier time debugging if we messed up somehow as we'll have all the data to be able to recompute things.

I believe this will also make it easier for users to understand what's happening. We store events in order of timestamps in the events page, but we might ingest them out of order. Imagine two events set a certain value, but we ingested the older event later, we would only see the set on the newer event as that's what was ingested first and changed the value.

This change was originally added in https://github.com/PostHog/plugin-server/pull/558

I'm now checking which properties we are actually updating and only storing those against the event. Should massively reduce the amount of data we need to store in events, plus the amount of data we're sending to Postgres to update the user

Checklist

posthog-bot commented 2 years ago

Hey @tiina303! 👋 This pull request seems to contain no description. Please add useful context, rationale, and/or any other information that will help make sense of this change now and in the distant Mars-based future.

tiina303 commented 2 years ago

I rebased this on master, but the tests seem to have been ran (after a retry also on the older branch location).

mariusandra commented 2 years ago

The original intent was to greatly reduce the amount of data stored, since posthog-js sends... or used to send... a lot of user properties with every request. What will the impact of storage/cost be when enabling this?

Other than that question and/or if this has been considered LGTM.

tiina303 commented 2 years ago

I asked @fuziontech & he said storage is cheap.

tiina303 commented 2 years ago

Edit: works fine now after I created a new separate PR which triggered the check here to be rebased properly

For the failing test I don't understand this: in the check I see

> 2020 |         expect(await hub.db.fetchDistinctIdValues(person)).toEqual(['distinct_id1'])
           |                                                            ^
      2021 |         expect(person.properties).toEqual({ a_prop: 'test-set' })

but the code on this branch is different https://github.com/PostHog/plugin-server/blob/1284a415784dcf602f20650a2611019ad92b6226/tests/shared/process-event.ts#L2020 (i.e. somehow it's using where the branch originally was)