PostHog / plugin-server

Service to process and save PostHog events, supporting JS/TS plugins at that
https://posthog.com
8 stars 5 forks source link

Safer people merging #576

Closed yakkomajuri closed 3 years ago

yakkomajuri commented 3 years ago

Changes

A LOT more info coming soon, hold tight

Still some things to fix here.

Also thanks a lot to @hazzadous for the initial work here and a very solid foundation for the bulk of the tests here. The tests you wrote were very good.

So this PR batches 2 changes in 1. This is unfortunate but was a decision made to ship faster, given they both made significant changes to the same area of the codebase, looking to fix the same issue, just taking different approaches.

The 2 changes are:

  1. A refactor of mergePeople. Doing away with the sketchy while (true) loop and logic that was very prone to leaving things out of sync. The big problem here is that we were enqueueing kafka messages for updates to CH directly at each update to Postgres, which is bad. Really we want to only enqueue all the messages when we know everything we should do in Postgres has succeeded, which we can accomplish with a transaction.
  2. Fixing https://github.com/PostHog/posthog/issues/5527 by not allowing a merge from an identified user.

1 is reasonably self-explanatory. 2 needs some context and I had to think quite a bit about how to conceptually approach it.

$create_alias

This has been left alone. If someone explicitly asks us to merge 2 users, we will do it. It's much harder to shoot yourself in the foot here, and we shouldn't infer for users that they don't want to merge 2 users they explicitly chose to merge. Plus $create_alias underpins our person merge modal, where users can manually pick in a very explicit way what users they want to merge into one. We should let them do that, identified or not.

$identify

Here's where things get tricky. Here's a table explaining all the paths and their results:

Note: Distinct ID 1 refers to the person we're merging from, and Distinct ID 2 is the person we're merging into

Distinct ID 1 exists Person for Distinct ID 1 is identified Distinct ID 2 exists Person for Distinct ID 2 is identified Outcome
🚫 🚫 🚫 🚫 Create person with both distinct IDs
🚫 🚫 βœ… 🚫 Add Distinct ID 1 to the person for Distinct ID 2
🚫 🚫 βœ… βœ… Add Distinct ID 1 to the person for Distinct ID 2
βœ… 🚫 🚫 🚫 Add Distinct ID 2 to the person for Distinct ID 1
βœ… βœ… 🚫 🚫 Add Distinct ID 2 to the person for Distinct ID 1
βœ… 🚫 βœ… 🚫 Merge the two people
βœ… 🚫 βœ… βœ… Merge the person for distinct ID 1 (anonymous) into the one for distinct ID 2 (identified)
βœ… βœ… βœ… βœ… Do not merge!! Leave the 2 people alone.
βœ… βœ… βœ… 🚫 Do not merge!! We will not merge an identified person into an anonymous one

Checklist

yakkomajuri commented 3 years ago

sorry for the commit spam here - CH tests take extremely long for me locally so it's faster to push and run them remotely.

hazzadous commented 3 years ago

sorry for the commit spam here - CH tests take extremely long for me locally so it's faster to push and run them remotely.

Hah, are you using a remote CH?

fuziontech commented 3 years ago

you should rebase and try running them on a big CodeSpaces instance πŸ˜‰

timgl commented 3 years ago

Re tests being slow, one change that I made on the main posthog repo is to not empty/truncate clickhouse tables between every test run, and instead just filter every database call by team ID. That cut execution time in half so might be worth trying here.

yakkomajuri commented 3 years ago

Tests cover this but if you want to "see it with your own eyes"

https://user-images.githubusercontent.com/38760734/133766133-7233aa60-642c-48e5-8da9-a2c50c18a1df.mov

yakkomajuri commented 3 years ago

yeah that merge people test seems like it was already a bit flaky before, passed locally before pushing, will fix

yakkomajuri commented 3 years ago

Friday afternoon, I've been working to get tests to pass and this happens:

Screenshot 2021-09-17 at 14 47 48

I hate life

fuziontech commented 3 years ago

NoooOOOOOOoooooo!

yakkomajuri commented 3 years ago

Updated the query counter. That's a horrible test because it will flake with all sorts of changes + doesn't really help us test stuff. I'll get rid of it somewhere else. This PR has shown me I really need to spend some time refactoring plugin server stuff