PostHog / posthog

🦔 PostHog provides open-source product analytics, session recording, feature flagging and A/B testing that you can self-host.
https://posthog.com

Integrity issue - PostgreSQL and ClickHouse out of sync #10239

Closed. guidoiaquinti closed this issue 1 month ago.

guidoiaquinti commented 2 years ago

Bug description

PostHog uses two main databases: PostgreSQL and ClickHouse.

Some PostHog features rely on keeping a subset of data (e.g. person properties) in sync between the two. We currently try to achieve this by writing to both datastores directly from the app. Unfortunately, the two datasets are far from in sync, because we don't use anything like two-phase commit, which would roll back the whole transaction if one of the write operations failed.
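To make the failure mode concrete, here is a minimal Python sketch of a naive double write, assuming two hypothetical in-memory stores (`Store`, `dual_write`, and the `person:1` key are illustrative only, not PostHog code). The first write commits before the second is attempted, so when the second one fails, nothing rolls the first back.

```python
# Hypothetical in-memory stand-ins for the two datastores; not PostHog code.
class Store:
    def __init__(self, name, fail_writes=False):
        self.name = name
        self.fail_writes = fail_writes  # simulate an unavailable datastore
        self.rows = {}

    def write(self, key, value):
        if self.fail_writes:
            raise ConnectionError(f"{self.name} write failed")
        self.rows[key] = value


def dual_write(primary, secondary, key, value):
    """Naive double write: each store commits independently, with no rollback."""
    primary.write(key, value)  # committed immediately
    try:
        secondary.write(key, value)
    except ConnectionError:
        # Nothing undoes the primary write, so the stores now disagree.
        return False
    return True


postgres = Store("postgres")
clickhouse = Store("clickhouse", fail_writes=True)

ok = dual_write(postgres, clickhouse, "person:1", {"plan": "paid"})
print(ok, postgres.rows, clickhouse.rows)
# False {'person:1': {'plan': 'paid'}} {}
```

Under two-phase commit, both stores would first prepare the write and then either both commit or both abort, so this half-applied state could not persist.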

This data drift is creating issues at the application level and in the overall data integrity. I’m creating this issue to keep track of proposals to address the problem.

guidoiaquinti commented 2 years ago

My proposal is to implement a Change Data Capture (CDC) process (see, for example, the section "A Common CDC Architecture with Debezium").
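A minimal sketch of the CDC idea, under stated assumptions: PostgreSQL is treated as the only store the application writes to, every committed change is appended to an ordered log, and a consumer replays that log into the ClickHouse side while tracking the last position it applied. `SourceDB`, `Replica`, `Change`, and `sync` are hypothetical names. In a Debezium setup the log would be PostgreSQL's write-ahead log read through logical decoding and typically delivered via Kafka; here a Python list stands in for both.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    lsn: int  # position in the source's change log (analogous to a WAL position)
    key: str
    value: dict


@dataclass
class SourceDB:
    """Hypothetical source of truth: every committed write is appended to a log."""
    rows: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def write(self, key, value):
        self.rows[key] = value
        self.log.append(Change(len(self.log) + 1, key, value))


@dataclass
class Replica:
    """Hypothetical downstream store, fed only from the change log."""
    rows: dict = field(default_factory=dict)
    applied_lsn: int = 0  # last change applied; acts as the consumer offset

    def apply(self, change):
        if change.lsn <= self.applied_lsn:
            return  # already applied, so replaying the log is idempotent
        self.rows[change.key] = change.value
        self.applied_lsn = change.lsn


def sync(source, replica, fail_at=None):
    """Replay changes in log order; stop at `fail_at` to simulate a consumer crash."""
    for change in source.log:
        if fail_at is not None and change.lsn == fail_at:
            return False
        replica.apply(change)
    return True


pg = SourceDB()
ch = Replica()
pg.write("person:1", {"plan": "free"})
pg.write("person:1", {"plan": "paid"})

sync(pg, ch, fail_at=2)  # consumer crashes after applying change 1
print(ch.rows)           # {'person:1': {'plan': 'free'}}
sync(pg, ch)             # restart resumes after applied_lsn
print(ch.rows == pg.rows)  # True
```

Because the consumer records its offset and skips changes it has already applied, a crash only delays the replica; restarting the consumer brings it back in line with the source instead of leaving permanent drift.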

tiina303 commented 2 years ago

This was potentially the cause of https://posthog.slack.com/archives/C0374DA782U/p1658758589681689

posthog-bot commented 1 month ago

This issue hasn't seen activity in two years! If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in two weeks.

posthog-bot commented 1 month ago

This issue was closed due to lack of activity. Feel free to reopen if it's still relevant.