Open MarconLP opened 10 months ago
How many duplicates can we process before performance takes a big hit?
Any updates on this? We have some fairly serious data issues due to duplication, which makes many of our reports unusable. We can write custom HogQL to dedupe at the report level, but that obviously has downsides. Our team that creates self-service reports is unable to trust the numbers right now.
Hey @jetaggart, could you open a ticket through the PostHog app? https://app.posthog.com/home#supportModal=support%3Adata_integrity
I have an open ticket that is being worked on (ticket 5270); however, I'm wondering what the underlying issue is and what the resolution will be.
We've received several reports of duplicated events. We will tackle this problem in three places: duplicate removal at ingestion, duplicate removal in ClickHouse (CH), and duplicate removal at query time.
Add duplicate removal at query time:
related: https://posthog.slack.com/archives/C0374DA782U/p1692700659345339?thread_ts=1692696455.147909&cid=C0374DA782U
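To make the query-time option concrete, here is a rough sketch of the idea in Python. It is not PostHog's implementation. It assumes that two rows are duplicates when they share the same `event`, `distinct_id`, `timestamp`, and `uuid`; that key, the `dedupe_events` name, and the sample rows are all hypothetical and only there for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

Row = Dict[str, Any]

# Assumption: these fields together identify "the same" event.
# The real dedup key may differ.
DEFAULT_KEY_FIELDS: Tuple[str, ...] = ("event", "distinct_id", "timestamp", "uuid")


def dedupe_events(
    rows: Iterable[Row],
    key_fields: Tuple[str, ...] = DEFAULT_KEY_FIELDS,
) -> Iterator[Row]:
    """Yield the first row seen for each dedup key and skip later repeats."""
    seen = set()
    for row in rows:
        key = tuple(row.get(field) for field in key_fields)
        if key in seen:
            continue  # duplicate of a row that was already emitted
        seen.add(key)
        yield row


# Hypothetical sample data: the second row is an exact duplicate of the first.
events = [
    {"event": "pageview", "distinct_id": "u1", "timestamp": "2023-08-22T10:00:00Z", "uuid": "a"},
    {"event": "pageview", "distinct_id": "u1", "timestamp": "2023-08-22T10:00:00Z", "uuid": "a"},
    {"event": "pageview", "distinct_id": "u2", "timestamp": "2023-08-22T10:00:05Z", "uuid": "b"},
]

unique_events = list(dedupe_events(events))
```

In this sketch the `seen` set holds one entry per distinct key, so memory grows with the number of unique events scanned rather than with the number of duplicates. That is worth keeping in mind for the performance question at the top of this issue.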