PostHog / posthog

🦔 PostHog provides open-source product analytics, session recording, feature flagging and A/B testing that you can self-host.
https://posthog.com

[EPIC] build a data warehouse product #14406

Open jamesefhawkins opened 1 year ago

jamesefhawkins commented 1 year ago

Is your feature request related to a problem?

When users scale, data accuracy and reporting flexibility get more important. As a result, they eventually want to export their customer and product data to a warehouse. This happens for two reasons: (1) they can query a wider range of data together, and (2) they can debug data issues.

This means they will still use PostHog as their data pipeline.

However, their business metrics and logic all move to the warehouse too, where they can trust them. We've seen multiple customers in this situation, always higher-value ones.

As a result, the value they get from PostHog decreases, and the customer has a ton of data engineering to do.

2 options:

We aren't here to just be fine...

Describe the solution you'd like

We offer customers a data warehouse. It's simpler for them, it's more valuable, and it's all in one.

The problems this will solve:

Customers will need:

We should host ClickHouse ourselves instead of using ClickHouse Cloud. This reduces vendor risk (them raising prices or going out of business), keeps customer data with just us, and forces us to build ClickHouse as a core competency.

Sequencing

How does this go wrong

There's no reason we can't start now, but we will probably need 2-3 ClickHouse experts in our org. We have the money to hire them.

Describe alternatives you've considered

we've had a previous suggestion to build translation layer - feels abstracted / error prone / hacky and requires customers to setup the entire stack and has a complex story


mariusandra commented 1 year ago

Describe alternatives you've considered

we've had a previous suggestion to build translation layer - feels abstracted / error prone / hacky and requires customers to setup the entire stack and has a complex story

Are you referring to HogQL -> BigQuery, etc here... or just HogQL in general (aka HogQL -> ClickHouse)?

lharries commented 1 year ago

@mariusandra he's referring to "HogQL -> BigQuery"

For "we've had a previous suggestion to build translation layer - feels abstracted / error prone / hacky and requires customers to setup the entire stack and has a complex story"

I don't think they'd need to set up the entire stack. We'd still provide them ClickHouse out of the box, but they could move to another backend if needed.

For "feels abstracted / error prone / hacky" - I think it would be feasible to do this well, based on a discussion with Marius, although it's not trivial.

However, overall I agree that the focus should be on making our current ClickHouse-backed system the best it can be, so they don't need to switch warehouses. But I think it's worth keeping the translation layer as a potential future avenue.