dwyl / technology-stack

🚀 Detailed description + diagram of the Open Source Technology Stack we use for dwyl projects.

ClickHouse? #145

Open nelsonic opened 3 months ago

nelsonic commented 3 months ago

Opening to capture some basic knowledge ... 📝

We have recently been forced to use ClickHouse as part of deploying Plausible Analytics ("Community Edition"): https://github.com/dwyl/learn-analytics/pull/4 ... Don't have anything against it. Just wondering whether the volume of data we are likely to see for a basic website justifies the expense/overhead of running two databases (Postgres and ClickHouse ...) 💭

https://clickhouse.com

https://github.com/ClickHouse/ClickHouse

timadevelop commented 3 months ago

as part of deploying Plausible Analytics ("Community Edition") https://github.com/dwyl/learn-analytics/pull/4 ...

Did you manage to solve the issue with backups? AFAIR people backed up the whole machine volume, not the database per se...

ClickHouse is top-notch for analytics: it was built specifically for that and is easy to get up and running, but it gets complex to manage once you're in production. There are some concerns about TimescaleDB performance and CE licensing in comparison to ClickHouse, but overall I think Timescale is much easier to deal with for lean companies. I know people who gave up on Plausible just because of the backup strategy. Maybe it's better now?

nelsonic commented 3 months ago

Yeah, the Plausible backup story isn't "fixed" yet. Which is why we are trying to figure out if we can use Postgres for the Analytics data instead of ClickHouse - which we agree is better for higher volumes of data ...

Reading: https://clickhouse.com/docs/en/faq/general/why-clickhouse-is-so-fast#performance-when-inserting-data

"We recommend inserting data in packets of at least 1000 rows, or no more than a single request per second."

This means the Application has to temporarily store the rows in memory before inserting. 😕
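
To make the buffering point concrete, here is a minimal sketch in Python of what "store the rows in memory before inserting" could look like. Everything in it is an assumption for illustration: `RowBuffer`, `insert_batch`, `max_rows` and `max_age_s` are hypothetical names, and `insert_batch` stands in for whatever call actually sends one bulk INSERT to ClickHouse. It is not how Plausible does it.

```python
import time
from typing import Callable, List, Optional, Tuple

Row = Tuple  # hypothetical row shape, e.g. (timestamp, path, visitor_id)


class RowBuffer:
    """Collects rows in memory and passes them to `insert_batch` in bulk.

    A batch is flushed when it reaches `max_rows` rows, or when the oldest
    buffered row has waited at least `max_age_s` seconds.
    """

    def __init__(
        self,
        insert_batch: Callable[[List[Row]], None],  # hypothetical bulk-insert hook
        max_rows: int = 1000,
        max_age_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._insert_batch = insert_batch
        self._max_rows = max_rows
        self._max_age_s = max_age_s
        self._clock = clock
        self._rows: List[Row] = []
        self._oldest: Optional[float] = None  # arrival time of the oldest buffered row

    def add(self, row: Row) -> None:
        if not self._rows:
            self._oldest = self._clock()
        self._rows.append(row)
        too_many = len(self._rows) >= self._max_rows
        too_old = self._clock() - self._oldest >= self._max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered as one batch; no-op when empty."""
        if not self._rows:
            return
        batch, self._rows = self._rows, []
        self._insert_batch(batch)
```

Two caveats with this sketch: the age check only runs when a new row arrives, so a real application would also need a timer (and a flush on shutdown), and anything still in the buffer is lost if the process crashes. That extra moving part is the overhead the recommendation implies.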

Replication uses ZooKeeper (Java): https://clickhouse.com/docs/en/architecture/replication https://en.wikipedia.org/wiki/Apache_ZooKeeper Nothing "wrong" with that. Just noting that it's not a simple setup. 💭