PostHog / posthog

🦔 PostHog provides open-source product analytics, session recording, feature flagging and A/B testing that you can self-host.
https://posthog.com

Set up a Kafka - ClickHouse monitoring #23451

Open Daesgar opened 5 days ago

Daesgar commented 5 days ago

Create a Kafka - ClickHouse dashboard to track relevant metrics and alert us when there is an anomaly in consumption or a big gap between a topic and the events table.

A WIP version of the dashboard exists, but it still needs refinement.

Interesting metrics to track:
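As an example of the kind of gap metric the dashboard could surface, a ClickHouse query along these lines would approximate ingestion lag as the age of the newest row in the events table (the `_timestamp` column name is an assumption here; adjust to whatever insertion-time column the events schema actually exposes):

```sql
-- Hypothetical sketch: a growing value suggests consumption from the
-- topic is falling behind writes into the events table.
SELECT now() - max(_timestamp) AS ingestion_lag_seconds
FROM events;
```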

fuziontech commented 4 days ago

I've been thinking about this some more and I do think having a separate process that lives outside of CH and inserts parquet files into S3 would go a long way here.

My proposal is to add this helm chart to our deployed charts and use it to dump parquet files directly from kafka to S3. This would allow us to store events from kafka very cheaply, query them for metrics and for debugging, and leverage them for recovery if we need to.
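A minimal sketch of what that sink could look like, assuming the Confluent S3 sink connector (the connector name, topic, bucket, and region below are placeholders, and Parquet output additionally needs a schema-aware converter such as Avro, omitted here):

```json
{
  "name": "events-s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "events_plugin_ingestion",
    "s3.bucket.name": "posthog-kafka-archive",
    "s3.region": "us-east-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
    "flush.size": "10000"
  }
}
```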

Benefits:

Cons:

Daesgar commented 4 days ago

Cons:

* Requires a new service, probably Kafka Connect

  * Not a huge problem though since this might actually be something we need in the future for ByConity or for CH in the long run

To reduce the operational load, we could make use of MSK Connect. It manages the underlying hardware and automatically scales based on throughput (we can set a maximum number of workers for the scaling). I haven't tested it, but it should work fine.
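For reference, the autoscaling mentioned above is declared in the `capacity` block passed to MSK Connect's `CreateConnector`; a rough sketch (the worker counts and CPU thresholds are made-up values):

```json
{
  "autoScaling": {
    "mcuCount": 1,
    "minWorkerCount": 1,
    "maxWorkerCount": 4,
    "scaleInPolicy": { "cpuUtilizationPercentage": 20 },
    "scaleOutPolicy": { "cpuUtilizationPercentage": 80 }
  }
}
```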

The cons of using MSK connect is that I don't think it

  * [clickhouse.com/docs/en/integrations/kafka/clickhouse-kafka-connect-sink](https://clickhouse.com/docs/en/integrations/kafka/clickhouse-kafka-connect-sink) <- like you said, this is very mature

And it delivers exactly-once semantics, which would be nice to have as well when we consider switching to the connector.
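For what it's worth, exactly-once mode in that sink is toggled by a single connector property; a sketch, assuming the documented `ClickHouseSinkConnector` config keys (hostname, credentials, and topic below are placeholders, and `exactlyOnce` keeps its state in a KeeperMap table on the ClickHouse side):

```json
{
  "name": "clickhouse-sink",
  "config": {
    "connector.class": "com.clickhouse.kafka.connect.ClickHouseSinkConnector",
    "topics": "events_plugin_ingestion",
    "hostname": "clickhouse.internal",
    "port": "8443",
    "database": "default",
    "username": "default",
    "password": "***",
    "exactlyOnce": "true"
  }
}
```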