As a user of S3 batch exports, it can be easier to process individual files per event name, but currently only partitioning by timestamp and table fields is supported.
The main challenge with this feature is that we do not know which event names to partition by before we query ClickHouse. So, we may need to delay the creation of an S3 upload until we start seeing events, and then maintain one S3 upload per event name. The open question is how to allow recovery after a worker crash, now that there could be a very large number of simultaneous S3 uploads in flight. Temporal heartbeating may not be enough to support this feature, and we may need to look into new ways of tracking progress.
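
A minimal sketch of the "one multipart upload per event name, created lazily" idea, assuming boto3. The bucket name, key layout, flush threshold, and the `run_export` / `flush` / `complete_all` helpers are all illustrative assumptions, not PostHog's actual batch export code:

```python
import json
from collections import defaultdict

import boto3

s3 = boto3.client("s3")
BUCKET = "my-export-bucket"            # hypothetical bucket
KEY_TEMPLATE = "exports/{event}/part"  # hypothetical layout partitioned by event name

uploads = {}                 # event name -> {"key", "upload_id", "parts"}
buffers = defaultdict(list)  # event name -> serialized records not yet uploaded


def get_upload(event_name):
    """Start a multipart upload the first time we see a given event name."""
    if event_name not in uploads:
        key = KEY_TEMPLATE.format(event=event_name)
        resp = s3.create_multipart_upload(Bucket=BUCKET, Key=key)
        uploads[event_name] = {"key": key, "upload_id": resp["UploadId"], "parts": []}
    return uploads[event_name]


def flush(event_name):
    """Upload the buffered records for one event name as the next part."""
    upload = get_upload(event_name)
    body = "\n".join(buffers.pop(event_name, [])).encode()
    part_number = len(upload["parts"]) + 1
    # Note: real S3 multipart parts must be at least 5 MiB, except the last one,
    # so a production version would buffer by size rather than record count.
    resp = s3.upload_part(
        Bucket=BUCKET,
        Key=upload["key"],
        UploadId=upload["upload_id"],
        PartNumber=part_number,
        Body=body,
    )
    upload["parts"].append({"PartNumber": part_number, "ETag": resp["ETag"]})


def complete_all():
    """Complete every in-flight multipart upload once all records are consumed."""
    for upload in uploads.values():
        s3.complete_multipart_upload(
            Bucket=BUCKET,
            Key=upload["key"],
            UploadId=upload["upload_id"],
            MultipartUpload={"Parts": upload["parts"]},
        )


def run_export(records):
    """Consume records (dicts with an "event" field) streamed from ClickHouse."""
    for record in records:
        buffers[record["event"]].append(json.dumps(record))
        if len(buffers[record["event"]]) >= 10_000:  # arbitrary flush threshold
            flush(record["event"])
    for event_name in list(buffers):  # flush whatever is left
        flush(event_name)
    complete_all()
```

The recovery concern above maps onto the `uploads` dictionary: to resume after a crash, the worker would need to persist every event name's upload ID and completed part ETags somewhere (for example, in Temporal heartbeat details), which is exactly where heartbeating may become impractical if there are thousands of event names and therefore thousands of concurrent multipart uploads.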