databricks / iceberg-kafka-connect

Apache License 2.0
To extract partition fields from a timestamp #224

Closed shift-alt-del closed 8 months ago

shift-alt-del commented 8 months ago

Hi, I'm working on a PoC to sink logs from Kafka into Iceberg format. I want to partition the logs under year=YYYY/month=MM/day=DD, but I only have a timestamp inside each log record.

I didn't find any configuration for partitioning logs by timestamp, so I'm wondering whether a workaround already exists.

I think one workaround is to use an SMT to copy ts_ms into year, month, and day, extract the values into three separate fields, and list those fields in iceberg.tables.default-partition-by. However, that clutters the connector config and requires writing a custom SMT...

As a concrete example, my log format looks like this:

{
    "ts_ms": 1588252618953,
    "data": "abcd"
}
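
To make the workaround above concrete, here is a minimal sketch in plain Python (not connector or SMT code) of the derivation a custom SMT would have to perform on each record. The function names `partition_fields` and `partition_path` are hypothetical, and interpreting ts_ms as epoch milliseconds in UTC is an assumption on my side.

```python
from datetime import datetime, timezone


def partition_fields(record: dict, ts_field: str = "ts_ms") -> dict:
    """Return a copy of the record with year, month, and day fields added.

    Assumption: the timestamp field holds epoch milliseconds and the
    partition values should be computed in UTC.
    """
    # Integer division drops the millisecond part, which is not needed
    # for day-level partitioning.
    ts = datetime.fromtimestamp(record[ts_field] // 1000, tz=timezone.utc)
    return {**record, "year": ts.year, "month": ts.month, "day": ts.day}


def partition_path(record: dict) -> str:
    """Build the year=YYYY/month=MM/day=DD path for a record."""
    fields = partition_fields(record)
    return f"year={fields['year']:04d}/month={fields['month']:02d}/day={fields['day']:02d}"


if __name__ == "__main__":
    log = {"ts_ms": 1588252618953, "data": "abcd"}
    print(partition_path(log))  # year=2020/month=04/day=30
```

The three derived fields are what would then be listed in iceberg.tables.default-partition-by.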

Thanks.

shift-alt-del commented 8 months ago

I found this one:

https://github.com/tabular-io/iceberg-kafka-connect/tree/main/kafka-connect-transforms#smts-for-the-apache-iceberg-sink-connector

shift-alt-del commented 8 months ago

Closing this issue; I found a duplicate with a workaround: