Aiven-Open / cloud-storage-connectors-for-apache-kafka

Repository containing Cloud Storage Connectors for Apache Kafka®
Apache License 2.0

Group data every hour #263

Open amit2103 opened 3 years ago

amit2103 commented 3 years ago

Hey, currently when we group by, say, key, what's the time period over which it groups?

Is there a way to group data every hour and send it to GCS?

DanielWozniak94 commented 3 years ago

Looks like this can be addressed by setting offset.flush.interval.ms. The only issue is that it has to be set at the worker level; when I set it at the connector level, the config wasn't picked up.
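For an hourly flush, the value has to be given in milliseconds. A minimal sketch of the arithmetic, assuming the property goes into the worker properties file (for example connect-distributed.properties); the helper names here are hypothetical and not part of Kafka Connect or the connector:

```python
from datetime import timedelta


def flush_interval_ms(interval: timedelta) -> int:
    """Convert an interval to the millisecond value the worker config expects."""
    return int(interval.total_seconds() * 1000)


def worker_property_line(interval: timedelta) -> str:
    """Render the line to place in the worker properties file (hypothetical helper)."""
    return f"offset.flush.interval.ms={flush_interval_ms(interval)}"


# One hour = 60 * 60 * 1000 ms
print(worker_property_line(timedelta(hours=1)))
# prints: offset.flush.interval.ms=3600000
```

Since this is a worker-level setting, it would presumably apply to every connector running on that worker, not just the GCS one.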

ivanyu commented 2 years ago

Hi @amit2103 and @DanielWozniak94. A very important note here: grouping by key can't include other fields such as the timestamp. So it's literally one value per file.

Apart from this, offset.flush.interval.ms is the only way to control this for now. Unfortunately, it's a current limitation of Kafka Connect that this can't be set per connector. The connector itself is a bit reactive in this sense: it doesn't have its own thread or timer of any sort that could prompt it to flush on its own schedule.
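To illustrate the two points above, here is a toy model in Python. It is not the connector's code: the class and method names are hypothetical, and it assumes that a newer record for a key replaces the older one. It only shows that the buffer writes nothing until something external calls flush, and that key grouping yields one value per file.

```python
from typing import Dict, List, Tuple


class KeyGroupingBuffer:
    """Toy sink buffer: groups by key, writes only when flush() is called."""

    def __init__(self) -> None:
        self._latest: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        # Assumption: a newer record for the same key replaces the older one.
        self._latest[key] = value

    def flush(self) -> List[Tuple[str, str]]:
        # One "file" per key, each holding a single value; then start over.
        files = sorted(self._latest.items())
        self._latest.clear()
        return files


buf = KeyGroupingBuffer()
buf.put("user-1", "a")
buf.put("user-2", "b")
buf.put("user-1", "c")  # overwrites "a" for user-1

# Nothing is written until the framework (here: the caller) triggers a flush.
print(buf.flush())  # [('user-1', 'c'), ('user-2', 'b')]
print(buf.flush())  # []
```

In this model, an hourly flush would produce at most one file per key per hour, each containing only the last value seen in that hour.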