brimdata / zync

Kafka connector to sync Zed lakes to and from Kafka topics
BSD 3-Clause "New" or "Revised" License

continuous syncing service #83

Open. mccanne opened this issue 2 years ago.

mccanne commented 2 years ago

Syncing from Kafka and running ETL should happen in a continuous service, so that we don't need to poll and recompute progress state on each run.

Step 1 is to make from-kafka a continuous service that listens on each configured topic and syncs data as it arrives. Two parameters should drive commits: a data limit and a timeout. When buffered data reaches the data limit, it is committed; when data arrives but stays below the limit, the timeout triggers the commit instead.
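The commit policy in Step 1 can be sketched as a size-or-timeout batcher. This is a minimal Python illustration under stated assumptions, not zync's implementation: the `Batcher` class, its `data_limit` (bytes) and `timeout` (seconds) parameters, the `commit` callback, and the use of a `queue.Queue` in place of a Kafka consumer are all hypothetical. A `None` record is assumed to signal shutdown.

```python
import queue
import time
from typing import Callable, List, Optional


class Batcher:
    """Hypothetical sketch: accumulate records and commit when either the
    data limit is reached or the timeout expires on a partial batch."""

    def __init__(self, commit: Callable[[List[bytes]], None],
                 data_limit: int, timeout: float) -> None:
        self.commit = commit
        self.data_limit = data_limit  # bytes that force an immediate commit
        self.timeout = timeout        # seconds a partial batch may wait
        self.batch: List[bytes] = []
        self.size = 0
        self.deadline: Optional[float] = None

    def _flush(self) -> None:
        # Commit whatever is buffered (if anything) and reset the state.
        if self.batch:
            self.commit(self.batch)
        self.batch, self.size, self.deadline = [], 0, None

    def run(self, source: "queue.Queue[Optional[bytes]]") -> None:
        while True:
            # A partial batch whose deadline has passed is committed first,
            # even if records are still trickling in below the data limit.
            if self.deadline is not None and time.monotonic() >= self.deadline:
                self._flush()
            # Block indefinitely when idle; otherwise wait only until the deadline.
            wait = None if self.deadline is None else max(0.0, self.deadline - time.monotonic())
            try:
                record = source.get(timeout=wait)
            except queue.Empty:
                continue  # deadline reached; the flush happens at the loop top
            if record is None:  # sentinel: flush any remainder and stop
                self._flush()
                return
            if not self.batch:
                # The timeout clock starts when the first record of a batch arrives.
                self.deadline = time.monotonic() + self.timeout
            self.batch.append(record)
            self.size += len(record)
            if self.size >= self.data_limit:
                self._flush()
```

Starting the timeout at the first buffered record bounds how long any record can wait before being committed, while the data limit keeps large bursts from accumulating unbounded. Running one such loop per configured topic is one plausible arrangement.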

Step 2 is to automate ETL based on from-kafka commits. With the service running continuously, whenever newly arrived data could be consumed by an ETL, that ETL runs automatically. This way, ETLs don't need to run in a polling loop; they run only when there is new data to process.
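Step 2 replaces polling with commit notifications. The sketch below assumes, hypothetically, that the syncing service emits a `CommitEvent` (pool name and commit ID) after each commit and that each ETL registers the pool it reads from; `CommitEvent`, `EtlDispatcher`, and `register` are illustrative names, not part of zync. The dispatcher blocks until an event arrives and then runs only the ETLs registered for that pool.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class CommitEvent:
    """Hypothetical notification emitted after from-kafka commits to a pool."""
    pool: str
    commit_id: str


class EtlDispatcher:
    """Hypothetical sketch: run ETL jobs only when a commit lands in a pool
    they read from, instead of polling every pool on a timer."""

    def __init__(self) -> None:
        self.jobs: Dict[str, List[Callable[[CommitEvent], None]]] = {}

    def register(self, pool: str, job: Callable[[CommitEvent], None]) -> None:
        # An ETL subscribes to the pool whose commits it consumes.
        self.jobs.setdefault(pool, []).append(job)

    def run(self, events: "queue.Queue[Optional[CommitEvent]]") -> None:
        while True:
            event = events.get()  # blocks: no work is done while idle
            if event is None:     # sentinel: stop the service
                return
            # Only ETLs that read from the committed pool are triggered.
            for job in self.jobs.get(event.pool, []):
                job(event)
```

Because `run` blocks on `events.get()`, an idle dispatcher does no work, which is the point of avoiding a polling loop. A fuller version might coalesce several pending commits to the same pool into a single ETL run.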

philrz commented 1 year ago

Step 1 is complete, and Step 2 remains to be done.