delta-io / kafka-delta-ingest

A highly efficient daemon for streaming data from Kafka into Delta Lake
Apache License 2.0

Keep last_offset when value buffer is consumed #97

Closed mosyp closed 2 years ago

mosyp commented 2 years ago

#95 fixes duplicates caused by an already-processed offset appearing within a single record batch. However, such offsets can also arrive spread across several record batches. Since last_offset is cleared once the buffer is consumed, we cannot check for that case. To fix the bug, we have to keep last_offset between consumes as well, i.e. never clear it.
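A minimal sketch of the idea, not the actual kafka-delta-ingest code: `Record`, `ValueBuffers`, `add`, and `consume` are hypothetical names chosen for illustration, and tracking the last offset per partition is an assumption. The point it shows is that `consume` drains the buffered values but leaves the last-offset bookkeeping untouched, so a replayed offset is still rejected in a later batch.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a message read from Kafka.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub partition: i32,
    pub offset: i64,
    pub value: String,
}

/// Hypothetical value buffer that remembers the last accepted offset
/// per partition (the per-partition layout is an assumption).
#[derive(Default)]
pub struct ValueBuffers {
    values: Vec<Record>,
    // Highest offset accepted so far for each partition.
    // Deliberately NOT cleared by `consume`.
    last_offsets: HashMap<i32, i64>,
}

impl ValueBuffers {
    /// Buffers the record unless its offset was already seen.
    /// Returns true if the record was accepted.
    pub fn add(&mut self, record: Record) -> bool {
        let last = self.last_offsets.get(&record.partition).copied();
        if let Some(last) = last {
            if record.offset <= last {
                // Already processed, possibly in an earlier batch.
                return false;
            }
        }
        self.last_offsets.insert(record.partition, record.offset);
        self.values.push(record);
        true
    }

    /// Drains the buffered values while keeping `last_offsets` intact,
    /// so duplicates are still detected after the buffer is consumed.
    pub fn consume(&mut self) -> Vec<Record> {
        std::mem::take(&mut self.values)
    }
}
```

With this shape, adding offsets 5 and 6, consuming, and then receiving offset 6 again in the next batch would drop the repeated record, whereas clearing `last_offsets` inside `consume` would let it through.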