delta-io / kafka-delta-ingest

A highly efficient daemon for streaming data from Kafka into Delta Lake
Apache License 2.0
359 stars 79 forks

Introduce schema evolution via the -S flag #164

Open rtyler opened 8 months ago

rtyler commented 8 months ago

This set of changes implements more nuanced handling of the Delta table schema vs. the message schema by introducing the -S command line flag, which enables schema evolution.

Schema evolution comes with a necessary performance hit: kafka-delta-ingest must infer the schema of every message read from Kafka and, if necessary, add nullable columns to the Delta table. This is related to similar work in delta-rs for the RecordBatchWriter, but deviates because of the way RecordBatches and schemas are handled in kafka-delta-ingest.
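To illustrate the per-message cost, here is a minimal sketch of the idea: infer a type for each field in a message, then add any field missing from the table as a nullable column. This is not the code in this pull request. `ColumnType`, `Value`, `Column`, `Schema`, `infer_type`, and `evolve_schema` are hypothetical names invented for the example, and it uses only the standard library rather than the Arrow and Delta types the project actually works with.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified column types (the real project uses Arrow/Delta types).
#[derive(Debug, Clone, PartialEq)]
enum ColumnType {
    Boolean,
    Long,
    Double,
    Text,
}

/// Hypothetical stand-in for a decoded value from a Kafka message.
#[derive(Debug, Clone)]
enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
}

/// A column in the simplified table schema.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    ty: ColumnType,
    nullable: bool,
}

/// Simplified table schema: column name -> column definition.
type Schema = BTreeMap<String, Column>;

/// Infer a column type from a single value. Nulls carry no type information.
fn infer_type(value: &Value) -> Option<ColumnType> {
    match value {
        Value::Null => None,
        Value::Bool(_) => Some(ColumnType::Boolean),
        Value::Int(_) => Some(ColumnType::Long),
        Value::Float(_) => Some(ColumnType::Double),
        Value::Str(_) => Some(ColumnType::Text),
    }
}

/// Add fields that appear in the message but not in the table as nullable
/// columns. Existing columns are left untouched. Returns the added names.
fn evolve_schema(table: &mut Schema, message: &[(String, Value)]) -> Vec<String> {
    let mut added = Vec::new();
    for (name, value) in message {
        if table.contains_key(name) {
            continue;
        }
        // Skip fields whose type cannot be inferred (null values).
        if let Some(ty) = infer_type(value) {
            table.insert(name.clone(), Column { ty, nullable: true });
            added.push(name.clone());
        }
    }
    added
}
```

Because this check has to run for every message, it adds work to the hot path even when no new columns appear, which is presumably why the behavior is opt-in behind a flag. New columns are made nullable so that rows written before the change remain valid.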

Sponsored-by: Raft LLC

NOTE: This pull request builds on #162