delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[Feature Request] Support additive change speculation for Delta source schema tracking #3253

Open jackierwzhang opened 2 weeks ago

jackierwzhang commented 2 weeks ago

Feature request

Which Delta project/connector is this regarding?

Overview

We currently support schemaTrackingLocation (doc), which allows a Delta streaming source to track additive and non-additive schema changes while streaming from a Delta table.

Right now, every schema change causes a new schema version to be written to the tracking location. This may be unnecessary for backward-compatible changes such as ADD COLUMN. One option is to speculate ahead in the Delta log, skip over the ADD COLUMN schema changes, and use the encompassing schema as the stream read schema.
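To make the idea concrete, here is a minimal sketch of the speculation step in plain Python. It is not the Delta source implementation: SchemaChange, is_additive, and speculate_read_schema are hypothetical names, and a schema is modeled as a flat mapping from column name to type name, ignoring nested fields, nullability, and column order. It walks ahead through the upcoming schema changes, absorbs the purely additive ones, and stops at the first non-additive change.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical simplification: a schema is a flat map of column name -> type name.
Schema = Dict[str, str]


@dataclass(frozen=True)
class SchemaChange:
    """A schema change found in the log (hypothetical model, not a Delta class)."""
    version: int
    schema: Schema


def is_additive(old: Schema, new: Schema) -> bool:
    """True if `new` keeps every column of `old` with the same type.

    Any columns that exist only in `new` are treated as ADD COLUMN.
    """
    return all(new.get(name) == dtype for name, dtype in old.items())


def speculate_read_schema(
    current: Schema, upcoming: List[SchemaChange]
) -> Tuple[Schema, int]:
    """Look ahead and absorb consecutive additive changes.

    Returns the encompassing schema and the number of changes skipped.
    Stops at the first non-additive change, which still needs its own
    schema version (and a restart).
    """
    schema = current
    skipped = 0
    for change in upcoming:
        if not is_additive(schema, change.schema):
            break
        # An additive change is a superset of the previous schema,
        # so the latest additive schema encompasses all earlier ones.
        schema = change.schema
        skipped += 1
    return schema, skipped


current = {"id": "long"}
upcoming = [
    SchemaChange(3, {"id": "long", "name": "string"}),                # ADD COLUMN
    SchemaChange(5, {"id": "long", "name": "string", "age": "int"}),  # ADD COLUMN
    SchemaChange(8, {"id": "long", "full_name": "string", "age": "int"}),  # rename
]
read_schema, skipped = speculate_read_schema(current, upcoming)
```

In this example the two ADD COLUMN changes are absorbed into a single read schema, and only the rename at version 8 would still force a new tracked schema version.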

Motivation

This could substantially reduce the number of stream restarts caused by ADD COLUMN schema changes.

Further details

The look-ahead speculation could potentially be implemented here

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?