delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

SchemaError occurs during table optimisation after upgrade to v0.18.1 #2731

Closed: r1fad closed this issue 1 month ago

r1fad commented 1 month ago

Environment

Delta-rs version: v0.18.1

Binding: rust


Bug

What happened: In my project, there is a service that runs on a schedule to compact and optimise delta tables. After upgrading delta-rs to v0.18.1 (from v0.17.1), we see the error Delta(Arrow { source: SchemaError("Could not find column keys") }). There is no column called 'keys' in our datasets, so I do not understand why we get this error. The error comes from deltalake-core-0.18.1/src/operations/cast.rs:150.

What you expected to happen: No schema errors should be reported when upgrading to a new version of delta-rs.

How to reproduce it: Not entirely sure how to reproduce it since we do not have a column called 'keys' in our datasets.

More details: This error does not occur in the version we run in production, which is v0.17.1 (commit hash 25962a05452b13a2c08b9beda98fbbb0252dd436). The lookback window on the compactor service is 2 days. After 2 days of running the new version, we no longer see the SchemaError, which leads me to believe that v0.18.1 cannot optimise tables that contain data inserted by v0.17.1.
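To make the lookback reasoning above concrete, here is a minimal sketch in plain Rust. It does not use delta-rs at all: `DataFile`, `files_in_window`, `has_mixed_writers`, and `aged` are hypothetical names invented for illustration, and the model assumes the compactor only touches files written within the lookback window. Under that assumption, the window holds files from both versions right after the upgrade and only v0.18.1 files once the older ones age out.

```rust
use std::time::Duration;

/// Hypothetical record of one data file in a table (not a delta-rs type).
#[derive(Debug, Clone, PartialEq)]
struct DataFile {
    /// Version of the writer that produced the file, e.g. "0.17.1".
    writer_version: &'static str,
    /// How long ago the file was written.
    age: Duration,
}

/// Files the compactor would consider: those written within the lookback window.
fn files_in_window(files: &[DataFile], lookback: Duration) -> Vec<&DataFile> {
    files.iter().filter(|f| f.age <= lookback).collect()
}

/// True if the window contains files from more than one writer version.
fn has_mixed_writers(files: &[DataFile], lookback: Duration) -> bool {
    let selected = files_in_window(files, lookback);
    match selected.first() {
        Some(first) => selected
            .iter()
            .any(|f| f.writer_version != first.writer_version),
        None => false,
    }
}

/// The same files as seen `elapsed` later (every file is that much older).
fn aged(files: &[DataFile], elapsed: Duration) -> Vec<DataFile> {
    files
        .iter()
        .map(|f| DataFile {
            writer_version: f.writer_version,
            age: f.age + elapsed,
        })
        .collect()
}
```

With a 48-hour window, a file written by v0.17.1 30 hours ago and one written by v0.18.1 2 hours ago are both selected; a day later only the v0.18.1 file remains in the window. That matches the observation that the error stops after 2 days, though it says nothing about why the cast itself fails.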

r1fad commented 1 month ago

@ion-elgreco how can I incorporate your fix into my project?

ion-elgreco commented 1 month ago

> @ion-elgreco how can I incorporate your fix into my project?

You will have to wait for the next release, when we bump the kernel.

r1fad commented 1 month ago

@ion-elgreco any idea on when the next release of delta-rs is coming? I imagine this fix is important for quite a lot of people. And the change introduced earlier with the kernel is a breaking change.

I also noticed that after downgrading my project to v0.17.1, it can no longer optimise tables that contain data inserted by v0.18.1. This issue should disappear after 2 days, though, because the lookback window is 2 days.

ion-elgreco commented 1 month ago

@r1fad it will be released sometime today or this weekend.