delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

panic in `push_down_filter` #2602

Open ryzhyk opened 2 weeks ago

ryzhyk commented 2 weeks ago

Environment

Delta-rs version: v0.18.0

Binding: Rust

Environment:


Bug

What happened:

Running the query `select * from snapshot where id > 10000 and id < 20000` against a Delta table panics with:

called `Result::unwrap()` on an `Err` value: Context("Optimizer rule 'push_down_filter' failed", Internal("Vec returned length: 1 from supports_filters_pushdown is not the same size as the filters passed, which length is: 2"))

If I remove the second condition and leave only `select * from snapshot where id > 10000`, the query works. I believe this is a regression in v0.18.0, since the two-condition query worked before.
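To make the error message concrete, here is a dependency-free sketch of the length check it describes. This is not the DataFusion or delta-rs code: `PushDown`, `pushdown_single`, `pushdown_per_filter`, and `check_pushdown` are hypothetical names that only model the contract stated in the message, namely that `supports_filters_pushdown` must return one entry per filter it receives. It also assumes the two conditions joined by `and` arrive as two separate filters, which is what the reported lengths (1 vs. 2) suggest.

```rust
/// Hypothetical stand-in for DataFusion's `TableProviderFilterPushDown`.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum PushDown {
    Unsupported,
    Inexact,
    Exact,
}

/// Shape that matches the reported failure: a single answer is returned
/// no matter how many filters were passed in.
fn pushdown_single(_filters: &[&str]) -> Vec<PushDown> {
    vec![PushDown::Inexact]
}

/// Shape the error message asks for: exactly one answer per filter.
fn pushdown_per_filter(filters: &[&str]) -> Vec<PushDown> {
    filters.iter().map(|_| PushDown::Inexact).collect()
}

/// Models the length check described by the error message (an assumption
/// about its form, not the optimizer's real implementation).
fn check_pushdown(filters: &[&str], answers: &[PushDown]) -> Result<(), String> {
    if answers.len() != filters.len() {
        return Err(format!(
            "Vec returned length: {} from supports_filters_pushdown is not the \
             same size as the filters passed, which length is: {}",
            answers.len(),
            filters.len()
        ));
    }
    Ok(())
}
```

With the two filters `["id > 10000", "id < 20000"]`, `check_pushdown` rejects the output of `pushdown_single` and accepts the output of `pushdown_per_filter`. With a single filter both pass, which is consistent with the one-condition query working.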

What you expected to happen:

How to reproduce it:

Here is the Rust code for the repro:

use std::sync::Arc;

use deltalake::{datafusion::prelude::SessionContext, DeltaTableBuilder};

#[tokio::main(flavor = "multi_thread", worker_threads = 10)]
async fn main() {
    let datafusion = SessionContext::new();

    // Load the Delta table from ./data and register it as `snapshot`.
    let table_builder = DeltaTableBuilder::from_uri("data");
    let delta_table = Arc::new(table_builder.load().await.unwrap());
    datafusion.register_table("snapshot", delta_table).unwrap();

    // Two conditions joined by `and`; with only the first one, the query works.
    let df = datafusion
        .sql("select * from snapshot where id > 10000 and id < 20000")
        .await
        .unwrap();

    // Execute the query.
    df.collect().await.unwrap();
}

Here, `data` is a directory containing an empty Delta table.

The full repro, including the Rust code and the empty table, is in conjunctive_join.zip. Run `cargo run` to reproduce.

More details:

rtyler commented 2 weeks ago

thanks for the repro code, we'll take a looksee

rtyler commented 1 week ago

I have taken your test code and turned it into a failing test in pull request #2604, which we can use as the starting point for a fix.