apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Regression in 43.0.0: coalesce no longer works between Utf8 and Utf8View columns #13568

Open ttencate opened 6 days ago

ttencate commented 6 days ago

Describe the bug

coalesce() no longer coerces Utf8 and Utf8View columns to a common type, so calling it with one argument of each type fails during planning.

To Reproduce

use datafusion::common::arrow::array::{ArrayRef, StringArray, StringViewArray};
use datafusion::common::arrow::record_batch::RecordBatch;
use datafusion::prelude::*;
use std::sync::Arc;

#[tokio::main]
async fn main() {
    let ctx = SessionContext::new();
    let df = ctx
        .read_batch(
            RecordBatch::try_from_iter([
                (
                    "utf8",
                    Arc::new(StringArray::from(vec!["a", "b"])) as ArrayRef,
                ),
                (
                    "utf8view",
                    Arc::new(StringViewArray::from(vec!["a", "b"])) as ArrayRef,
                ),
            ])
            .unwrap(),
        )
        .unwrap();
    df.select(vec![coalesce(vec![col("utf8"), col("utf8view")])])
        .unwrap()
        .collect()
        .await
        .unwrap();
}

Result:

thread 'main' panicked at src/main.rs:25:10:
called `Result::unwrap()` on an `Err` value: Plan("Execution error: User-defined coercion failed with Execution(\"Fail to find the coerced type, errors: Some(Execution(\\\"Expect to get struct but got Utf8\\\"))\") No function matches the given name and argument types 'coalesce(Utf8, Utf8View)'. You might need to add explicit type casts.\n\tCandidate functions:\n\tcoalesce(UserDefined)")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
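The error is a `Plan` error. It is raised while the `select` is being planned, when `coalesce` has to find a single return type for the arguments `(Utf8, Utf8View)`. The sketch below is a toy model of that step, not DataFusion's implementation. The `DataType` enum, `coerce_pair`, and `coalesce_return_type` are hypothetical names. The pairwise rule (a `Utf8View` argument absorbs `Utf8`/`LargeUtf8`) is an assumption about the intended result and is not taken from this report.

```rust
// Toy model of the planning step that failed: coalesce must find one common
// type for all of its arguments. The enum and the rule below are simplified,
// hypothetical stand-ins, not DataFusion's actual types or coercion code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Utf8,
    LargeUtf8,
    Utf8View,
    Int64,
}

/// Pairwise rule for two argument types. Assumption: Utf8View absorbs the
/// other string types, and LargeUtf8 absorbs Utf8.
fn coerce_pair(a: DataType, b: DataType) -> Option<DataType> {
    use DataType::*;
    match (a, b) {
        _ if a == b => Some(a),
        (Utf8View, Utf8 | LargeUtf8) | (Utf8 | LargeUtf8, Utf8View) => Some(Utf8View),
        (LargeUtf8, Utf8) | (Utf8, LargeUtf8) => Some(LargeUtf8),
        _ => None,
    }
}

/// Fold the pairwise rule over every argument, so that coalesce(a, b, c, ...)
/// ends up with a single return type or an error.
fn coalesce_return_type(args: &[DataType]) -> Result<DataType, String> {
    let (first, rest) = args
        .split_first()
        .ok_or_else(|| "coalesce needs at least one argument".to_string())?;
    rest.iter().try_fold(*first, |acc, &t| {
        coerce_pair(acc, t).ok_or_else(|| format!("no common type for {acc:?} and {t:?}"))
    })
}

fn main() {
    // The argument types from the report: coalesce(Utf8, Utf8View).
    println!("{:?}", coalesce_return_type(&[DataType::Utf8, DataType::Utf8View]));
    // prints: Ok(Utf8View)
}
```

In this model the regression would appear as the fold returning `Err` for `[Utf8, Utf8View]`, which matches what the report observes at the DataFrame level. Folding a pairwise rule keeps the check independent of how many arguments are passed.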

Expected behavior

No error: the query should plan and return the coalesced column (`a`, `b`).

Additional context

It worked fine in version 42.2.0.

alamb commented 5 days ago

I vaguely remember that @jayzhan211 worked on coalesce recently; maybe that is related.

jayzhan211 commented 5 days ago

It seems the issue is already fixed: I couldn't reproduce the error on the latest commit, either with your test or with this one:

statement count 0
create table t(a varchar, b varchar) as values ('a', 'b'), ('c', 'd');

statement ok
create table t2 as
select
    a as c1,
    arrow_cast(b, 'Utf8View') as c2
from t;

query T
select coalesce(c2, c1) from t2;
----
b
d
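The expected result follows from `coalesce` returning, per row, the first non-null argument. `c2` is never null, so its values `b` and `d` come back. Below is a minimal std-only sketch of that row-wise rule. It assumes columns are modelled as `Vec<Option<&str>>`, a hypothetical simplification that ignores the Utf8/Utf8View distinction the real planner has to resolve first. The `coalesce` function here is illustrative and is not DataFusion's kernel.

```rust
// Toy model of coalesce's row-wise semantics: for each row, return the first
// non-null argument. Columns are hypothetical Vec<Option<&str>> stand-ins.
fn coalesce<'a>(columns: &[Vec<Option<&'a str>>]) -> Vec<Option<&'a str>> {
    // Assumes every column has the same number of rows (indexing panics otherwise).
    let rows = columns.first().map_or(0, |c| c.len());
    (0..rows)
        .map(|row| columns.iter().find_map(|col| col[row]))
        .collect()
}

fn main() {
    // Values of t2 from the test above: c1 = a, c2 = arrow_cast(b, 'Utf8View').
    let c1 = vec![Some("a"), Some("c")];
    let c2 = vec![Some("b"), Some("d")];
    // select coalesce(c2, c1) from t2;
    let result = coalesce(&[c2, c1]);
    println!("{:?}", result); // prints: [Some("b"), Some("d")]
}
```

If `c2` had a null in some row, that row would fall back to `c1`, which is the case the type coercion must also support when the two columns have different string encodings.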