apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Internal error in `regexp_replace()` for some StringView input (SQLancer) #12150

Closed 2010YOUY01 closed 1 month ago

2010YOUY01 commented 2 months ago

Describe the bug

A query runs successfully on a table with a regular string column. If we convert that column's physical representation to StringView, the same query fails.

See the reproducer in datafusion-cli below (compiled from latest main using cargo run, commit a58416c2e).

The last query is supposed to run successfully, like the one before it.

DataFusion CLI v41.0.0
> create table t1(v1 text);
0 row(s) fetched.
Elapsed 0.058 seconds.

> insert into t1 values ('DataFusion'), ('datafusion');
+-------+
| count |
+-------+
| 2     |
+-------+
1 row(s) fetched.
Elapsed 0.047 seconds.

> create table t1_stringview as
select arrow_cast(v1, 'Utf8View') as v1
from t1;
0 row(s) fetched.
Elapsed 0.011 seconds.

# Now we have two equivalent tables `t1` and `t1_stringview`
# The difference is physical representation for string column (StringArray and StringViewArray)

> select regexp_replace(v1,lower(v1),'bar') from t1;
+------------------------------------------------+
| regexp_replace(t1.v1,lower(t1.v1),Utf8("bar")) |
+------------------------------------------------+
| DataFusion                                     |
| bar                                            |
+------------------------------------------------+
2 row(s) fetched.
Elapsed 0.014 seconds.

> select regexp_replace(v1,lower(v1),'bar') from t1_stringview;
Internal error: could not cast value to arrow_array::array::byte_array::GenericByteArray<arrow_array::types::GenericStringType<i32>>.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
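
The type named in the error, `GenericByteArray<GenericStringType<i32>>`, is the regular `StringArray`, so the message suggests that some code path downcasts its input to that one concrete type and fails when handed a `StringViewArray`. The following is only a minimal sketch of that failure mode using the standard library's `Any`. The `Array` trait, the two structs, and both `lengths_*` functions are hypothetical stand-ins, not the real arrow-rs or DataFusion code.

```rust
use std::any::Any;

// Hypothetical stand-ins for Arrow's array types; NOT the real arrow-rs API.
trait Array {
    fn as_any(&self) -> &dyn Any;
}

struct StringArray(Vec<String>); // models the Utf8 layout
struct StringViewArray(Vec<String>); // models the Utf8View layout

impl Array for StringArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Array for StringViewArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// A kernel that assumes one physical layout: the downcast fails for any other.
fn lengths_utf8_only(arr: &dyn Array) -> Result<Vec<usize>, String> {
    let strings = arr
        .as_any()
        .downcast_ref::<StringArray>()
        .ok_or_else(|| "could not cast value to StringArray".to_string())?;
    Ok(strings.0.iter().map(|s| s.len()).collect())
}

// A kernel that dispatches on the concrete type and handles both layouts.
fn lengths_any_layout(arr: &dyn Array) -> Result<Vec<usize>, String> {
    let any = arr.as_any();
    if let Some(a) = any.downcast_ref::<StringArray>() {
        Ok(a.0.iter().map(|s| s.len()).collect())
    } else if let Some(a) = any.downcast_ref::<StringViewArray>() {
        Ok(a.0.iter().map(|s| s.len()).collect())
    } else {
        Err("unsupported array type".to_string())
    }
}

fn main() {
    let values = vec!["DataFusion".to_string(), "datafusion".to_string()];
    let plain = StringArray(values.clone());
    let view = StringViewArray(values);

    // Same logical data, different physical representation.
    assert!(lengths_utf8_only(&plain).is_ok());
    assert!(lengths_utf8_only(&view).is_err());
    assert_eq!(lengths_any_layout(&view), Ok(vec![10, 10]));
}
```

Both arrays hold the same logical values, yet only the dispatching version accepts either layout, which mirrors the difference between the `t1` and `t1_stringview` queries above.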

To Reproduce

No response

Expected behavior

No response

Additional context

Found by SQLancer https://github.com/apache/datafusion/issues/11030

2010YOUY01 commented 2 months ago

Maybe this can be fixed together while working on https://github.com/apache/datafusion/issues/11912

devanbenz commented 2 months ago

take

devanbenz commented 2 months ago

Disregard my closed PRs 😅 I accidentally committed a kind binary that was in my DF folder 😆