apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Support DictionaryString for Regex matching operators #12618

Closed: goldmedal closed this issue 1 month ago

goldmedal commented 1 month ago

Is your feature request related to a problem or challenge?

While working on #12415, I found that DictionaryString columns fail the following case in datafusion/sqllogictest/test_files/string/string_query.slt.part:

statement ok
create table test_basic_operator as
select
    arrow_cast(column1, 'Dictionary(Int32, Utf8)') as ascii_1,
    arrow_cast(column2, 'Dictionary(Int32, Utf8)') as ascii_2,
    arrow_cast(column3, 'Dictionary(Int32, Utf8)') as unicode_1,
    arrow_cast(column4, 'Dictionary(Int32, Utf8)') as unicode_2
from test_source;

query BB
SELECT
  ascii_1 ~* '^a.{3}e',
  unicode_1 ~* '^d.*Фу'
FROM test_basic_operator;
----
true false
false false
false true
NULL NULL

I got the error message:

External error: query failed: DataFusion error: Internal error: Data type Dictionary(Int32, Utf8) not supported for binary_string_array_flag_op_scalar operation 'regexp_is_match' on string array.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

Describe the solution you'd like

Support DictionaryString at https://github.com/apache/datafusion/blob/65595cf7f88d5393fded416f8d001a9e90b18169/datafusion/physical-expr/src/expressions/binary.rs#L148-L158
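
To illustrate the semantics being requested, here is a minimal Python sketch of a regex match over a dictionary-encoded string column. It is not DataFusion's implementation: the function name dict_regex_is_match, the keys/values representation, and the sample strings are all hypothetical, and Python's re module stands in for the Rust regex engine, so pattern syntax may differ in edge cases. The idea is to evaluate the pattern once per distinct dictionary value and then expand the result through the keys, keeping NULL keys as NULL.

```python
import re
from typing import List, Optional


def dict_regex_is_match(
    keys: List[Optional[int]],
    values: List[Optional[str]],
    pattern: str,
    case_insensitive: bool = False,
) -> List[Optional[bool]]:
    """Hypothetical helper: regex match on a dictionary-encoded column.

    `keys` holds one index into `values` per row (None models a NULL row).
    `values` holds the distinct strings (None models a NULL value).
    """
    flags = re.IGNORECASE if case_insensitive else 0
    compiled = re.compile(pattern, flags)

    # Evaluate the pattern once per distinct value, not once per row.
    # `search` is unanchored, like the SQL `~` / `~*` operators.
    value_matches = [
        None if v is None else compiled.search(v) is not None
        for v in values
    ]

    # Expand per-value results back to per-row results through the keys.
    return [None if k is None else value_matches[k] for k in keys]


# Sample data below is made up for illustration, not taken from test_source.
ascii_values = ["Andrew", "Xiangpeng", "Raphael"]
ascii_keys = [0, 1, 2, None]
ascii_result = dict_regex_is_match(
    ascii_keys, ascii_values, r"^a.{3}e", case_insensitive=True
)
```

With these sample values, ascii_result has one entry per row, and the NULL row stays NULL rather than becoming false. An alternative that avoids dictionary-specific logic is to unpack the dictionary into a plain string array before matching; which approach the eventual fix takes is not stated in this issue.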

Describe alternatives you've considered

No response

Additional context

No response

alamb commented 1 month ago

Thanks @goldmedal -- this is a great find

goldmedal commented 1 month ago

Related TODO item:

blaginin commented 1 month ago

I'd like to take these type issues if you don't mind, @goldmedal and Andrew. It feels like a nice way to get into the project 😀

blaginin commented 1 month ago

take

alamb commented 1 month ago

Thank you @blaginin

BTW, here is an example that might help: https://github.com/apache/datafusion/pull/12712