Closed alamb closed 2 weeks ago
An update here is that @XiangpengHao has a PR with various changes in https://github.com/apache/datafusion/pull/11862
We still need to check that PR and figure out what else is in that PR is needed to be enabled "for real" (with tests, etc)
My ideal resolution here is that we end up in the state where the only change we need to enable string view by default is switch the config setting. I will do some more ticket triage later today to outline other items I know of
Do we have tickets for regexp binary operators? (like ~
, !~
...)
https://datafusion.apache.org/user-guide/sql/operators.html#op-re-match
I noticed stringview is not supported on them yet and they have separate implementation than regexp functions
``` /*DML*/CREATE TABLE t0(v0 DOUBLE, v1 DOUBLE, v2 BOOLEAN, v3 BOOLEAN, v4 BOOLEAN, v5 STRING); /*DML*/INSERT INTO t0(v1, v5, v2) VALUES (0.7183242196192607, 'Tn', true); /*DML*/CREATE TABLE t0_stringview AS SELECT v0, v1, v2, v3, v4, arrow_cast(v5, 'Utf8View') as v5 FROM t0; > select v5 ~ 'foo' from t0_stringview; Internal error: Data type Utf8View not supported for binary_string_array_flag_op_scalar operation 'regexp_is_match' on string array. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker > select regexp_match(v5, 'foo') from t0_stringview; +--------------------------------------------+ | regexp_match(t0_stringview.v5,Utf8("foo")) | +--------------------------------------------+ | | +--------------------------------------------+ 1 row(s) fetched. Elapsed 0.034 seconds. ```
Do we have tickets for regexp binary operators? (like ~, !~...)
Not that I know of -- it would be great to add them
Do we have tickets for regexp binary operators? (like , !...)
I am going to try and polish up PR to enable string view by default PR (with the arrow upgrade and various recent improvements) and see how close we are https://github.com/apache/datafusion/pull/12092
StringView by default is finally merged into DataFusion: https://github.com/apache/datafusion/pull/13101 so I am claiming success and completion of this issue
Is your feature request related to a problem or challenge?
This ticket is a follow on to https://github.com/apache/datafusion/issues/10918 where we implemented enough initial support for
StringView
/BinaryView
that we can show some pretty sweet ClickBench resultsDescribe the solution you'd like
This epic tracks remaining work to complete the "initial" work which I would like to define as "enable using StringView when reading Strings from Parquet by default"
I am sure there will be additional work / support to add StringView to various other features of DataFusion that we can maybe track with another follow on ticket
Required for enabling StringView by default:
Could work around but really should be fixed upstream
Additional "Nice to have" Features