Closed tshauck closed 1 month ago
take
@Rachelint a gentle ping 😃 Are you working on this? I'm willing to take this one to speed up the progress of this epic task 💪
OK, feel free to take it. I am still struggling with #11943, and it seems I can't work on this for now...
Part of https://github.com/apache/datafusion/issues/11752 and https://github.com/apache/datafusion/issues/11790
Currently, a call to CONTAINS with a Utf8View datatype induces a cast. After the change that fixes this issue, it should not.
contains is defined here: https://github.com/apache/datafusion/blob/main/datafusion/functions/src/string/contains.rs
Is your feature request related to a problem or challenge?
We are working to add complete StringView support in DataFusion, which permits potentially much faster processing of string data. See https://github.com/apache/datafusion/issues/10918 for more background.
Today, most DataFusion string functions support DataType::Utf8 and DataType::LargeUtf8. When one is called with a StringView argument, DataFusion casts the argument back to DataType::Utf8, which is expensive.
To realize the full speed of StringView, we need to ensure that all string functions support the DataType::Utf8View directly.
Describe the solution you'd like
Update the function to support DataType::Utf8View directly
Describe alternatives you've considered
The typical steps are:
1. Add tests (for example in string_view.slt) to ensure the arguments are not being cast
2. Change the Signature of the function to accept Utf8View in addition to Utf8/LargeUtf8
3. Update the function to operate on Utf8View directly
Example PRs
Additional context
The documentation of string functions can be found here: https://datafusion.apache.org/user-guide/sql/scalar_functions.html#string-functions
To test a function with StringView with datafusion-cli you can use an example like this (replacing starts_with with the relevant function).
To see if it is using utf8 view, use EXPLAIN to see the plan and verify there is no CAST. In this example the CAST(column1@0 AS Utf8) indicates that the function is not using Utf8View natively.
It is also often good to test with a constant as well (likewise there should be no cast).
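A minimal sketch of such a datafusion-cli session is below. The table names foo and foo_view and the sample values are assumptions chosen for illustration; arrow_cast is used to produce Utf8View columns. The plan output is not shown, since its exact text depends on the DataFusion version.

```sql
-- Hypothetical test table; VALUES columns are named column1, column2 by default.
CREATE TABLE foo AS VALUES ('foo', 'bar'), ('bar', 'baz');

-- Copy the data into a table whose columns are Utf8View.
CREATE TABLE foo_view AS
SELECT
  arrow_cast(column1, 'Utf8View') AS column1,
  arrow_cast(column2, 'Utf8View') AS column2
FROM foo;

-- Run the function under test on Utf8View columns
-- (replace starts_with with the relevant function).
SELECT starts_with(column1, column2) FROM foo_view;

-- Inspect the plan: a CAST(... AS Utf8) around an argument means
-- the function is not using Utf8View natively.
EXPLAIN SELECT starts_with(column1, column2) FROM foo_view;

-- Repeat the check with a constant argument; there should be no cast here either.
EXPLAIN SELECT starts_with(column1, 'foo') FROM foo_view;
```

Running the same statements before and after a change gives a quick way to confirm that the cast has disappeared from the plan.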