apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Support for `LargeString` and `LargeBinary` for `StringView` and `BinaryView` #11023

Open alamb opened 1 week ago

alamb commented 1 week ago

Is your feature request related to a problem or challenge?

Part of https://github.com/apache/datafusion/issues/10918, [StringViewArray](https://docs.rs/arrow/latest/arrow/array/type.StringViewArray.html) support in DataFusion

@Weijun-H added support for `<>` and `!=` for `StringView` in https://github.com/apache/datafusion/pull/10985, and @PsiACE added support for `BinaryView` in https://github.com/apache/datafusion/pull/11004.

We also need similar support for comparing these view types against the `LargeBinary` and `LargeUtf8` types.

Describe the solution you'd like

Please remember to create a PR that targets the string-view branch (not main)

In order to improve performance of these queries we will need the ability to actually compare StringViewArrays to constant values (and likely to each other)

Thus I would like to be able to run

- `LargeUtf8 = StringView`
- `StringView = LargeUtf8`
- `LargeBinary = BinaryView`
- `BinaryView = LargeBinary`
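To make the four requested comparisons concrete, here is a minimal, standard-library-only sketch of the kind of coercion rule they imply. The `DataType` enum and the `comparison_type` helper are hypothetical stand-ins written for this illustration, not DataFusion's or arrow-rs's actual API, and the choice of the view type as the common type is an assumption rather than something this issue specifies.

```rust
/// Simplified stand-in for the Arrow data types named in this issue.
/// Hypothetical: the real type is `arrow::datatypes::DataType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Utf8View,
    LargeUtf8,
    BinaryView,
    LargeBinary,
}

/// Hypothetical helper: pick a common type that both sides of `=` could be
/// cast to before comparing. Returns `None` when no rule applies.
/// Assumption: the view type is used as the common type.
pub fn comparison_type(lhs: DataType, rhs: DataType) -> Option<DataType> {
    use DataType::*;
    match (lhs, rhs) {
        // Identical types need no coercion.
        (a, b) if a == b => Some(a),
        // String comparisons, in either operand order.
        (Utf8View, LargeUtf8) | (LargeUtf8, Utf8View) => Some(Utf8View),
        // Binary comparisons, in either operand order.
        (BinaryView, LargeBinary) | (LargeBinary, BinaryView) => Some(BinaryView),
        // Mixed string/binary pairs are left unsupported in this sketch.
        _ => None,
    }
}
```

Both operand orders are listed explicitly because the issue asks for each comparison in both directions; a real implementation would more likely extend the existing coercion rules than introduce a new function.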

Describe alternatives you've considered

I think you can follow the model in https://github.com/apache/datafusion/pull/11004 and simply extend the tests to have another column each of `LargeUtf8` (large string) and `LargeBinary`.

Additional context

No response

XiangpengHao commented 1 week ago

take