apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

Support `Utf8View` for `bit_length` kernel #6671

Closed: austin362667 closed this 2 weeks ago

austin362667 commented 2 weeks ago

Which issue does this PR close?

Closes https://github.com/apache/datafusion/issues/13195

Rationale for this change

Thanks to @jayzhan211, who noticed the following issue: the array compute kernel bit_length() doesn't support the Utf8View type. For example:

create table test_source as values
  ('Andrew', 'X'),
  ('Xiangpeng', 'Xiangpeng'),
  ('Raphael', 'R'),
  (NULL, 'R');
create table test as
SELECT
  arrow_cast(column1, 'Utf8View') as column1_utf8view
FROM test_source;

select bit_length(column1_utf8view) from test;

This query fails with the error:

query error DataFusion error: Arrow error: Compute error: bit_length not supported for Utf8View

What changes are included in this PR?

Update the bit_length() array kernel to support Utf8View.
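As a rough illustration of why this can be done without touching the string bytes: in the Arrow view layout, each value is described by a 16-byte view whose first four bytes hold the length, so the bit length can be read from the view alone. The sketch below uses only the standard library and is not the code in this PR; `make_view` and `bit_lengths` are hypothetical names, and a plain `&[bool]` stands in for the null buffer.

```rust
/// Hypothetical helper: encode a string as a 16-byte view.
/// Bytes 0..4 hold the length (little-endian). Strings of at most 12 bytes
/// are stored inline in bytes 4..16; longer strings keep a 4-byte prefix,
/// followed by a buffer index and an offset into that buffer.
fn make_view(s: &str, buffer_index: u32, offset: u32) -> u128 {
    let bytes = s.as_bytes();
    let mut raw = [0u8; 16];
    raw[0..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());
    if bytes.len() <= 12 {
        raw[4..4 + bytes.len()].copy_from_slice(bytes);
    } else {
        raw[4..8].copy_from_slice(&bytes[0..4]);
        raw[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        raw[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(raw)
}

/// Hypothetical kernel sketch: compute the bit length of every row from the
/// view words only. `valid[i] == false` marks a null row (an assumption made
/// here in place of a real null buffer).
fn bit_lengths(views: &[u128], valid: &[bool]) -> Vec<Option<i32>> {
    views
        .iter()
        .zip(valid)
        .map(|(view, is_valid)| {
            if *is_valid {
                // Truncating to 32 bits keeps only the length field.
                Some((*view as u32 as i32).wrapping_mul(8))
            } else {
                None
            }
        })
        .collect()
}
```

With the values from the example table, 'Andrew' is 6 bytes and so yields 48 bits, and the NULL row stays NULL. The data buffers that back longer strings are never read.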

Are there any user-facing changes?

austin362667 commented 2 weeks ago

Thanks @tustvold @alamb @findepi for the review.

tustvold commented 2 weeks ago

Switching to from_unary has reverted the optimisation that avoids inspecting the string data. https://github.com/apache/arrow-rs/pull/6671#discussion_r1827848438 shows how to do this correctly.

austin362667 commented 2 weeks ago

Got it! Thanks @tustvold, I replaced from_unary with the approach you suggested.