Open: tustvold opened this issue 1 year ago
I think we have made substantial progress on this issue. What is left to do?
IIRC there are some aggregates, like first and last, that are not yet specialized.
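For illustration, here is a minimal std-only sketch of what specializing first/last-style aggregates on a statically known type could look like. `FirstAccumulator` and `LastAccumulator` are hypothetical names, not DataFusion types. The sketch skips nulls and ignores any ORDER BY requirement. Both are simplifications chosen for the example, not a statement of DataFusion's semantics.

```rust
// Hypothetical specialized FIRST-style accumulator: T is fixed at compile
// time, so no dynamically typed scalar is built per row.
#[derive(Debug, Default)]
struct FirstAccumulator<T: Copy> {
    first: Option<T>,
}

impl<T: Copy> FirstAccumulator<T> {
    fn update_batch(&mut self, values: &[Option<T>]) {
        // Only look at new batches until a value has been found.
        if self.first.is_none() {
            // Sketch choice: skip nulls and keep the first non-null value.
            self.first = values.iter().flatten().next().copied();
        }
    }

    fn evaluate(&self) -> Option<T> {
        self.first
    }
}

// Hypothetical specialized LAST-style accumulator.
#[derive(Debug, Default)]
struct LastAccumulator<T: Copy> {
    last: Option<T>,
}

impl<T: Copy> LastAccumulator<T> {
    fn update_batch(&mut self, values: &[Option<T>]) {
        // Sketch choice: skip nulls and overwrite with the last non-null value.
        if let Some(v) = values.iter().flatten().last() {
            self.last = Some(*v);
        }
    }

    fn evaluate(&self) -> Option<T> {
        self.last
    }
}

fn main() {
    let mut first = FirstAccumulator::<i64>::default();
    first.update_batch(&[None, Some(7), Some(8)]);
    first.update_batch(&[Some(1)]);
    assert_eq!(first.evaluate(), Some(7));

    let mut last = LastAccumulator::<i64>::default();
    last.update_batch(&[Some(1), Some(2), None]);
    last.update_batch(&[None]);
    assert_eq!(last.evaluate(), Some(2));
}
```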
Is your feature request related to a problem or challenge?
Currently a number of operations are implemented directly on ScalarValue, including:
Not only does this result in a huge amount of code, but these operations also don't behave the same way as their array counterparts.
For example:
Describe the solution you'd like
These kernels appear to exist largely for aggregation, where the types being aggregated are known statically. We should replace these uses with specialization, as done in https://github.com/apache/arrow-datafusion/pull/6800#discussion_r1248104156. The remaining uses should adopt the new Datum abstraction (https://github.com/apache/arrow-rs/pull/4393) so that scalars go through the same arrow-rs kernels as arrays (https://github.com/apache/arrow-rs/pull/4465).
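A minimal std-only sketch of the contrast between the two approaches is shown below. It assumes a toy `DynScalar` enum as a stand-in for `ScalarValue` and a hypothetical generic `SumAccumulator<T>` as the specialized alternative. None of these names are DataFusion APIs. The dynamic version re-inspects both operands' variants on every addition and has to handle type mismatches at runtime. The specialized version fixes `T` at compile time, so its inner loop is a plain typed addition. Skipping nulls is a choice made for this sketch.

```rust
use std::ops::Add;

// Toy stand-in for a dynamically typed scalar (NOT DataFusion's ScalarValue).
#[derive(Debug, Clone, PartialEq)]
enum DynScalar {
    Int64(Option<i64>),
    Float64(Option<f64>),
}

// Null-skipping addition of two optional values.
fn add_opt<T: Add<Output = T>>(a: Option<T>, b: Option<T>) -> Option<T> {
    match (a, b) {
        (Some(a), Some(b)) => Some(a + b),
        (Some(x), None) | (None, Some(x)) => Some(x),
        (None, None) => None,
    }
}

impl DynScalar {
    // Every call matches on both variants and must reject mismatched types.
    fn add(&self, other: &DynScalar) -> Result<DynScalar, String> {
        use DynScalar::*;
        match (self, other) {
            (Int64(a), Int64(b)) => Ok(Int64(add_opt(*a, *b))),
            (Float64(a), Float64(b)) => Ok(Float64(add_opt(*a, *b))),
            (l, r) => Err(format!("cannot add {l:?} and {r:?}")),
        }
    }
}

// Dynamic aggregation: one enum dispatch per input value.
fn sum_dyn(values: &[DynScalar]) -> Result<DynScalar, String> {
    let mut acc: Option<DynScalar> = None;
    for v in values {
        acc = Some(match acc {
            None => v.clone(),
            Some(a) => a.add(v)?,
        });
    }
    acc.ok_or_else(|| "empty input".to_string())
}

// Hypothetical specialized accumulator: the native type is a generic parameter.
struct SumAccumulator<T: Copy + Default + Add<Output = T>> {
    sum: Option<T>,
}

impl<T: Copy + Default + Add<Output = T>> SumAccumulator<T> {
    fn new() -> Self {
        Self { sum: None }
    }

    fn update_batch(&mut self, values: &[Option<T>]) {
        // No runtime type checks: T is known statically.
        for v in values.iter().flatten() {
            self.sum = Some(self.sum.unwrap_or_default() + *v);
        }
    }

    fn evaluate(&self) -> Option<T> {
        self.sum
    }
}

fn main() {
    let mut acc = SumAccumulator::<i64>::new();
    acc.update_batch(&[Some(1), None, Some(2)]);
    assert_eq!(acc.evaluate(), Some(3));

    let dyn_sum = sum_dyn(&[
        DynScalar::Int64(Some(1)),
        DynScalar::Int64(None),
        DynScalar::Int64(Some(2)),
    ]);
    assert_eq!(dyn_sum, Ok(DynScalar::Int64(Some(3))));
    assert!(sum_dyn(&[DynScalar::Int64(Some(1)), DynScalar::Float64(Some(1.0))]).is_err());
}
```

The generic version gets one monomorphized copy per native type from the compiler, which is the property that makes per-value dynamic dispatch unnecessary once the aggregated type is known.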
Describe alternatives you've considered
No response
Additional context
#4973 tracks improving the aggregator performance.
https://github.com/apache/arrow-datafusion/pull/6832 updates DF to use the Datum kernels
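The idea behind the Datum kernels can be sketched without any dependencies. As we understand it, arrow-rs's `Datum` trait pairs an array with a flag saying whether it should be treated as a scalar. That lets a single kernel handle array/array, array/scalar and scalar/scalar inputs, so scalar and array arithmetic cannot drift apart. The sketch below is a toy analogue under that assumption. `ToyDatum`, `ToyArray`, `ToyScalar` and `toy_add` are hypothetical names, values are plain `i64` slices rather than `&dyn Array`, and checked addition is an arbitrary choice for the example.

```rust
// Toy analogue of a Datum: the values plus a flag marking a scalar
// that should be broadcast to every row.
trait ToyDatum {
    fn get(&self) -> (&[i64], bool);
}

struct ToyArray(Vec<i64>);
struct ToyScalar(i64);

impl ToyDatum for ToyArray {
    fn get(&self) -> (&[i64], bool) {
        (self.0.as_slice(), false)
    }
}

impl ToyDatum for ToyScalar {
    fn get(&self) -> (&[i64], bool) {
        // A scalar is represented as a length-1 slice plus the scalar flag.
        (std::slice::from_ref(&self.0), true)
    }
}

// One kernel for every combination of scalar and array inputs.
fn toy_add(lhs: &dyn ToyDatum, rhs: &dyn ToyDatum) -> Result<Vec<i64>, String> {
    let (l, l_scalar) = lhs.get();
    let (r, r_scalar) = rhs.get();
    let len = match (l_scalar, r_scalar) {
        (true, true) => 1,
        (true, false) => r.len(),
        (false, true) => l.len(),
        (false, false) if l.len() == r.len() => l.len(),
        (false, false) => {
            return Err(format!("length mismatch: {} vs {}", l.len(), r.len()))
        }
    };
    (0..len)
        .map(|i| {
            // Scalars are broadcast by always reading element 0.
            let a = if l_scalar { l[0] } else { l[i] };
            let b = if r_scalar { r[0] } else { r[i] };
            a.checked_add(b).ok_or_else(|| "overflow".to_string())
        })
        .collect()
}

fn main() {
    let a = ToyArray(vec![1, 2, 3]);
    let s = ToyScalar(10);
    assert_eq!(toy_add(&a, &s), Ok(vec![11, 12, 13]));
    assert_eq!(toy_add(&s, &s), Ok(vec![20]));
    assert!(toy_add(&a, &ToyArray(vec![1])).is_err());
}
```

Because scalar inputs flow through the same code path as arrays, behaviors such as overflow handling are defined once rather than separately for `ScalarValue` and for arrays.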