apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Use Specialization Instead of ScalarValue Binary Operations #6842

Open tustvold opened 1 year ago

tustvold commented 1 year ago

Is your feature request related to a problem or challenge?

Currently a number of operations are implemented directly on ScalarValue, including:

Not only does this result in a large amount of code, but these operations also don't behave the same way as their array counterparts.

For example:

Describe the solution you'd like

These kernels appear to exist largely for aggregation, where the aggregated types are known statically. We should replace these uses with specialization, as done in https://github.com/apache/arrow-datafusion/pull/6800#discussion_r1248104156. The remaining uses should adopt the new Datum abstraction (https://github.com/apache/arrow-rs/pull/4393) so that they use the same arrow-rs kernels as arrays do (https://github.com/apache/arrow-rs/pull/4465).
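A minimal sketch of what "specialization" means for an aggregate, assuming a simplified setting with no arrow-rs dependency: the accumulator is generic over the native element type, so the type is fixed once, when the accumulator is created, instead of being matched on every `ScalarValue` addition. The `NativeNumeric` trait and `SumAccumulator` type are hypothetical stand-ins, not DataFusion's API; real code would be generic over arrow-rs primitive types and operate on Arrow arrays rather than slices of `Option<T>`.

```rust
use std::ops::Add;

/// Hypothetical stand-in for arrow-rs's primitive/native type traits.
trait NativeNumeric: Copy + Default + Add<Output = Self> {}
impl NativeNumeric for i64 {}
impl NativeNumeric for f64 {}

/// Hypothetical sum accumulator specialized on the native type `T`.
/// `update` is a plain loop over `T` values: no per-value enum dispatch,
/// unlike repeatedly calling a `ScalarValue`-level `add`.
#[derive(Default)]
struct SumAccumulator<T: NativeNumeric> {
    /// `None` until at least one non-null value has been seen.
    sum: Option<T>,
}

impl<T: NativeNumeric> SumAccumulator<T> {
    /// Adds every non-null value in the batch to the running sum.
    /// Uses plain `+`; overflow handling is out of scope for this sketch.
    fn update(&mut self, values: &[Option<T>]) {
        for v in values.iter().flatten() {
            self.sum = Some(self.sum.unwrap_or_default() + *v);
        }
    }

    /// Returns the sum, or `None` if only nulls (or nothing) were seen.
    fn evaluate(&self) -> Option<T> {
        self.sum
    }
}
```

With this shape, the runtime type check happens once, when deciding which `SumAccumulator<T>` to instantiate for the input column, rather than once per value.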

Describe alternatives you've considered

No response

Additional context

#4973 tracks improving the aggregator performance.

https://github.com/apache/arrow-datafusion/pull/6832 updates DF to use the Datum kernels
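The Datum idea can be illustrated with a self-contained sketch, assuming an `i64`-only world: a value is either an array or a scalar, a scalar is represented as a one-element array plus a flag, and a single `add` kernel handles every combination by broadcasting the scalar side. The `Datum` trait, `Int64Array`, `Scalar`, and `add` below are simplified hypothetical mirrors, not the arrow-rs types of the same names; in particular, this sketch returns a `Vec<i64>` and a `String` error.

```rust
/// Hypothetical, simplified mirror of the Datum idea: anything that can
/// expose its values plus a flag saying "treat me as a scalar".
trait Datum {
    fn get(&self) -> (&[i64], bool);
}

/// Hypothetical array type.
struct Int64Array(Vec<i64>);

/// Hypothetical scalar: a one-element array (invariant enforced by `new`).
struct Scalar(Int64Array);

impl Scalar {
    fn new(value: i64) -> Self {
        Scalar(Int64Array(vec![value]))
    }
}

impl Datum for Int64Array {
    fn get(&self) -> (&[i64], bool) {
        (self.0.as_slice(), false)
    }
}

impl Datum for Scalar {
    fn get(&self) -> (&[i64], bool) {
        (self.0.0.as_slice(), true)
    }
}

/// One kernel for array+array, array+scalar and scalar+scalar, so scalar
/// arithmetic cannot drift from array arithmetic. Uses checked addition.
fn add(lhs: &dyn Datum, rhs: &dyn Datum) -> Result<Vec<i64>, String> {
    let (l, l_scalar) = lhs.get();
    let (r, r_scalar) = rhs.get();
    // Output length: a scalar side broadcasts to the other side's length.
    let len = match (l_scalar, r_scalar) {
        (true, true) => 1,
        (true, false) => r.len(),
        (false, true) => l.len(),
        (false, false) if l.len() == r.len() => l.len(),
        (false, false) => {
            return Err(format!("length mismatch: {} vs {}", l.len(), r.len()))
        }
    };
    (0..len)
        .map(|i| {
            let a = if l_scalar { l[0] } else { l[i] };
            let b = if r_scalar { r[0] } else { r[i] };
            a.checked_add(b).ok_or_else(|| "overflow".to_string())
        })
        .collect()
}
```

Because scalar + scalar goes through the same code path as array + array, behaviour such as overflow handling (checked addition here) cannot diverge between the two, which is the inconsistency the issue describes.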

alamb commented 9 months ago

I think we have made substantial progress on this issue -- what is left to do?

tustvold commented 9 months ago

IIRC there are some aggregates, like first and last, that are not yet specialized.