apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

Improve performance of Spark-compatible decimal aggregates #951

Open andygrove opened 1 month ago

andygrove commented 1 month ago

What is the problem the feature request solves?

The benchmarks added in https://github.com/apache/datafusion-comet/pull/948 show that Comet's Spark-compatible decimal aggregates are ~50% slower than the DataFusion equivalents (roughly 61% more time for avg and 47% more for sum):

aggregate/avg_decimal_datafusion
                        time:   [653.56 µs 657.57 µs 662.06 µs]
aggregate/avg_decimal_comet
                        time:   [1.0581 ms 1.0592 ms 1.0604 ms]
aggregate/sum_decimal_datafusion
                        time:   [695.51 µs 696.48 µs 697.60 µs]
aggregate/sum_decimal_comet
                        time:   [1.0218 ms 1.0230 ms 1.0242 ms]
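
For reference, the relative slowdown can be derived from the middle value of each Criterion `time` triple above. This is only a minimal sketch of the arithmetic. `slowdown_pct` is a hypothetical helper written for illustration and is not part of Comet, DataFusion, or Criterion.

```rust
/// Percentage by which `candidate` takes longer than `baseline`.
/// Both arguments must be in the same unit. Hypothetical helper for illustration only.
fn slowdown_pct(baseline: f64, candidate: f64) -> f64 {
    (candidate / baseline - 1.0) * 100.0
}

fn main() {
    // Middle (point-estimate) values in microseconds, copied from the benchmark output above.
    let results = [
        ("avg_decimal", 657.57, 1059.2),
        ("sum_decimal", 696.48, 1023.0),
    ];
    for (name, datafusion_us, comet_us) in results {
        println!(
            "{name}: Comet takes {:.1}% longer than DataFusion",
            slowdown_pct(datafusion_us, comet_us)
        );
    }
}
```

The two ratios bracket the "~50% slower" figure: avg is somewhat above it and sum slightly below.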

Describe the potential solution

No response

Additional context

No response

andygrove commented 1 month ago

Related upstream changes in arrow-rs: https://github.com/apache/arrow-rs/pull/6419