apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

Benchmark and optimize CAST from String to Integer #330

Open andygrove opened 4 months ago

andygrove commented 4 months ago

What is the problem the feature request solves?

https://github.com/apache/datafusion-comet/pull/307 fixes a correctness issue with casting from string to integer, but a question about its performance was raised in https://github.com/apache/datafusion-comet/pull/307#discussion_r1580451770.

This issue tracks benchmarking the native CAST operation against Spark's implementation and optimizing the native code.

Another possible optimization is to avoid converting the string to a Vec<char> in do_cast_string_to_int:

```rust
let chars: Vec<char> = str.chars().collect();
```

We should be able to iterate over the underlying chars directly instead, but the parser has to scan from both the start and the end of the string, so the change isn't trivial.
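A minimal sketch of the allocation-free idea, assuming simplified semantics rather than Spark's full rules: surrounding whitespace is removed with Rust's str::trim (which trims Unicode whitespace and may not match Spark exactly), an optional leading sign is accepted, the rest must be ASCII digits, and invalid input or overflow returns None. There is no handling of decimal points or ANSI-mode errors. The function name parse_i32_no_alloc is hypothetical and is not Comet's do_cast_string_to_int. The point is that str::trim already scans from both ends and returns a borrowed sub-slice, so the digits can be read as bytes without collecting anything into a Vec<char>.

```rust
/// Hypothetical helper (not Comet's code): parse a string to i32 without
/// allocating a Vec<char>. Simplified semantics, assumed for illustration:
/// trim surrounding whitespace, accept an optional '+' or '-', require ASCII
/// digits for the rest, and return None on invalid input or overflow.
fn parse_i32_no_alloc(s: &str) -> Option<i32> {
    // str::trim scans from both the start and the end and returns a
    // sub-slice of the original string, so nothing is copied.
    let bytes = s.trim().as_bytes();

    // Split off an optional sign character.
    let (negative, digits) = match bytes.split_first() {
        Some((&b'-', rest)) => (true, rest),
        Some((&b'+', rest)) => (false, rest),
        Some(_) => (false, bytes),
        None => return None, // empty or all-whitespace input
    };
    if digits.is_empty() {
        return None; // a lone sign is not a number
    }

    let mut acc: i32 = 0;
    for &b in digits {
        if !b.is_ascii_digit() {
            return None;
        }
        let d = (b - b'0') as i32;
        // Accumulate toward the final sign so that i32::MIN, whose magnitude
        // does not fit in i32, still parses without overflowing.
        acc = acc.checked_mul(10)?;
        acc = if negative {
            acc.checked_sub(d)?
        } else {
            acc.checked_add(d)?
        };
    }
    Some(acc)
}
```

Reading bytes instead of chars is safe for UTF-8 input here: every byte of a multi-byte character is 0x80 or above, so it fails the ASCII-digit check and the input is rejected.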

Describe the potential solution

No response

Additional context

No response

andygrove commented 4 months ago

I plan on working on this once https://github.com/apache/datafusion-comet/pull/307 is merged.

I will write a Criterion microbenchmark to compare the current approach with a macro-based approach, and look into other optimizations.
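One possible shape for a macro-based approach, sketched under assumptions: a macro_rules! macro stamps out one monomorphic parser per target integer type, so i8, i16, i32 and i64 each get specialized code without a generic trait bound or a shared i64 path followed by a narrowing range check. The macro name define_string_to_int and the generated string_to_i8 ... string_to_i64 functions are hypothetical, and the parsing rules are the same simplified ones assumed above (trimmed input, optional sign, ASCII digits, None on invalid input or overflow), not Spark's full semantics.

```rust
/// Hypothetical macro: generates a string-to-integer parser for one target
/// type. Each expansion is ordinary, fully typed code for that type.
macro_rules! define_string_to_int {
    ($name:ident, $t:ty) => {
        fn $name(s: &str) -> Option<$t> {
            // Trim from both ends without allocating.
            let bytes = s.trim().as_bytes();
            let (negative, digits) = match bytes.split_first() {
                Some((&b'-', rest)) => (true, rest),
                Some((&b'+', rest)) => (false, rest),
                Some(_) => (false, bytes),
                None => return None,
            };
            if digits.is_empty() {
                return None;
            }
            let mut acc: $t = 0;
            for &b in digits {
                if !b.is_ascii_digit() {
                    return None;
                }
                // A single digit (0..=9) fits in every target type.
                let d = (b - b'0') as $t;
                // Checked arithmetic in the target type detects overflow
                // directly, e.g. "128" fails for i8 but "-128" succeeds.
                acc = acc.checked_mul(10)?;
                acc = if negative {
                    acc.checked_sub(d)?
                } else {
                    acc.checked_add(d)?
                };
            }
            Some(acc)
        }
    };
}

define_string_to_int!(string_to_i8, i8);
define_string_to_int!(string_to_i16, i16);
define_string_to_int!(string_to_i32, i32);
define_string_to_int!(string_to_i64, i64);
```

A Criterion benchmark could then time each generated function against the existing Vec<char>-based path on the same input strings.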