apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0
447 stars 100 forks source link

feat: Implement Spark unhex #342

Closed tshauck closed 1 week ago

tshauck commented 2 weeks ago

Which issue does this PR close?

Closes #341

Rationale for this change

unhex is currently unsupported by comet. This my first PR into this repo, so certainly open to any feedback to make it more inline w/ expectations.

What changes are included in this PR?

Add unhex as well as make some minor refactors.

How are these changes tested?

Added simple tests to the rust and spark sql side of the code.

tshauck commented 2 weeks ago

Hi, I updated the shim handling for 3.2 and made various other updates (based on PR feedback and general cleanup). Please have another look and let me know what you think, thanks!

tshauck commented 2 weeks ago

Thanks for all the feedback. I think I've addressed the build/naming/etc feedback, and will have a look at improving the tests and any associated implementation changes sometime tomorrow. I'll request a review via GH when it's ready.

tshauck commented 1 week ago

I think this is ready for review. I updated the unhex impl to be more faithful to Spark's (for odd-length inputs in particular), added better null handling, and added more tests from the Spark repo.

tshauck commented 1 week ago

Err... looks to be an issue w/ spark 3.2 I'll need to look into. Hopefully the majority of the code'll remain unchanged.

image
andygrove commented 1 week ago

Thanks for the updates @tshauck. I plan on reviewing later today.

andygrove commented 1 week ago

@viirya @kazuyukitanimura do you have any additional feedback?

andygrove commented 1 week ago

I plan on merging this tomorrow if there is no more feedback