apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0
823 stars 163 forks source link

feat: support array_append #1072

Closed NoeB closed 1 week ago

NoeB commented 1 week ago

Which issue does this PR close?

Related to #1042: Adds support for array_append operation

Rationale for this change

What changes are included in this PR?

How are these changes tested?

NoeB commented 1 week ago

I am unsure if I implemented the operation correctly, so I just implemented the array_append operation as a first step. If the approach is correct, I think I can extract it to a function and add the support for the other array operations as well (for which DataFusion already has support)

codecov-commenter commented 1 week ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 34.12%. Comparing base (845b654) to head (1715e49). Report is 14 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #1072 +/- ## ============================================ - Coverage 34.46% 34.12% -0.35% + Complexity 888 883 -5 ============================================ Files 113 113 Lines 43580 42832 -748 Branches 9658 9367 -291 ============================================ - Hits 15021 14617 -404 + Misses 25507 25352 -155 + Partials 3052 2863 -189 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

NoeB commented 1 week ago

Because array_append has been added in Spark 3.4 I had to rewrite the match statement in QueryPlanSerde because it would not compile with spark 3.3 the same thing applies to the test which I rewrote to spark.sql because the test would also not compile with spark3.3. Let me know if the approach is ok or if you have a different option in mind. I think a lot of the array functions have been added in spark 3.4 and spark3.5 so a lot more cases like this will exist in the future. Regarding the Spark 4.0 failures I could not yet check them because the tests do not seem to start on my machine with spark 4.0. I am working to fix that.

NoeB commented 1 week ago

Thank you @andygrove for the review I have updated the code to match the refactor you did yesterday I guess that's also the reason for the failing tests.

NoeB commented 1 week ago

It seems that Spark 4.0 rewrites array_append into an array_insert expression, therefor the test is disabled for spark 4.0+. see https://github.com/apache/spark/blob/f0d465e09b8d89d5e56ec21f4bd7e3ecbeeb318a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L1758.