apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0
615 stars 113 forks source link

feat: Add CometRowToColumnar operator #206

Closed advancedxy closed 2 months ago

advancedxy commented 3 months ago

Which issue does this PR close?

This closes #119 and partially resolves #137

Rationale for this change

For ease testing with RangeExec operator in the short term. In the long term, this PR introduce a general way to enable Comet with row-based source exec nodes

What changes are included in this PR?

  1. Introduce CometRowToColumnarExec to transform Spark's InternalRow into ColumnarBatch
  2. CometArrowConverters and its corresponding ArrowWriters
  3. Glue code to apply RowToColumnar transition

How are these changes tested?

  1. added tests and existing test code
  2. verify plan transition in the Spark UI
codecov-commenter commented 3 months ago

Codecov Report

Attention: Patch coverage is 15.70513% with 263 lines in your changes are missing coverage. Please review.

Project coverage is 33.36%. Comparing base (aa6ddc5) to head (fb49a88). Report is 1 commits behind head on main.

Files Patch % Lines
...spark/sql/comet/execution/arrow/ArrowWriters.scala 0.00% 198 Missing :warning:
...l/comet/execution/arrow/CometArrowConverters.scala 0.00% 42 Missing :warning:
.../scala/org/apache/spark/sql/comet/util/Utils.scala 0.00% 8 Missing :warning:
...ain/scala/org/apache/comet/vector/NativeUtil.scala 0.00% 6 Missing :warning:
...org/apache/comet/CometSparkSessionExtensions.scala 78.57% 0 Missing and 3 partials :warning:
...pache/spark/sql/comet/CometRowToColumnarExec.scala 90.32% 1 Missing and 2 partials :warning:
...n/scala/org/apache/spark/sql/comet/operators.scala 33.33% 1 Missing and 1 partial :warning:
...n/scala/org/apache/comet/vector/StreamReader.scala 0.00% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #206 +/- ## ============================================ - Coverage 33.48% 33.36% -0.13% - Complexity 776 791 +15 ============================================ Files 108 111 +3 Lines 37178 37479 +301 Branches 8146 8192 +46 ============================================ + Hits 12448 12503 +55 - Misses 22107 22351 +244 - Partials 2623 2625 +2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

advancedxy commented 2 months ago

Gently ping @sunchao and @viirya

sunchao commented 2 months ago

Sorry for the delay @advancedxy . I'll try to take a look soon.

advancedxy commented 2 months ago

@sunchao would you mind to take a look at this again, I should address most of your comments, please let me know if you have any other comments.

And sorry for the late update, I wasn't feeling well last week.

sunchao commented 2 months ago

Merged, thanks @advancedxy ! If any comments from @viirya are not addressed, we can do it in a separate PR. This PR has been open for too long :)

advancedxy commented 2 months ago

If any comments from @viirya are not addressed, we can do it in a separate PR.

Of course.

Thanks for @sunchao and @viirya's review, really appreciate that.