apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0
547 stars 105 forks source link

build: Run Spark SQL tests for 3.4 #166

Closed sunchao closed 2 months ago

sunchao commented 2 months ago

Which issue does this PR close?

Closes #8.

Rationale for this change

We want to leverage SQL tests in Spark itself to increase our test coverage. We should run them with Comet enabled and make sure the tests pass all the existing checks.

What changes are included in this PR?

This PR enables us to run SQL tests in Spark itself in Comet. To enable Comet for Spark, a diff file is introduced to patch Spark (version 3.4.2) so that we can:

How are these changes tested?

codecov-commenter commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

:exclamation: No coverage uploaded for pull request base (main@72398a6). Click here to learn what that means.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #166 +/- ## ======================================= Coverage ? 33.30% Complexity ? 767 ======================================= Files ? 107 Lines ? 35372 Branches ? 7657 ======================================= Hits ? 11782 Misses ? 21144 Partials ? 2446 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

advancedxy commented 2 months ago

❗ No coverage uploaded for pull request base (main@72398a6).

@sunchao you may need to rebase the latest main to compare coverage difference.

sunchao commented 2 months ago

@sunchao you may need to rebase the latest main to compare coverage difference.

Oh thanks for letting me know. I'll address the test failures first, and then do the rebasing.

sunchao commented 2 months ago

cc @viirya @advancedxy @kazuyukitanimura this is ready for review now.

I'll create a separate Github issue to track the work of creating a branch for the diff changes.

sunchao commented 2 months ago

I applied the diff change to my own fork of Spark 3.4.2 here to make it easier to review.

advancedxy commented 2 months ago

I left some comments on this and https://github.com/sunchao/spark/commit/f7c15aa4a9c0462f17a8994102a1c707b07629d6. LGTM generally.

sunchao commented 2 months ago

@advancedxy there are still some issues when enabling shuffle in Spark SQL tests. I'll address them separately later in a follow-up. Let me know what you think of the latest change.

sunchao commented 2 months ago

(the CI failure is unrelated)

advancedxy commented 2 months ago

I'll address them separately later in a follow-up. Let me know what you think of the latest change.

That sounds good to me. I took anther look, only one minor comment, LGTM otherwise.

sunchao commented 2 months ago

Thanks, merged