Closed Kimahriman closed 1 month ago
Oof, this breaks a lot of explain plan comparison tests. If this change is OK, I can try to work on updating them.
Is the change in the `metadata`? If so, should we fix the metadata instead?
And perhaps we can try to upgrade the Comet dependency to Spark 3.5.2 (separately)
> Is the change in the `metadata`? If so, should we fix the metadata instead?
Are you referring to the compilation error or the change in explain output?
> And perhaps we can try to upgrade the Comet dependency to Spark 3.5.2 (separately)

Agreed. I thought about doing that here, but I wasn't sure of the best way to go about updating the diff for the Spark SQL tests.
> > Is the change in the `metadata`? If so, should we fix the metadata instead?
>
> Are you referring to the compilation error or the change in explain output?
I meant `CometScanExec.metadata`, inherited from `DataSourceScanExec`.
@Kimahriman would you be able to rebase this PR so that we can merge it?
> @Kimahriman would you be able to rebase this PR so that we can merge it?
Oof, 95 conflicts I would have to manually resolve. Let me just regenerate all of these plans again tonight.
OK, it wasn't too bad to fix the issues again with find/replace. We'll see in CI if I messed anything up.
Attention: Patch coverage is 70.58824% with 5 lines in your changes missing coverage. Please review.
Project coverage is 34.05%. Comparing base (fa275f1) to head (2757186). Report is 1 commit behind head on main.
Files with missing lines | Patch % | Lines |
---|---|---|
...ala/org/apache/spark/sql/comet/CometScanExec.scala | 70.58% | 0 Missing and 5 partials :warning: |
Done, I think CI failures are unrelated, looks like failures downloading dependencies in Hive tests
> Done, I think CI failures are unrelated, looks like failures downloading dependencies in Hive tests
Thanks. I am re-running the failed jobs now.
Which issue does this PR close?
Closes #912
Rationale for this change
Fixes CometScanExec running on Spark 3.5.2+. Currently it will fail with a runtime exception, and will fail to compile if specifying 3.5.2 with
This is because `OP_ID_TAG` is removed in Spark 3.5.2+, and the operator ID tracking is replaced with a separate internal map of plan -> ID, so there's no way to manually pass the ID on to a delegating plan. Instead, this PR simply copies the implementation of `DataSourceScanExec`'s method.
What changes are included in this PR?
The only effect of the change is on the verbose string output for `CometScanExec`. Instead of delegating to the underlying `DataSourceScanExec`, this PR copies the implementation over. This is in line with other Comet operators that implement their own verbose string, and it makes more sense in the formatted explain output because the operator names line up.
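To illustrate the idea, here is a toy sketch of why delegation stops working once operator IDs live in an identity-keyed map of plan -> ID rather than on the plan node itself. Everything in it (`ExplainIdSketch`, `Plan`, `Scan`, `DelegatingScan`, `SelfFormattingScan`, `format`, `render`) is hypothetical and is not Spark or Comet code. In this toy the lookup for the wrapped node simply misses, whereas the real failure described above is a runtime exception.

```scala
import java.util.IdentityHashMap

// Toy model only: none of these types are Spark or Comet classes.
object ExplainIdSketch {
  type IdMap = IdentityHashMap[Plan, Integer]

  trait Plan {
    def nodeName: String
    def metadata: Map[String, String]
    def verboseString(ids: IdMap): String
  }

  // Shared formatting: the ID comes from the external map, keyed by node identity.
  def format(plan: Plan, ids: IdMap): String = {
    val id = Option(ids.get(plan)).map(_.toString).getOrElse("unknown")
    val details = plan.metadata.toSeq.sorted.map { case (k, v) => s"$k: $v" }
    (s"($id) ${plan.nodeName}" +: details).mkString("\n")
  }

  final class Scan(val metadata: Map[String, String]) extends Plan {
    val nodeName = "Scan"
    def verboseString(ids: IdMap): String = format(this, ids)
  }

  // Delegates to the wrapped scan, which was never registered in the map.
  final class DelegatingScan(wrapped: Scan) extends Plan {
    val nodeName = "CometScan"
    def metadata: Map[String, String] = wrapped.metadata
    def verboseString(ids: IdMap): String = wrapped.verboseString(ids)
  }

  // Formats itself, so the lookup uses the node that actually has an ID.
  final class SelfFormattingScan(wrapped: Scan) extends Plan {
    val nodeName = "CometScan"
    def metadata: Map[String, String] = wrapped.metadata
    def verboseString(ids: IdMap): String = format(this, ids)
  }

  // Registers only the root node, as an explain pass over the plan tree would.
  def render(root: Plan): String = {
    val ids = new IdentityHashMap[Plan, Integer]()
    ids.put(root, Integer.valueOf(1))
    root.verboseString(ids)
  }

  def main(args: Array[String]): Unit = {
    val scan = new Scan(Map("Location" -> "/tmp/t"))
    println(render(new DelegatingScan(scan)))     // ID lookup misses, name is "Scan"
    println(render(new SelfFormattingScan(scan))) // ID and name belong to the wrapper
  }
}
```

In the toy, the self-formatting variant reports both its own ID and its own node name, which mirrors the point above about operator names lining up in the formatted explain output.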
Before:
After:
How are these changes tested?
Manually verified by building with Spark 3.5.2: `./mvnw clean package -Pspark-3.5 -Dspark.version=3.5.2 -DskipTests`.