apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0
823 stars 163 forks source link

fix: CometScanExec on Spark 3.5.2 #915

Closed Kimahriman closed 1 month ago

Kimahriman commented 2 months ago

Which issue does this PR close?

Closes #912

Rationale for this change

Fixes CometScanExec running on Spark 3.5.2+. Currently it will fail with a runtime exception, and will fail to compile if specifying 3.5.2 with

[ERROR] /Users/abinford/projects/arrow-datafusion-comet/spark/src/main/scala/org/apache/spark/sql/comet/CometScanExec.scala:144: value OP_ID_TAG is not a member of object org.apache.spark.sql.catalyst.plans.QueryPlan

This is because OP_ID_TAG is removed in Spark 3.5.2+, and the operator ID tracking is replaced with a separate internal map of plan -> ID, so there's no way to manually pass the ID on to a delegating plan. Instead simply copies the implementation of DataSourceScanExec's method.

What changes are included in this PR?

The only effect of the change is the verbose string output for CometScanExec. Instead of delegating to the underlying DataSourceScanExec, just copied the implementation over. This is in line with other Comet operators that implement their own verbose string, and makes more sense in the formatted explain as the operator names line up.

Before:

== Physical Plan ==
* ColumnarToRow (2)
+- CometScan parquet  (1)

(1) Scan parquet 
Output [20]: [_1#146, _2#147, _3#148, _4#149, _5#150L, _6#151, _7#152, _8#153, _9#154, _10#155, _11#156L, _12#157, _13#158, _14#159, _15#160, _16#161, _17#162, _18#163, _19#164, _20#165]
Batched: true
Location: InMemoryFileIndex [file:.../arrow-datafusion-comet/spark/target/tmp/spark-a0b56a3c-caa2-4631-ad15-ec83e8522948/test.parquet]
ReadSchema: struct<_1:boolean,_2:tinyint,_3:smallint,_4:int,_5:bigint,_6:float,_7:double,_8:string,_9:smallint,_10:int,_11:bigint,_12:decimal(20,0),_13:string,_14:binary,_15:decimal(5,2),_16:decimal(18,10),_17:decimal(38,37),_18:timestamp,_19:timestamp,_20:date>

(2) ColumnarToRow [codegen id : 1]
Input [20]: [_1#146, _2#147, _3#148, _4#149, _5#150L, _6#151, _7#152, _8#153, _9#154, _10#155, _11#156L, _12#157, _13#158, _14#159, _15#160, _16#161, _17#162, _18#163, _19#164, _20#165]

After:

== Physical Plan ==
* ColumnarToRow (2)
+- CometScan parquet  (1)

(1) CometScan parquet 
Output [20]: [_1#146, _2#147, _3#148, _4#149, _5#150L, _6#151, _7#152, _8#153, _9#154, _10#155, _11#156L, _12#157, _13#158, _14#159, _15#160, _16#161, _17#162, _18#163, _19#164, _20#165]
Batched: true
Location: InMemoryFileIndex [file:.../arrow-datafusion-comet/spark/target/tmp/spark-36667e9a-75e0-4f90-8442-5f47ecd1cf4b/test.parquet]
ReadSchema: struct<_1:boolean,_2:tinyint,_3:smallint,_4:int,_5:bigint,_6:float,_7:double,_8:string,_9:smallint,_10:int,_11:bigint,_12:decimal(20,0),_13:string,_14:binary,_15:decimal(5,2),_16:decimal(18,10),_17:decimal(38,37),_18:timestamp,_19:timestamp,_20:date>

(2) ColumnarToRow [codegen id : 1]
Input [20]: [_1#146, _2#147, _3#148, _4#149, _5#150L, _6#151, _7#152, _8#153, _9#154, _10#155, _11#156L, _12#157, _13#158, _14#159, _15#160, _16#161, _17#162, _18#163, _19#164, _20#165]

How are these changes tested?

Manually verified by building with Spark 3.5.2 ./mvnw clean package -Pspark-3.5 -Dspark.version=3.5.2 -DskipTests.

Kimahriman commented 2 months ago

Oof this breaks a lot of explain plan comparison tests. If this change is ok I can try to work on updating them

kazuyukitanimura commented 2 months ago

Is the change in the metadata? If so, should we fix the metadata instead?

kazuyukitanimura commented 2 months ago

And perhaps we can try to upgrade the Comet dependency to Spark 3.5.2 (separately)

Kimahriman commented 2 months ago

Is the change in the metadata? If so, should we fix the metadata instead?

Are you referring to the compilation error or the change in explain output?

And perhaps we can try to upgrade the Comet dependency to Spark 3.5.2 (separately)

Agreed. I thought about doing that here but I wasn't sure the best way to go about updating the diff for the Spark SQL tests

kazuyukitanimura commented 2 months ago

Is the change in the metadata? If so, should we fix the metadata instead?

Are you referring to the compilation error or the change in explain output?

I meant CometScanExec.metadata inherited from DataSourceScanExec

andygrove commented 2 months ago

@Kimahriman would you be able to rebase this PR so that we can merge it?

Kimahriman commented 2 months ago

@Kimahriman would you be able to rebase this PR so that we can merge it?

Oof 95 conflicts I would have to manually resolve, let me just regenerate these plans all again tonight

Kimahriman commented 2 months ago

Ok wasn't too bad to find/replace fix the issues again, we'll see if I messed anything up in the CI

codecov-commenter commented 2 months ago

Codecov Report

Attention: Patch coverage is 70.58824% with 5 lines in your changes missing coverage. Please review.

Project coverage is 34.05%. Comparing base (fa275f1) to head (2757186). Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...ala/org/apache/spark/sql/comet/CometScanExec.scala 70.58% 0 Missing and 5 partials :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #915 +/- ## ============================================ - Coverage 34.16% 34.05% -0.12% + Complexity 880 879 -1 ============================================ Files 112 112 Lines 43286 43301 +15 Branches 9572 9578 +6 ============================================ - Hits 14789 14745 -44 - Misses 25478 25518 +40 - Partials 3019 3038 +19 ``` | [Flag](https://app.codecov.io/gh/apache/datafusion-comet/pull/915/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | | |---|---|---| | [](https://app.codecov.io/gh/apache/datafusion-comet/pull/915/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `34.05% <70.58%> (?)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Kimahriman commented 2 months ago

Done, I think CI failures are unrelated, looks like failures downloading dependencies in Hive tests

andygrove commented 1 month ago

Done, I think CI failures are unrelated, looks like failures downloading dependencies in Hive tests

Thanks. I am re-running the failed jobs now.