Closed parthchandra closed 1 month ago
Looking at the ci failures, will address the comments as well.
@andygrove I changed the core of the implementation. Instead of setting information in a CometExplainInfo structure and bubbling it up in the plan, I now set the explain information as a spark tag in the plan or expression. This makes the changes easier to implement as we add support for more operators and expressions. Please take a look.
@advancedxy, @andygrove Addressed your comments. I was missing a few paths where the explanation was not being propagated correctly. I've fixed the ones I could find. Also, updated the results in CometTPC*QueriesList. I had run this previously on a 1TB dataset and not run these after I rebased on main. I've run it again on a 1GB dataset but had to explicitly set some Bloom filter configuration options to simulate the conditions of the larger data set. Otherwise the settings are the same as the defaults we expect most users to have.
Looking into the ci failures ...
@andygrove could you please trigger a re-run of the failed ci workflow? (Only committers can do that). I cannot reproduce the failure locally and am pretty sure it has nothing to do with this PR.
Re-triggered.
Could we merge this please? @viirya @andygrove
There is conflict needed to be resolved. And I have a few minor comments like code comment changes.
could you please trigger a re-run of the failed ci workflow? (Only committers can do that). I cannot reproduce the failure locally and am pretty sure it has nothing to do with this PR.
BTW, you can close and re-open the PR to re-trigger the CI workflow. Or you can push an empty commit to re-trigger.
Thank you @parthchandra. I will merge this once CI passes.
Thanks for the review @viirya @andygrove @advancedxy
Merged. Thanks all.
Which issue does this PR close?
Closes #253
Rationale for this change
Adds addition planning information to a Spark plan which allows us to track the reasons why a Spark plan was not fully converted to Comet.
What changes are included in this PR?
This PR adds a
CometExplainInfo
structure that is returned by every step of the planning process. Eventually the Structure is attached to the Spark plan being returned by adding it as a Comet specific tagHow are these changes tested?
Additional unit test and tested with all TPCH and TPCDS queries with both Spark 3.4 and Spark 3.4 with the extended plan support. Also regenerated the
CometTPC*QueriesList
output for both TPCH and TPCDS run against a 1TB dataset. Note: Since the extended plan support does not exist before Spark 4, the output ofCometTPC*QueriesList
was updated with a build of Spark 3.4 with the feature backported.