Open neilbest-db opened 2 days ago
Added row counts and timings for the second set of comparison runs to the table in the description. ☝️
enable auto-optimized shuffle for module 2011
Originally implemented for Spark 3.1.2 in commit https://github.com/databrickslabs/overwatch/commit/d751d5fc75c939892b73f877cb0e5542eb2cc030 on branch `1228-silver-job-runs-spark312-r0812` as part of #1253.

This PR removes all of the new utilities and transformation refactoring that were only aids to development and testing. They did not impact performance in any significant way.
The essential change brought to this branch (`1228-optimization-only`) is entirely expressed in commit https://github.com/databrickslabs/overwatch/commit/8c9ee79d20a4904ecd5aa2908715179c58e615e1. The new code introduced is here: https://github.com/databrickslabs/overwatch/blob/8c9ee79d20a4904ecd5aa2908715179c58e615e1/src/main/scala/com/databricks/labs/overwatch/pipeline/Silver.scala#L271-L274

The background and analysis of the optimization presented in the description of #1253 are still representative of the performance improvements realized by this change.
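
For readers who do not want to follow the link, here is a rough sketch of what enabling auto-optimized shuffle around a single module could look like. It is not the code from `Silver.scala`. It assumes the Databricks setting `spark.databricks.adaptive.autoOptimizeShuffle.enabled` is the one being toggled, and the helper name `withAutoOptimizedShuffle` is hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object AutoOptimizedShuffleSketch {
  // Assumed setting: a Databricks-specific adaptive query execution option.
  // Open-source Spark accepts the key but does not act on it.
  val AosKey = "spark.databricks.adaptive.autoOptimizeShuffle.enabled"

  // Hypothetical helper: run `body` with auto-optimized shuffle enabled,
  // then restore whatever value (or absence of one) the session had before.
  def withAutoOptimizedShuffle[T](spark: SparkSession)(body: => T): T = {
    val previous = spark.conf.getOption(AosKey)
    spark.conf.set(AosKey, "true")
    try {
      body
    } finally {
      previous match {
        case Some(value) => spark.conf.set(AosKey, value)
        case None        => spark.conf.unset(AosKey)
      }
    }
  }
}
```

Because DataFrame transformations are lazy, a wrapper like this only helps if the action that triggers the shuffle (for example, the write for the module) runs inside `body`. Restoring the previous value keeps the setting from leaking into other modules that share the session.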
proof notebook (IN PROGRESS)
Corresponding job runs for before/after comparison of this change:
0.8.1.2
0.8.2.0-SNAPSHOT