neilbest-db opened 1 week ago
`0820_release` should be moved to the tip of `main` before completing this PR, so I am leaving it in draft status for now.
BTW, there will be merge conflicts. I am perfectly willing to help with those. I made this mess! LMK.
@sriram251-code, the code that changes the values of the Spark UI Job Group IDs has been commented out in this branch per your recommendation (see https://github.com/databrickslabs/overwatch/pull/1253/commits/9fe9f8cedfc1bd988d5c56f44f1ffb55b3536935). I would like to understand the scenarios in which the Job Group IDs are the only place from which certain tokens/IDs can be extracted. Is it possible to enumerate these scenarios and map the flow of those tokens through the ETL to the target table(s)?
Closes #1228.

Please do not merge this into `0820_release` until #1223 is closed, per this comment there.
Introduction
A typical run of Overwatch module 2011 (`JobsRuns_Silver`, "JR") using release 0.8.1.2 to process 10 days of data in workspace `e2-demo-west-ws` produces this console output:

[console output screenshot]

The corresponding run of the code from this feature branch (`1228-silver-job-runs-spark312-r0812`) in an otherwise identical environment against the same upstream data goes like this:

[console output screenshot]

This ~67% reduction in runtime minutes is characteristic of the changes introduced here. Below are some details on how this was achieved, but first a little background.
Background
Databricks AQE shuffle auto-optimization
As part of Overwatch release 0.7.1.1, #675 introduced the following Spark configuration globally for all modules:
https://github.com/databrickslabs/overwatch/blob/ae56406ea5aef1e1e8a40b3653a9e6df70922458/src/main/scala/com/databricks/labs/overwatch/utils/SparkSessionWrapper.scala#L20
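For context, the flagged line enables Databricks' auto-optimized shuffle (AOS) on the shared SparkSession. Here is a minimal sketch of the mechanism, assuming only the documented Databricks conf key; the surrounding code is illustrative, not the actual `SparkSessionWrapper`:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: enable Databricks auto-optimized shuffle (AOS) globally,
// as release 0.7.1.1 did for all Overwatch modules. On a Databricks cluster
// the session already exists, so getOrCreate() simply retrieves it.
val spark: SparkSession = SparkSession.builder().getOrCreate()
spark.conf.set("spark.databricks.adaptive.autoOptimizeShuffle.enabled", "true")

// Disabling it again (effectively what #790 did for release 0.7.1.2):
spark.conf.set("spark.databricks.adaptive.autoOptimizeShuffle.enabled", "false")
```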
This is a component of a larger set of features called Adaptive Query Execution (AQE) that was released with Databricks Runtime (DBR) 7.0 and Spark 3.0.0. See this Databricks Engineering Blog article for more details on AQE in general.
Soon thereafter, performance degradation in certain Overwatch modules was reported in https://github.com/databrickslabs/overwatch/issues/794#issue-1607557972, and the `autoOptimizeShuffle` feature was disabled in #790 for release 0.7.1.2:

https://github.com/databrickslabs/overwatch/blob/f9c8dd088ea0e81d4a311368a5c47b1a4f9ad375/src/main/scala/com/databricks/labs/overwatch/utils/SparkSessionWrapper.scala#L20
JR shuffle factor
Prior to the AQE shuffle auto-optimization described in the previous section, Overwatch release 0.7.1.0 (#563, #661) added the `shuffleFactor` parameter to the definition of `jobRunsModule`:

https://github.com/databrickslabs/overwatch/blob/bc136594b1e9d08c77339d6a31f7c87bfe2e7202/src/main/scala/com/databricks/labs/overwatch/pipeline/Silver.scala#L246
Based on this `shuffleFactor` value of `12.0` and the job cluster configuration used for these tests, `Module.optimizeShufflePartitions()` currently sets `spark.sql.shuffle.partitions` to `9600`:

https://github.com/databrickslabs/overwatch/blob/3f4cdca4d8438550b13c38fec43eea60bdbee877/src/main/scala/com/databricks/labs/overwatch/pipeline/Module.scala#L79-L102
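To make the arithmetic concrete, here is a sketch of a cores-times-`shuffleFactor` calculation that lands on 9600. The 800-core figure is inferred from 9600 / 12.0 and is an assumption about the test cluster; the authoritative logic, including any clamping or rounding, is `Module.optimizeShufflePartitions()` at the permalink above:

```scala
// Sketch only: derive spark.sql.shuffle.partitions from the cluster's total
// core count scaled by the module's shuffleFactor, never dropping below
// Spark's default of 200 partitions.
def targetShufflePartitions(totalClusterCores: Int, shuffleFactor: Double): Int =
  math.max(totalClusterCores * shuffleFactor, 200.0).toInt

// 800 cores (assumed) * 12.0 = 9600, matching the observed setting.
assert(targetShufflePartitions(800, 12.0) == 9600)
```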
Results
With AQE shuffle auto-optimization disabled in 0.8.1.2, many Spark stages for JR are forced to run 9600 tasks or more. This results in extended stage durations (grey verticals are minutes) and additional executors, i.e. cluster auto-scaling (blue verticals), in the following chart from the Spark UI:

[Spark UI timeline screenshot]
This PR introduced the following change in the Silver JR module's configuration in commit https://github.com/databrickslabs/overwatch/commit/d751d5fc75c939892b73f877cb0e5542eb2cc030:

https://github.com/databrickslabs/overwatch/blob/d751d5fc75c939892b73f877cb0e5542eb2cc030/src/main/scala/com/databricks/labs/overwatch/pipeline/Silver.scala#L271-L274
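For readers who don't follow the permalink, the effect of the change can be sketched as scoping the shuffle-optimization override to the JR module instead of the whole session. The helper below is my illustration of that pattern, not Overwatch's actual API; the real diff is in the commit linked above:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative pattern only (names are mine, not Overwatch's): apply a Spark
// conf override for the duration of one module's execution, then restore the
// prior value so other modules are unaffected.
def withSparkOverride[T](spark: SparkSession, key: String, value: String)(body: => T): T = {
  val previous = spark.conf.getOption(key)
  spark.conf.set(key, value)
  try body
  finally previous match {
    case Some(v) => spark.conf.set(key, v)
    case None    => spark.conf.unset(key)
  }
}

// e.g. re-enable auto-optimized shuffle only while the JR module runs:
// withSparkOverride(spark, "spark.databricks.adaptive.autoOptimizeShuffle.enabled", "true") {
//   jobRunsModule.execute() // hypothetical call
// }
```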
When 0.8.1.2 runs with the enhancements in this PR, not only are the corresponding Spark job durations decreased, but the recruitment of additional executors is deferred to later phases of the module:

[Spark UI timeline screenshot]
Note that the differences in Spark job labels in these screenshots are due to other provisional features, unrelated to Spark configuration or performance, that were included in the Overwatch snapshot JAR used for these test runs. Those special labels are aids to development, testing, and troubleshooting to be introduced in #1223. Despite other minor refactoring in the JR module logic, the overall contours of the progression of Spark jobs are recognizable because these runs were restricted to only this module over the same date range. For example, Spark job 64 from 20:28 until 20:36 in the upper timeline corresponds to `jobRunsAppendMeta` (label truncated) at 20:23. This phase in particular seems to be where the overall duration of the module has decreased most significantly.

Closer examination of the task statistics for corresponding jobs shows that many tasks created in the absence of shuffle auto-optimization are effectively doing nothing!
With shuffle auto-optimization relaxed, we see a much more favorable distribution of task durations and bytes processed:

[task statistics screenshot]
This trend is similarly apparent in other phases of the JR module.
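A generic way to quantify this outside the Spark UI (my own diagnostic, not part of this PR) is to count how many shuffle partitions actually received rows; a large gap between that count and `spark.sql.shuffle.partitions` means many tasks had no work to do:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.spark_partition_id

// Generic diagnostic (not part of this PR): given a DataFrame that has just
// been shuffled (e.g. the output of a join or groupBy), report how many of
// its partitions are non-empty versus how many were configured.
def shuffleOccupancy(df: DataFrame): (Long, Int) = {
  val configured = df.sparkSession.conf.get("spark.sql.shuffle.partitions").toInt
  val nonEmpty = df.select(spark_partition_id().as("pid")).distinct().count()
  (nonEmpty, configured)
}
```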
Conclusion
This PR has produced demonstrable performance improvements for the following dataframe transformations:
- `SilverTransforms.filterAuditLog`
- `WorkflowsTransforms.jobRunsDeriveRunsBase`
- `SilverTransforms.jobRunsAppendJobMeta`
Next steps
Further gains in resource utilization and time efficiency may be possible in the subsequent phases of the JR module (2011):
- `WorkflowsTransforms.jobRunsStructifyLookupMeta`
- `WorkflowsTransforms.jobRunsCleanseCreatedNestedStructures`