This PR adds support for the following Photon-specific SQL metrics:

- `cumulative time`: can be used as a replacement for the `scan time` metric.
- `shuffle write time`: can be reconstructed by combining the following metrics:
  - time taken waiting on file write IO (part of shuffle file write)
  - time taken to sort rows by partition ID (part of shuffle file write)
  - time taken to convert columns to rows (part of shuffle file write)
- `peak memory usage`: can be used for the `peak execution memory` metric.
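The shuffle write time reconstruction described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the object and method names are hypothetical, and only the metric names are taken from the description.

```scala
// Hedged sketch: reconstructing Photon's shuffle write time by summing the
// totals of its three component metrics. Names other than the metric labels
// are hypothetical; the real accumulator plumbing lives in the analyzer.
object PhotonShuffleWriteTime {
  // The three Photon metrics that together make up shuffle file write time.
  val components: Seq[String] = Seq(
    "time taken waiting on file write IO",
    "time taken to sort rows by partition ID",
    "time taken to convert columns to rows"
  )

  // Sum the totals of the component metrics, treating a missing metric as 0.
  def reconstruct(metricTotals: Map[String, Long]): Long =
    components.map(metricTotals.getOrElse(_, 0L)).sum
}
```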
## Code Changes

- `AppSparkMetricsAnalyzer.scala`: Added logic to handle Photon-specific metrics for peak memory and shuffle write time, using accumulators instead of task metrics.
Fixes #1388
- `DatabricksParseHelper.scala`: Introduced methods and constants to identify and process Photon I/O metrics.
- `DataSourceView.scala`: Updated the data source view to include Photon-specific I/O metrics and refactored the method that fetches I/O metrics.
- `AccumManager.scala`: Added a utility method to apply a function to entries in the `accumInfoMap`.
- Tests: