This is the development repository for sparkMeasure, a tool and library for performance analysis and troubleshooting of Apache Spark jobs. It simplifies the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
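As a quick illustration of how the library is typically used, here is a minimal PySpark sketch of its stage-level API. The session configuration and the example query are assumptions made for the sake of a runnable snippet, not taken from this page; `StageMetrics` with `begin()`, `end()`, and `print_report()` follows sparkMeasure's documented PySpark API:

```python
from pyspark.sql import SparkSession
from sparkmeasure import StageMetrics

# Assumed setup: pull the sparkMeasure jar from Maven Central at startup.
spark = (SparkSession.builder
         .appName("sparkmeasure-demo")
         .config("spark.jars.packages",
                 "ch.cern.sparkmeasure:spark-measure_2.12:0.18")
         .getOrCreate())

stagemetrics = StageMetrics(spark)

# Measure an arbitrary workload between begin() and end().
stagemetrics.begin()
spark.sql("SELECT count(*) FROM range(1000) CROSS JOIN range(1000)").show()
stagemetrics.end()

# Print aggregated stage-level metrics for the measured block.
stagemetrics.print_report()
```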
Peak Memory usage - PySpark 3 on Azure Synapse #40
Found an odd issue. We recently started running our jobs through Azure Synapse. On Azure HDI we were able to record peakExecutionMemory, but for some reason on Azure Synapse all the values are 0.
We are using TaskMetrics to get the most information out of each run, and in the generated CSV all the other columns are populated, but peakExecutionMemory is 0 everywhere.
Is this a known issue?
We are running Python 3.7, PySpark 3.2.1, and Scala 2.12, using the spark-measure_2.12:0.18.jar.
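For context, here is a minimal sketch of the collection pattern described above. The SparkSession setup, the example query, and the output path are assumptions; `TaskMetrics`, `begin()`/`end()`, and `create_taskmetrics_DF()` follow sparkMeasure's documented PySpark API:

```python
from pyspark.sql import SparkSession
from sparkmeasure import TaskMetrics

# Assumed setup matching the versions above (PySpark 3.2.1, Scala 2.12).
spark = (SparkSession.builder
         .appName("peak-memory-check")
         .config("spark.jars.packages",
                 "ch.cern.sparkmeasure:spark-measure_2.12:0.18")
         .getOrCreate())

taskmetrics = TaskMetrics(spark)

# Measure an arbitrary workload between begin() and end().
taskmetrics.begin()
spark.sql("SELECT count(*) FROM range(1000) CROSS JOIN range(1000)").show()
taskmetrics.end()

# Per-task metrics as a DataFrame; peakExecutionMemory is one of its columns.
df = taskmetrics.create_taskmetrics_DF("PerfTaskMetrics")
df.select("stageId", "peakExecutionMemory").show()

# Export to CSV, as in the report described above (the path is illustrative).
df.write.mode("overwrite").option("header", "true").csv("/tmp/taskmetrics_csv")
```

Note that sparkMeasure only relays the task metrics the executors report through Spark's listener bus, so if peakExecutionMemory is 0 in this DataFrame on Synapse but non-zero on HDI for the same workload, the difference originates in the values the Spark runtime itself emits rather than in sparkMeasure's collection.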