Peak Memory usage - PySpark 3 on Azure Synapse

LucaCanali / sparkMeasure

This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.

Apache License 2.0

706 stars 145 forks source link

Found an odd issue. We recently started running our jobs through Azure Synapse. While using Azure HDI, we were able to record the peakExecutionMemory, but for some reason with Azure Synapse, all the values are 0.

We are using TaskMetrics to get the most information out of the run and within the csv generated, other columns are populated except for the peakExecutionMemory which are all 0.

Is this a known issue?

We are running with python 3.7, pyspark 3.2.1, and scala 2.12 and using the spark-measure_2.12:0.18.jar

LucaCanali / sparkMeasure

Peak Memory usage - PySpark 3 on Azure Synapse #40