LucaCanali / sparkMeasure

This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
Apache License 2.0

Fix Python TaskMetrics.__init__ #42

Closed brian-tecton-ai closed 2 years ago

brian-tecton-ai commented 2 years ago

The gatherAccumulable parameter was removed from the constructor of the corresponding Scala TaskMetrics class, but the Python constructor was not updated to match.

With sparkmeasure 0.21.0 from PyPI, calling the Python constructor fails with the following error:

py4j.Py4JException: Constructor ch.cern.sparkmeasure.TaskMetrics([class org.apache.spark.sql.SparkSession, class java.lang.Boolean]) does not exist

https://github.com/LucaCanali/sparkMeasure/blob/131e52b0d64b9cd018f998913266c16cc88ef2a4/src/main/scala/ch/cern/sparkmeasure/TaskMetrics.scala#L21

I tested this with a local Python build and confirmed that I was able to construct TaskMetrics and collect metrics.
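For readers following along, here is a minimal sketch of what the corrected Python constructor might look like. It is not the exact code from this PR. It assumes the wrapper reaches the Scala class through the Py4J gateway on the SparkContext (`sparkContext._jvm`) and passes the JVM session object (`_jsparkSession`). The attribute names on the wrapper (`sparksession`, `sc`, `taskmetrics`) are illustrative.

```python
class TaskMetrics:
    """Sketch of a Python wrapper around the Scala ch.cern.sparkmeasure.TaskMetrics class."""

    def __init__(self, sparksession):
        # Keep references to the Python-side session and context.
        self.sparksession = sparksession
        self.sc = sparksession.sparkContext
        # Pass only the JVM SparkSession. Per the error above, the Scala side
        # has no (SparkSession, Boolean) constructor, so no extra flag is sent.
        self.taskmetrics = self.sc._jvm.ch.cern.sparkmeasure.TaskMetrics(
            sparksession._jsparkSession
        )
```

With an active session and the sparkMeasure jar on the classpath, this would be constructed as `TaskMetrics(spark)`. Py4J resolves JVM constructors by argument types, which is why a stale extra argument surfaces as "Constructor ... does not exist" rather than as a Python TypeError.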

brian-tecton-ai commented 2 years ago

Just started checking it out today -- all the docs and example notebooks are super helpful! Ran into this small error, and it seemed like a quick fix.

LucaCanali commented 2 years ago

Merged. Thank you!