LucaCanali / sparkMeasure

This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
Apache License 2.0

TaskMetrics and StageMetrics do not extend a common trait #29

Closed: RodrigoCSoares closed this issue 4 years ago

RodrigoCSoares commented 4 years ago

The TaskMetrics and StageMetrics classes do not extend a common trait. This makes it awkward to use them generically: code that only needs an operation both classes provide, such as runAndMeasure, still has to be written separately for each class.
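Until the library offers such a trait, one user-side workaround is to declare a small trait yourself and wrap each class in an adapter. The sketch below assumes both classes expose `runAndMeasure[T](f: => T): T`; `MetricsRunner`, the adapter classes, `runnerFor`, and the `Fake*` stand-ins are hypothetical names introduced here, not sparkMeasure API. The stand-ins only exist so the script runs without Spark; in real code the adapters would wrap `ch.cern.sparkmeasure.StageMetrics` and `ch.cern.sparkmeasure.TaskMetrics`.

```scala
// Hypothetical common trait (not part of sparkMeasure): the one operation
// the caller needs from either metrics class.
trait MetricsRunner {
  def runAndMeasure[T](f: => T): T
}

// Stand-ins so the sketch runs without Spark; in real code these would be
// ch.cern.sparkmeasure.StageMetrics and ch.cern.sparkmeasure.TaskMetrics.
final class FakeStageMetrics {
  var runs = 0
  def runAndMeasure[T](f: => T): T = { runs += 1; f }
}
final class FakeTaskMetrics {
  var runs = 0
  def runAndMeasure[T](f: => T): T = { runs += 1; f }
}

// Adapters: each wraps one concrete class and forwards the by-name block unchanged.
final class StageMetricsRunner(m: FakeStageMetrics) extends MetricsRunner {
  def runAndMeasure[T](f: => T): T = m.runAndMeasure(f)
}
final class TaskMetricsRunner(m: FakeTaskMetrics) extends MetricsRunner {
  def runAndMeasure[T](f: => T): T = m.runAndMeasure(f)
}

// Picks an adapter from a configuration string, as the issue's example does.
def runnerFor(level: String, stage: FakeStageMetrics, task: FakeTaskMetrics): MetricsRunner =
  level match {
    case "stages" => new StageMetricsRunner(stage)
    case "tasks"  => new TaskMetricsRunner(task)
    case other    => throw new IllegalArgumentException(s"Unknown metrics level: $other")
  }

val stage = new FakeStageMetrics
val task = new FakeTaskMetrics
// Only the task adapter runs here; the block's result is passed straight through.
val answer = runnerFor("tasks", stage, task).runAndMeasure(21 * 2)
```

Because `MetricsRunner` is an ordinary trait, a method like `Writer.doWrite` can accept it as a parameter type, at the cost of one small adapter per wrapped class.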

The following example does not compile because of this:

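import org.apache.spark.sql.DataFrame
import ch.cern.sparkmeasure.{StageMetrics, TaskMetrics}

// ExternalConfiguration is the application's own configuration reader (not part of sparkMeasure)
val someExternalConfiguration = ExternalConfiguration.read()
val dataFrame = spark.sql("SELECT * FROM SOME_WHERE")

someExternalConfiguration match {
    case "stages" =>
        val stageMetrics = StageMetrics(spark)
        Writer.doWrite(dataFrame, stageMetrics)
    case "tasks" =>
        val taskMetrics = TaskMetrics(spark)
        Writer.doWrite(dataFrame, taskMetrics)
    case other =>
        throw new IllegalArgumentException(s"Unknown metrics level: $other")
}

object Writer {
    // Does not compile: no common supertype of StageMetrics and TaskMetrics declares runAndMeasure
    def doWrite(dataFrame: DataFrame, metrics: <Here should be the common trait>): Unit = {
        metrics.runAndMeasure(dataFrame.write.format("parquet").save("/tmp/any_where"))
    }
}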
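Another user-side option, which leaves the call site passing the metrics object directly, is a type class: `doWrite` becomes generic in the metrics type and asks for evidence that the type can measure a block. This is a minimal sketch, assuming again that both classes expose `runAndMeasure[T](f: => T): T`; `Measurable`, `MeasurableInstances`, and the `Fake*` stand-ins are hypothetical and stand in for the real sparkMeasure classes so the script runs without Spark.

```scala
// Stand-ins so the sketch runs without Spark; real code would use
// ch.cern.sparkmeasure.StageMetrics and TaskMetrics.
final class FakeStageMetrics {
  var runs = 0
  def runAndMeasure[T](f: => T): T = { runs += 1; f }
}
final class FakeTaskMetrics {
  var runs = 0
  def runAndMeasure[T](f: => T): T = { runs += 1; f }
}

// Hypothetical type class (not part of sparkMeasure): evidence that a value
// of type M can wrap a block of code in a measurement.
trait Measurable[M] {
  def runAndMeasure[T](metrics: M)(f: => T): T
}

// One instance per concrete class; each forwards to the class's own method.
object MeasurableInstances {
  implicit val stageMeasurable: Measurable[FakeStageMetrics] = new Measurable[FakeStageMetrics] {
    def runAndMeasure[T](metrics: FakeStageMetrics)(f: => T): T = metrics.runAndMeasure(f)
  }
  implicit val taskMeasurable: Measurable[FakeTaskMetrics] = new Measurable[FakeTaskMetrics] {
    def runAndMeasure[T](metrics: FakeTaskMetrics)(f: => T): T = metrics.runAndMeasure(f)
  }
}
import MeasurableInstances._

object Writer {
  // Compiles for any M with a Measurable instance in implicit scope.
  def doWrite[M](metrics: M)(action: => Unit)(implicit ev: Measurable[M]): Unit =
    ev.runAndMeasure(metrics)(action)
}

val stageMetrics = new FakeStageMetrics
val taskMetrics = new FakeTaskMetrics
var log = List.empty[String]
Writer.doWrite(stageMetrics) { log = "stage" :: log }
Writer.doWrite(taskMetrics) { log = "task" :: log }
```

Compared with the adapter approach, the type class keeps the original objects at the call site, but each new metrics class still needs its own instance, so a trait in the library itself would remain the simpler fix.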