LucaCanali / sparkMeasure

This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
Apache License 2.0

Better to have the IO metrics for non-hdfs type such as S3 Storage #31

Closed jack1981 closed 2 years ago

jack1981 commented 4 years ago

We are using S3-compatible object storage for Spark storage, but the current default filesystem I/O metrics cover HDFS only.

SELECT non_negative_derivative("value", 1s) FROM "filesystem.hdfs.read_bytes" WHERE "applicationid" = '$ApplicationId' AND $timeFilter GROUP BY process

Is there any way to fetch I/O metrics for other distributed file systems?

Thanks !
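For illustration, the query above could be adapted for an S3A series along these lines; the measurement name `filesystem.s3a.read_bytes` is an assumption for the sake of the example, not a series confirmed to exist in the thread:

```sql
-- Hypothetical: assumes the dashboard exposes an s3a measurement
-- analogous to the hdfs one
SELECT non_negative_derivative("value", 1s)
FROM "filesystem.s3a.read_bytes"
WHERE "applicationid" = '$ApplicationId' AND $timeFilter
GROUP BY process
```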

LucaCanali commented 4 years ago

Hi @jack1981, I believe your question fits better in the context of the spark-dashboard implementation with the Spark metrics system, as described in https://github.com/LucaCanali/Miscellaneous/tree/master/Spark_Dashboard and in https://github.com/cerndb/spark-dashboard

In that context, I'd like to share that I have been working on extensions of Spark monitoring to cover S3A and other I/O and OS metrics for Spark 3.0; please see https://github.com/cerndb/SparkPlugins I'll be interested in collecting feedback. Best, L.
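As a rough sketch of the direction such extensions can take, the Spark 3.0 plugin API lets an executor plugin register custom gauges with the Spark metrics system, here reading S3A counters through Hadoop's `GlobalStorageStatistics`. The plugin class and the statistic key `"bytesRead"` are illustrative assumptions, not the actual SparkPlugins code:

```scala
import java.util.{Map => JMap}

import com.codahale.metrics.Gauge
import org.apache.hadoop.fs.FileSystem
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

// Hypothetical sketch: expose an s3a byte counter as a Spark metrics gauge.
class S3AMetricsPlugin extends SparkPlugin {
  override def driverPlugin(): DriverPlugin = null

  override def executorPlugin(): ExecutorPlugin = new ExecutorPlugin {
    override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = {
      ctx.metricRegistry.register("s3a.bytesRead", new Gauge[Long] {
        override def getValue: Long = {
          // GlobalStorageStatistics aggregates per-scheme FS statistics
          // (Hadoop 2.8+); the exact statistic name may differ by version.
          val stats = FileSystem.getGlobalStorageStatistics.get("s3a")
          Option(stats).map(_.getLong("bytesRead").longValue).getOrElse(0L)
        }
      })
    }
  }
}
```

Such a plugin would be enabled with `--conf spark.plugins=S3AMetricsPlugin`, after which the gauge flows through the regular Spark metrics sinks and can be charted like the hdfs series.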