LucaCanali / sparkMeasure

This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
Apache License 2.0

Notable difference to REST API #30

Closed mansenfranzen closed 4 years ago

mansenfranzen commented 4 years ago

Hi Luca and community,

first of all - thanks for the great work - very much appreciate it!

This is not a real issue but rather a user question asking for clarification. I've been wondering whether sparkMeasure provides any metrics beyond those of the default REST API?

I would like to collect Spark job metrics while keeping dependencies as minimal as possible. Using the default REST API to collect the metrics seems simple and avoids relying on an additional package. Of course, sparkMeasure provides additional abstractions to aggregate at stage/task level and to compute many relevant metrics, which is of great use. However, we are likely to be interested in only a few core metrics and don't need all of them.
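To illustrate the minimal-dependency approach described above, here is a sketch of pulling a few core stage metrics from Spark's REST API using only the standard library. The endpoint path follows Spark's monitoring documentation; the field names and the sample payload are illustrative assumptions, not a captured server response.

```python
# Sketch: selecting a few core metrics from Spark's REST API /stages
# endpoint with no third-party dependencies. Field names follow the
# Spark monitoring docs; treat the exact set as an assumption.
import json
from urllib.request import urlopen

CORE_METRICS = ["executorRunTime", "executorCpuTime", "shuffleReadBytes"]

def fetch_stage_data(base_url, app_id):
    """Fetch per-stage data from a running Spark UI / history server."""
    url = f"{base_url}/api/v1/applications/{app_id}/stages"
    with urlopen(url) as resp:
        return json.load(resp)

def select_core_metrics(stages, keys=CORE_METRICS):
    """Keep only the metrics of interest for each stage."""
    return [{"stageId": s["stageId"], **{k: s[k] for k in keys}}
            for s in stages]

# Illustrative payload with the same shape as the /stages response:
sample = [{"stageId": 0, "executorRunTime": 1200,
           "executorCpuTime": 900, "shuffleReadBytes": 4096}]
print(select_core_metrics(sample))
```

With this approach the filtering of "a few core metrics" happens client-side, so no extra package is needed on the cluster itself.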

LucaCanali commented 4 years ago

Hi, thanks for the interest in sparkMeasure. The REST API makes all the metrics available (at task level and stage level, plus executor metrics; see the monitoring doc). The event log file (if enabled) also dumps a very large selection of metrics as JSON, which you can use with external tools. Tools based on Spark listeners, like sparkMeasure, are, as you write, mostly intended for convenience, customization, and metrics aggregation, and, I would add, also for potentially reducing the performance impact compared to collecting and processing metrics obtained from the REST API.

mansenfranzen commented 4 years ago

Thanks for your response. Issue can be closed.