cerndb / spark-dashboard

Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an Apache Spark Performance Dashboard using containers technology.
Apache License 2.0

definition of 'task run time' #4

Open cometta opened 4 months ago

cometta commented 4 months ago

I triggered a Spark job that ran for only a few minutes, but the Grafana dashboard shows a 'task run time' of at least one hour. Is the info on Grafana correct?

[screenshot: Grafana 'task run time' panel]

LucaCanali commented 4 months ago

It seems accurate and aligns with my own experience as well. The metrics you're observing are cumulative across all executed tasks and all executors. The main idea behind using Apache Spark is to parallelize execution, allowing multiple CPUs/tasks to work simultaneously. Additionally, the metric "Number of Active Tasks" indicates how many tasks are being executed in parallel.
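To illustrate why a cumulative metric can exceed the job's wall-clock time, here is a small sketch with made-up numbers (task count, average task duration, and parallelism are all assumptions, not values from this job): many short tasks running concurrently add up to an hour of task time even though the job finishes in minutes.

```python
# Hypothetical numbers showing how cumulative "task run time" (summed over
# all tasks and executors) can far exceed the job's wall-clock duration.
num_tasks = 1200          # total tasks executed by the job (assumed)
avg_task_seconds = 3.0    # average runtime of one task (assumed)
parallelism = 20          # tasks running concurrently, i.e. "Number of Active Tasks" (assumed)

# Sum of time spent by every task, across all executors.
cumulative_task_time = num_tasks * avg_task_seconds        # 3600 s = 1 hour

# Rough wall-clock estimate: the same work spread over parallel slots.
wall_clock_estimate = cumulative_task_time / parallelism   # 180 s = 3 minutes

print(cumulative_task_time, wall_clock_estimate)  # → 3600.0 180.0
```

So a dashboard reporting one hour of task run time is perfectly consistent with a job that completed in a few minutes of wall-clock time.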

cometta commented 4 months ago

For the case where I only need to know the total duration of the Spark job (end time minus start time), is there any widget I can refer to?

LucaCanali commented 4 months ago

That's the kind of basic information you can easily get from the Spark Web UI; see https://spark.apache.org/docs/latest/web-ui.html
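Besides the Web UI, the same information is exposed programmatically through Spark's REST API (`/api/v1/applications`), which reports each application attempt with its start time, end time, and duration in milliseconds. Below is a minimal sketch of extracting the duration from such a response; the JSON payload is an illustrative example of the documented shape, not real output from this job.

```python
import json

# Illustrative payload matching the shape of the Spark REST API
# /api/v1/applications response (field values are made up).
sample = json.loads("""
{
  "id": "app-00000000000000-0000",
  "name": "example job",
  "attempts": [
    {"startTime": "2024-01-01T12:00:00.000GMT",
     "endTime": "2024-01-01T12:03:00.000GMT",
     "duration": 180000}
  ]
}
""")

# Each attempt carries a precomputed duration in milliseconds,
# so total duration = end time - start time is available directly.
attempt = sample["attempts"][0]
duration_seconds = attempt["duration"] / 1000
print(f"{sample['name']}: {duration_seconds:.0f} s")  # → example job: 180 s
```

In a live deployment you would fetch this JSON from the Spark driver or history server (e.g. `http://<driver>:4040/api/v1/applications`) instead of embedding it.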