The way metrics end up in Grafana is that Spark jobs push metrics to prometheus-pushgateway; that component is then scraped by Prometheus, and the metrics are finally exposed in Grafana.
With the setting spark.metrics.conf.driver.sink.prometheus.period you specify how frequently your Spark jobs push metrics to the pushgateway. However, when those metrics end up in Prometheus/Grafana is controlled by the scrape interval of Prometheus, which is 60 seconds by default. You can use prometheus-scrape-config-k8s to customize this value so that it better matches the frequency at which your Spark jobs push metrics.
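For illustration, a minimal sketch of both knobs. The submit command shape, the application placeholder, the deployed application names, and the relation endpoint names are assumptions about your deployment, and scrape_interval is the config option I'd expect on the prometheus-scrape-config-k8s charm; adjust to what `juju config` actually shows you:

```bash
# Push metrics from the Spark driver every 5 seconds (value is illustrative)
spark-submit \
  --conf spark.metrics.conf.driver.sink.prometheus.period=5 \
  <your-application>

# Scrape the pushgateway every 10 seconds instead of the default 60:
# deploy prometheus-scrape-config-k8s and insert it between the
# pushgateway and Prometheus (app and endpoint names are assumptions)
juju deploy prometheus-scrape-config-k8s scrape-config --config scrape_interval=10s
juju relate scrape-config:configurable-scrape-jobs pushgateway:metrics-endpoint
juju relate scrape-config:metrics-endpoint prometheus:metrics-endpoint
```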
Anyhow, you can also check that metrics are indeed pushed to the pushgateway by accessing its endpoint at http://<pushgateway_ip>:9091. I have checked, and even for short jobs (like the one you submitted) the metrics are indeed there. But unless you configure Prometheus otherwise, it will take up to one minute before those metrics end up in Grafana. Of course, if the Spark job's process fails before pushing any metrics, you won't have any metrics at all, and you can only look at the pod logs.
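Concretely, something like the following (keeping <pushgateway_ip> as a placeholder for your unit's IP; the pod name and namespace are likewise placeholders):

```bash
# List everything the pushgateway currently holds (9091 is its default port;
# the pushgateway serves its own /metrics page with all pushed metrics)
curl http://<pushgateway_ip>:9091/metrics

# If the job died before pushing anything, inspect the driver pod instead
kubectl logs <driver-pod-name> -n <namespace>
```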
Hope this helps!
Reproduce
Actual
Spark jobs that run for less than a minute, or that fail within the first minute, are not visible in the Grafana dashboard or in Prometheus.
Expected
All started jobs are visible in the Grafana dashboard.
Versions
Operating system: Ubuntu 22.04.3 LTS
Juju CLI: 3.3.1
Juju agent: 3.3.1
Charm revision:
microk8s:
COS:
cos-configuration-k8s config: