PelicanPlatform / pelican

The Pelican Platform for creating data federations
https://pelicanplatform.org/
Apache License 2.0

Display Prometheus metrics on director's Web UI #370

Open haoming29 opened 9 months ago

haoming29 commented 9 months ago

Based on our talk with the Pelican integration team on 11/10/2023: it would help the integration team debug and monitor the Pelican ITB if we displayed critical information about our Pelican federation in one place, which in this case should be the director's Web UI. So we want to include the metrics we currently measure on the director's Web UI:

This is a follow-up to #265.

@bbockelm put this in the 7.4 milestone, but we might want it to be ready by 7.3, depending on how the integration tests go.

haoming29 commented 7 months ago

Bump to 7.5 as this ticket is unassigned as of 1/3/2024

haoming29 commented 5 months ago

@bbockelm Would you prefer director admins to use Grafana for visualization, or should we build some graphs in-house?

CannonLock commented 4 months ago

@haoming29 Any idea what prometheus metrics to make available?

haoming29 commented 4 months ago

> @haoming29 Any idea what prometheus metrics to make available?

@bbockelm any words of wisdom? I can think of the following:

bbockelm commented 4 months ago

I would suggest:

CannonLock commented 2 months ago

@haoming29 How can I find the final two data points?

Can't find anything here -> https://osdf-director.osg-htc.org/metrics

CannonLock commented 2 months ago

@haoming29 Any thoughts on where I can find the final two data points?

Number of bytes transferred from origins. Number of bytes transferred from caches.

haoming29 commented 2 months ago

> @haoming29 Any thoughts on where I can find the final two data points?
>
> Number of bytes transferred from origins. Number of bytes transferred from caches.

xrootd_server_bytes with the label server_type set to origin or cache should give you a good number. You can split it by rx and tx for bytes received and transmitted. If you need the number for all origins or caches, you can do a sum by (server_url) to aggregate the origin/cache servers.
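As a rough sketch of how those expressions could be put together: the metric name and the server_type and server_url labels come from the comment above, but the label that carries rx/tx is not named there, so `direction` below is only an assumption, and `bytes_query` is a hypothetical helper rather than anything in Pelican.

```python
def bytes_query(server_type, direction=None, by=None):
    """Build a PromQL expression over xrootd_server_bytes.

    server_type: "origin" or "cache" (label named in the thread).
    direction:   "rx" or "tx"; the label name "direction" is an assumption.
    by:          optional label to group the sum by, e.g. "server_url".
    """
    labels = [f'server_type="{server_type}"']
    if direction is not None:
        # Assumption: received/transmitted is exposed as a "direction" label.
        labels.append(f'direction="{direction}"')
    selector = "xrootd_server_bytes{" + ", ".join(labels) + "}"
    if by is None:
        # No aggregation requested: return the raw selector.
        return selector
    return f"sum by ({by}) ({selector})"


if __name__ == "__main__":
    print(bytes_query("origin", direction="tx", by="server_url"))
    print(bytes_query("cache", direction="tx", by="server_url"))
```

Note that in PromQL `sum by (server_url)` returns one series per server; wrapping the selector in a plain `sum(...)` instead would collapse everything into a single total.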

CannonLock commented 2 months ago

@haoming29 I cannot see these stats on the metrics page for the director?

https://osdf-director.osg-htc.org/metrics

Any insights?

haoming29 commented 2 months ago

> @haoming29 I cannot see these stats on the metrics page for the director?
>
> https://osdf-director.osg-htc.org/metrics
>
> Any insights?

These are not the director's own metrics; the director scrapes them from the origins and caches. You can run PromQL to get the data: https://osdf-director.osg-htc.org/api/v1.0/prometheus/query?query=xrootd_server_bytes{job=%22origin_cache_servers%22}
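For scripting the same lookup, a minimal sketch might look like the following. The host and query path are taken from the URL above. Two things are assumptions: that the endpoint answers in the standard Prometheus HTTP API instant-query shape (`status`, `data.result`), and that it is reachable without authentication. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

DIRECTOR = "https://osdf-director.osg-htc.org"  # host from the thread
QUERY_PATH = "/api/v1.0/prometheus/query"       # path from the thread


def query_url(promql, base=DIRECTOR):
    """Return the query URL with the PromQL expression percent-encoded."""
    return f"{base}{QUERY_PATH}?{urlencode({'query': promql})}"


def extract_samples(payload):
    """Turn an instant-query response into (labels, value) pairs.

    Assumes the standard Prometheus response shape, where each result
    carries a "metric" label dict and a [timestamp, "value"] pair.
    """
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    return [(item["metric"], float(item["value"][1]))
            for item in payload["data"]["result"]]


def run_query(promql):
    """Fetch and parse one query (needs network access; may need auth)."""
    with urlopen(query_url(promql), timeout=30) as resp:
        return extract_samples(json.load(resp))


if __name__ == "__main__":
    print(query_url('xrootd_server_bytes{job="origin_cache_servers"}'))
```

Encoding the whole expression with `urlencode` avoids hand-escaping the braces and quotes in the label matcher.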

CannonLock commented 2 months ago

@haoming29 Oh cool, thanks for the explanation. I never considered that it would/could scrape other metric endpoints.

CannonLock commented 1 month ago

Pulling the milestone off because the completion of this depends on Patrick's completion of the interface.