JeffersonLab / SRO-RTDP

Establish Prometheus server and test #49

Open: faustus123 opened this issue 1 month ago

faustus123 commented 1 month ago
cissieAB commented 2 weeks ago

Something related to the 1st/4th item: @faustus123, for the existing ifarm monitoring (set up by Bryan's team) there is a public Grafana dashboard at https://scigraf.jlab.org/. Log in with your JLab CUE username/password and you can see the Prometheus metrics of all ifarm nodes. The metrics are collected by node-exporters running on every ifarm compute node.
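For reference, node-exporter listens on port 9100 by default and serves its metrics as plain text (the Prometheus text exposition format) at `/metrics`. Whether the ifarm exporters use that default port, and whether we are allowed to reach them, are assumptions. A minimal sketch of pulling the samples for one metric out of such a scrape; `metric_samples` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every sample line of one metric from
# Prometheus text exposition format read on stdin.
metric_samples() {
  local name="$1"
  awk -v want="$name" '
    /^#/ { next }                 # skip "# HELP" / "# TYPE" comment lines
    NF == 0 { next }              # skip blank lines
    {
      metric = $1
      sub(/[{].*/, "", metric)    # drop the {label="..."} set, if present
      if (metric == want) print   # exact name match, so node_load1 != node_load15
    }'
}

# Usage (NODE and port 9100 are assumptions about the ifarm exporters):
#   curl -s "http://NODE:9100/metrics" | metric_samples node_load1
```

Matching on the bare metric name (labels stripped) keeps all label combinations of a metric together, e.g. every `cpu`/`mode` pair of `node_cpu_seconds_total`.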

If we want to read their metrics, I guess we will need to ask for permission.

cissieAB commented 2 weeks ago

The Prometheus DB for ifarm nodes should be on sci-prometheus.jlab.org.

$ nslookup sci-prometheus.jlab.org
Server:     129.57.90.255
Address:    129.57.90.255#53

Name:       sci-prometheus.jlab.org
Address:    129.57.16.202

faustus123 commented 2 weeks ago

Do we have permission to write to that DB for things that are not directly related to SciComp operations?

cissieAB commented 2 weeks ago

> Do we have permission to write to that DB for things that are not directly related to SciComp operations?

I do not think we have write permission, and I do not think we have read permission either. After some negotiation, Bryan might give you read permission, but he would probably not be happy if we pushed our own stuff to it.

I will see if I can run a containerized (Docker/Podman) Prometheus image on one of the ifarm login nodes.
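A minimal sketch of what that could look like, assuming rootless Podman works on the login node and the upstream `docker.io/prom/prometheus` image can be pulled. The container name `rtdp-prometheus`, the helper names, and host port 9090 are illustrative assumptions; the port may need to fall inside whatever range the farm allows.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal prometheus.yml that only scrapes
# Prometheus itself (localhost:9090 is the port *inside* the container).
write_prom_config() {
  local path="$1"
  cat > "$path" <<'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
EOF
}

# Hypothetical helper: start Prometheus in the background with the config
# bind-mounted read-only. $1 must be an absolute path; $2 is the host port.
# Passing arguments replaces the image's default command, so both
# --config.file and --storage.tsdb.path are given explicitly.
run_prometheus() {
  local config="$1" host_port="${2:-9090}"
  podman run -d --name rtdp-prometheus \
    -p "${host_port}:9090" \
    -v "${config}:/etc/prometheus/prometheus.yml:ro" \
    docker.io/prom/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/prometheus
}
```

For example: `write_prom_config "$HOME/prometheus.yml" && run_prometheus "$HOME/prometheus.yml"`, then `curl -s localhost:9090/-/ready` to check that the server came up. The TSDB lives inside the container here, so data is lost when the container is removed; a persistent volume would be needed for anything long-lived.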

cissieAB commented 2 weeks ago

We may have read permission:

[xmei@ifarm2401 ~]$ curl 129.57.16.202:9090
<a href="/prometheus">Found</a>.
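The reply is a redirect to `/prometheus`, which suggests (an assumption, not confirmed) that the server runs with a `/prometheus` route prefix, so its HTTP API would sit under `http://129.57.16.202:9090/prometheus/api/v1/`. A sketch that extracts the prefix from such a response and runs an instant query; `redirect_prefix` and `prom_query` are hypothetical helper names, and `up` is only an example PromQL expression.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the redirect target from a body such as
#   <a href="/prometheus">Found</a>.
redirect_prefix() {
  sed -n 's/.*href="\([^"]*\)".*/\1/p'
}

# Hypothetical helper: run an instant PromQL query via GET /api/v1/query.
# $1 = base URL including any route prefix, $2 = PromQL expression.
# -G turns the URL-encoded data into a query string.
prom_query() {
  local base="$1" expr="$2"
  curl -sG "${base}/api/v1/query" --data-urlencode "query=${expr}"
}

# Usage (needs network access, e.g. from an ifarm login node):
#   prefix="$(curl -s http://129.57.16.202:9090 | redirect_prefix)"
#   prom_query "http://129.57.16.202:9090${prefix}" 'up'
```

A JSON reply with `"status":"success"` would confirm read access; an error or login page would mean we still need to ask.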
cissieAB commented 5 days ago

NOTE: only a certain range of TCP ports is allowed on the SciComp farm for inter-compute-node communication (no login-to-compute communication). The port range was posted in the Teams chat for security reasons.
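Any exporter or Prometheus listener started on a compute node therefore has to use a port inside that range. A sketch, assuming bash and GNU coreutils `timeout`; `PORT_MIN`/`PORT_MAX` are placeholders (the real range is only in the Teams chat and is not reproduced here), `other-node` is a hypothetical host name, and the probe uses bash's `/dev/tcp` pseudo-device.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if port $1 lies within [$2, $3] (inclusive).
port_allowed() {
  local port="$1" min="$2" max="$3"
  (( port >= min && port <= max ))
}

# Hypothetical helper: succeed if a TCP connection to host $1, port $2
# opens within 2 s. The subshell exits right away, closing the socket.
port_reachable() {
  local host="$1" port="$2"
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage on a compute node (PORT_MIN/PORT_MAX stand in for the real range):
#   port_allowed 9100 "$PORT_MIN" "$PORT_MAX" && port_reachable other-node 9100
```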

faustus123 commented 4 days ago

> NOTE: only a certain range of TCP ports is allowed on ifarm for inter-compute-node communication (no login-to-compute communication). The port range was posted in the Teams chat for security reasons.

If this is only for nodes within the SciComp farm proper (i.e. excluding ifarm nodes), then I am confused about why it would be considered a security issue. Regardless, if the range is not publicly known, they could change it at any point, breaking whatever system we have set up. We should press them for more information, or for a suggestion on how we can properly get RPC for monitoring purposes.