Open faustus123 opened 1 month ago
Something related to item 1/4: @faustus123 For the existing ifarm monitoring (run by Bryan's team), there is a public Grafana dashboard at https://scigraf.jlab.org/. Log in with your JLab CUE username/password, and you will be able to see the Prometheus metrics of all ifarm nodes. The metrics are collected by node-exporters running on every ifarm compute node.
If we would like to read their metrics ourselves, I guess we will need to ask for permission.
The Prometheus DB for ifarm nodes should be on sci-prometheus.jlab.org.
$ nslookup sci-prometheus.jlab.org
Server: 129.57.90.255
Address: 129.57.90.255#53
Name: sci-prometheus.jlab.org
Address: 129.57.16.202
Do we have permission to write to that DB for things that are not directly related to SciComp operations?
I do not think we have write permission, and I think we do not have read permission either. After some negotiation, Bryan might give you read permission, but he would probably not be happy if we ported our own stuff to it.
I will see if I can run a Prometheus container image (Docker or Podman) on one of the ifarm login nodes.
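A rough sketch of what that could look like with Podman, assuming the upstream `docker.io/prom/prometheus` image and its default config path `/etc/prometheus/prometheus.yml`. The function names, the job name, and the target `ifarm-node.example:9100` are placeholders for illustration, not real ifarm settings (9100 is only the node-exporter default port). The script writes a config and prints the command instead of running it, so it can be reviewed before anything is started on a login node.

```shell
#!/usr/bin/env bash
# Sketch only: write a minimal Prometheus scrape config and print the
# podman command that would start a private Prometheus instance.
# Hypothetical names: write_prom_config, print_podman_cmd, "my_nodes",
# and the example target host.

write_prom_config() {
    # $1 = output file, $2 = scrape target (host:port of a node-exporter)
    cat > "$1" <<EOF
global:
  scrape_interval: 30s
scrape_configs:
  - job_name: "my_nodes"
    static_configs:
      - targets: ["$2"]
EOF
}

print_podman_cmd() {
    # $1 = absolute path to the config file (mounted read-only)
    echo "podman run --rm -p 9090:9090 -v $1:/etc/prometheus/prometheus.yml:ro docker.io/prom/prometheus"
}

cfg="${TMPDIR:-/tmp}/prometheus.yml"
write_prom_config "$cfg" "ifarm-node.example:9100"
print_podman_cmd "$cfg"
```

Whether port 9090 is usable on the login nodes is an open question here; the published port would have to be adjusted to whatever is actually allowed.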
We may have read permission:
[xmei@ifarm2401 ~]$ curl 129.57.16.202:9090
<a href="/prometheus">Found</a>.
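The redirect to `/prometheus` suggests the server is served under that path prefix. If so, a real read test would be an instant query against the standard Prometheus HTTP API endpoint `/api/v1/query`. The sketch below only builds the URL and prints the curl command; that the API sits under the `/prometheus` prefix, and that we are allowed to query it, are assumptions. `build_query_url` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: build an instant-query URL for the Prometheus HTTP API.
# Assumption: the API is reachable under the /prometheus prefix seen in
# the redirect above.

build_query_url() {
    # $1 = base URL (a trailing slash is stripped)
    # $2 = PromQL expression; no URL-encoding is done, so this helper is
    #      only suitable for simple expressions such as "up"
    printf '%s/api/v1/query?query=%s\n' "${1%/}" "$2"
}

base="http://129.57.16.202:9090/prometheus"

# Simple expression: the URL can be used directly.
echo "curl -s '$(build_query_url "$base" "up")'"

# For anything with special characters, let curl do the encoding.
echo "curl -sG --data-urlencode 'query=up' '$base/api/v1/query'"
```

If read access works, the response should be JSON with a `status` field; an HTTP error or a login page would mean that we do need to ask for permission.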
NOTE: only a certain range of TCP ports is allowed on the scicomp farm for communication between compute nodes (no login-to-compute communication). For security reasons, the port range was shared only in the Teams chat.
NOTE: only a certain range of TCP ports is allowed on ifarm for communication between compute nodes (no login-to-compute communication). For security reasons, the port range was shared only in the Teams chat.
If this is only for nodes within the SciComp farm proper (i.e. excluding ifarm nodes), then I am confused about why it would be considered a security issue. Regardless, if it is not publicly known, they could change it at any point, breaking whatever system we have set up. We should press them for more info, or for a suggestion on how we can properly get RPC for monitoring purposes.