Apptheus: Apptainer connects to Prometheus. A redesigned Prometheus Pushgateway that collects cgroup metrics.
Prometheus is a widely adopted open-source metrics collection and monitoring tool.
Note: Prometheus only supports a pull model, meaning that Prometheus regularly pulls data from metric sources (scrape_interval = x). If users want to push data to Prometheus, a metrics cache component such as Pushgateway is needed. See https://prometheus.io/blog/2016/07/23/pull-does-not-scale-or-does-it/.
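To make the push side concrete, here is a minimal Go sketch of a short-lived job pushing one gauge to a stock Pushgateway over HTTP. It assumes a Pushgateway listening on localhost:9091 (its default port) and uses its PUT /metrics/job/<job> endpoint with a text-format body; the job name, metric name, and the helpers pushURL, gaugeText, and push are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
)

// pushURL builds the Pushgateway URL for a job's grouping key:
// <base>/metrics/job/<job>, with the job name path-escaped.
func pushURL(base, job string) string {
	return strings.TrimRight(base, "/") + "/metrics/job/" + url.PathEscape(job)
}

// gaugeText renders a single gauge sample in the Prometheus text exposition format.
func gaugeText(name, help string, value float64) string {
	return fmt.Sprintf("# HELP %s %s\n# TYPE %s gauge\n%s %g\n", name, help, name, name, value)
}

// push sends the payload with PUT, which replaces every metric previously
// pushed under the same grouping key (POST would only replace same-named metrics).
func push(client *http.Client, target, body string) error {
	req, err := http.NewRequest(http.MethodPut, target, strings.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "text/plain; version=0.0.4")
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	_, _ = io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("push to %s failed: %s", target, resp.Status)
	}
	return nil
}

func main() {
	// Hypothetical job and metric names; assumes a Pushgateway on its default port 9091.
	body := gaugeText("nightly_backup_last_success_seconds", "Unix time of the last successful backup.", 1.7e9)
	if err := push(http.DefaultClient, pushURL("http://localhost:9091", "nightly_backup"), body); err != nil {
		fmt.Println("push error:", err)
	}
}
```

The job pushes once and may exit immediately; Prometheus later scrapes the cached value from Pushgateway on its own schedule.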
Pushgateway acts as a bridge (a metrics cache and metric source) in front of Prometheus, so that both pushed and pulled metrics are supported. For short-lived jobs, or jobs that cannot expose metrics themselves, Pushgateway provides an easy way (HTTP REST endpoints) to receive their metrics; at the same time, Prometheus can pull the data from Pushgateway and use it as a metric source.
Pushgateway is somewhat similar to Prometheus exporters (https://prometheus.io/docs/instrumenting/exporters/), but Pushgateway is more general and can receive pushed metrics, while exporters are more specialized and do not support pushing metrics.
When to use Pushgateway (https://prometheus.io/docs/practices/pushing/)
Referenced from https://www.devopsschool.com/blog/what-is-prometheus-and-how-it-works/
To collect metrics from Apptainer, several requirements must be satisfied:
To collect stats for Apptainer containers, the starter (starter-suid) process of each created container should be placed into a newly created sub-cgroup, so that its cgroup stats can be collected and visualized.
Note that this tool can be used to monitor any program; it grew out of the development of an Apptainer RFE.
GET /metrics
Note that Apptheus must be started with privileges, which means the Unix socket it creates is also privileged. The implementation therefore changes the permission of this socket to 0o777 so that unprivileged programs can connect to it; that is also why an additional security check is needed, i.e., verifying that the connecting program is trusted.
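Apptheus exposes the /metrics endpoint to Prometheus. Programs that push metrics are verified over the local Unix socket against the paths given with the --trust.path option, whereas Pushgateway uses HTTP with TLS.
https://github.com/apptainer/apptheus/assets/2051711/b33c5f20-a030-4b91-a6a7-bc62fe1fc6b8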
--socket.path="/run/apptheus/gateway.sock": local socket path for verification. Default value is /run/apptheus/gateway.sock.
--trust.path="": multiple trusted program paths separated by ';'. For example, for the Apptainer starter, the path is usually /usr/local/libexec/apptainer/bin/starter.
--monitor.inverval=0.5s: cgroup stat sample interval.