GSI-HPC / lustre_exporter

Prometheus exporter for use with the Lustre parallel filesystem
GNU General Public License v3.0
18 stars 20 forks source link
lustre lustre-exporter metrics prometheus-exporter

Lustre Metrics Exporter

Continuous Integration

Prometheus exporter for Lustre metrics.

Version Support

Lustre Version Exporter Tag
2.12 v2.1.6

Getting

Clone the repository.

Building

Exporter

For just building the exporter: make build

Building the exporter with code testing, formatting and linting: make

Required for build with code linting: go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest

RPM Package Build

A CentOS7 RPM package can be build by using a

  1. Shell build script in rpm/build.sh
  2. Docker build container, see section below.

Docker - Build Container

Binary Build Container

A Debian container is based on the offical golang:1.17.5-bullseye container image for just providing the binary.

# from repo base dir run
sudo docker build  --tag lustre_exporter -f docker/Dockerfile .
sudo docker run --rm -v $PWD:/cpy -it lustre_exporter

The binary will be available in the repositories build directory.

RPM Build Container

A CentOS7 container is based on the official CentOS7 container image for providing the exporter in a RPM package.

# from repo base dir run
sudo docker build -t rpm_dock -f docker/RPM-Dockerfile .
sudo docker run --rm -v $PWD:/rpm -it rpm_dock

The RPM package will be available in the repositories build directory.

Running

./lustre_exporter <flags>

Flags

All above flags default to the value "extended" when no argument is submitted by the user.

Example: ./lustre_exporter --collector.ost=disabled --collector.mdt=core --collector.mgs=extended

The above example will result in a running instance of the Lustre Exporter with the following statuses:

Flag Option Detailed Description

What's exported?

All Lustre procfs and procsys data from all nodes running the Lustre Exporter that we perceive as valuable data is exported or can be added to be exported (we don't have any known major gaps that anyone cares about, so if you see something missing, please file an issue!).

See the issues tab for all known issues.

Exporting LNET Metrics

Since the LNET metrics are currently exported based on the DebugFS it is required to have root priviliges or mount DebugFS to be access by different user.

To export that metrics running the exporter as Systemd service as root user, the default user option prometheus must be changed in the service file.

Grafana Dashboards

Example dashboards for Grafana are provided under the dashboards directroy.

It is highly recommended to also install the Prometheus Node Exporter to get a complete view within the provided dashboards.

Troubleshooting

In the event that you encounter issues with specific metrics (especially on versions of Lustre older than 2.12), please try disabling those specific troublesome metrics using the documented collector flags in the 'disabled' or 'core' state. Users have encountered bugs within Lustre where specific sysfs and procfs files miscommunicate their sizes, causing read calls to fail.

Contributing

You are welcome to contribute to the project. Feel free to create an issue, pull request or just start a discussion.