blemmenes / radosgw_usage_exporter

Prometheus exporter for scraping Ceph RADOSGW usage data.
MIT License

Ceph RADOSGW Usage Exporter

Prometheus exporter that scrapes Ceph RADOSGW usage information (operations and buckets). This information is gathered from a RADOSGW using the Admin Operations API.
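
As a rough illustration of the Admin Operations API call involved, the sketch below builds the URL of a usage query from the server URL and the admin entry point. The function name build_usage_url is hypothetical and is not taken from the exporter's source. A real request must also be signed with the S3 access and secret keys, which this sketch leaves out.

```python
from urllib.parse import urlencode


def build_usage_url(host, admin_entry="admin", show_entries=True, show_summary=False):
    """Build the Admin Ops API usage URL (hypothetical helper, unsigned)."""
    # Normalise slashes so "http://radosgw:80/" and "http://radosgw:80" behave the same.
    base = host.rstrip("/") + "/" + admin_entry.strip("/") + "/usage"
    query = urlencode({
        "format": "json",
        "show-entries": str(show_entries).lower(),
        "show-summary": str(show_summary).lower(),
    })
    return base + "?" + query


if __name__ == "__main__":
    print(build_usage_url("http://radosgw:80"))
```

The admin entry point is a separate parameter because it is configurable on the RADOSGW side (rgw admin entry) and on the exporter side (--admin-entry).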

This exporter is based on both the Robust Perception post on writing a Jenkins exporter in Python (https://www.robustperception.io/writing-a-jenkins-exporter-in-python/) and the more elaborate Jenkins exporter (https://github.com/lovoo/jenkins_exporter).

Requirements

The RADOSGW must have the usage log and the admin API enabled. Example ceph.conf settings:

rgw enable usage log = true
rgw usage log flush threshold = 1024
rgw usage log tick interval = 30
rgw usage max shards = 32
rgw usage max user shards = 8
rgw admin entry = "admin"
rgw enable apis = "s3, admin"

The RADOSGW user whose keys are passed to the exporter needs the following capabilities:

    "caps": [
        {
            "type": "buckets",
            "perm": "read"
        },
        {
            "type": "metadata",
            "perm": "read"
        },
        {
            "type": "usage",
            "perm": "read"
        },
        {
            "type": "users",
            "perm": "read"
        }
    ]
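
One way to verify the capabilities before starting the exporter is to inspect the user's JSON description (for example, the output of radosgw-admin user info). The following is a minimal sketch: the helper name missing_caps is hypothetical, and it assumes the JSON carries a "caps" list shaped like the one above, with "*" meaning full permission.

```python
# Capabilities the exporter's user needs, as listed in the requirements above.
REQUIRED_CAPS = {
    "buckets": "read",
    "metadata": "read",
    "usage": "read",
    "users": "read",
}


def missing_caps(user_info):
    """Return the required cap types that user_info does not grant (hypothetical helper)."""
    granted = {cap["type"]: cap["perm"] for cap in user_info.get("caps", [])}
    missing = []
    for cap_type, perm in REQUIRED_CAPS.items():
        have = granted.get(cap_type, "")
        # "*" grants everything; "read, write" also contains "read".
        if have != "*" and perm not in have:
            missing.append(cap_type)
    return missing


if __name__ == "__main__":
    sample = {"caps": [{"type": "buckets", "perm": "read"},
                       {"type": "usage", "perm": "*"}]}
    print(missing_caps(sample))
```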

Note: If you run a load balancer in front of your RADOSGWs, make sure its timeouts are set appropriately. On clusters with a large number of buckets, or a large number of users and buckets, the usage query can take longer than the load balancer timeout.

For HAProxy, the timeout in question is "timeout server".

Local Installation

git clone git@github.com:blemmenes/radosgw_usage_exporter.git
cd radosgw_usage_exporter
pip install -r requirements.txt

Config

| Arg | Env | Description | Default |
| --- | --- | --- | --- |
| -H --host | RADOSGW_SERVER | Server URL for the RADOSGW API (example: http://objects.dreamhost.com/) | http://radosgw:80 |
| -e --admin-entry | ADMIN_ENTRY | The entry point for an admin request URL | admin |
| -a --access-key | ACCESS_KEY | S3 access key | NA |
| -s --secret-key | SECRET_KEY | S3 secret key | NA |
| -k --insecure | | Allow insecure server connections when using SSL | false |
| -p --port | VIRTUAL_PORT | Port to listen on | 9242 |
| -S --store | STORE | Store name added to metrics | us-east-1 |
| -t --timeout | TIMEOUT | Timeout when getting metrics | 60 |
| -l --log-level | LOG_LEVEL | Logging level: DEBUG, INFO, WARNING, ERROR or CRITICAL | INFO |
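
The relationship between arguments, environment variables and defaults can be sketched with argparse as shown below. This is an illustration built from the table above, not the exporter's actual code. It assumes that a command-line argument takes precedence over the environment variable, which in turn takes precedence over the default. The env parameter is added here only to make the function easy to test.

```python
import argparse
import os


def parse_args(argv=None, env=None):
    """Parse exporter-style options, falling back to env vars, then to defaults."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="RADOSGW usage exporter options (sketch)")
    parser.add_argument("-H", "--host", default=env.get("RADOSGW_SERVER", "http://radosgw:80"))
    parser.add_argument("-e", "--admin-entry", default=env.get("ADMIN_ENTRY", "admin"))
    parser.add_argument("-a", "--access-key", default=env.get("ACCESS_KEY", "NA"))
    parser.add_argument("-s", "--secret-key", default=env.get("SECRET_KEY", "NA"))
    # The table lists no environment variable for --insecure.
    parser.add_argument("-k", "--insecure", action="store_true")
    parser.add_argument("-p", "--port", type=int, default=int(env.get("VIRTUAL_PORT", "9242")))
    parser.add_argument("-S", "--store", default=env.get("STORE", "us-east-1"))
    parser.add_argument("-t", "--timeout", type=int, default=int(env.get("TIMEOUT", "60")))
    parser.add_argument("-l", "--log-level", default=env.get("LOG_LEVEL", "INFO"))
    return parser.parse_args(argv)


if __name__ == "__main__":
    print(parse_args())
```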

Example

./check_ceph_rgw_api -H https://objects.dreamhost.com/ -a JXUABTZZYHAFLCMF9VYV -s jjP8RDD0R156atS6ACSy2vNdJLdEPM0TJQ5jD1pw

Docker

Docker build (https://github.com/pando85/radosgw_usage_exporter/pkgs/container/radosgw_usage_exporter):

docker run -d -p 9242:9242 ghcr.io/pando85/radosgw_usage_exporter:latest \
-H <RADOSGW HOST> -a <ACCESS_KEY> -s <SECRET_KEY> -p 9242

Arguments can also be specified through environment variables.

docker run -d -p 9242:9242 \
-e "RADOSGW_SERVER=<host>" \
-e "VIRTUAL_PORT=9242" \
-e "ACCESS_KEY=<access_key>" \
-e "SECRET_KEY=<secret_key>" \
ghcr.io/pando85/radosgw_usage_exporter:latest

Your Prometheus server can then scrape the resulting metrics from the http://<exporter host>:9242/metrics endpoint.
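
To check the endpoint by hand before pointing Prometheus at it, a small script can fetch the page and read the samples. The sketch below is a deliberately simple parser for the Prometheus text format. It assumes no timestamps follow the sample values, and the metric name in the demo is made up for illustration rather than taken from the exporter.

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Map each sample (name plus labels) to its float value, skipping comments."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and "# HELP" / "# TYPE" comments
        # The value is the last space-separated field on the line.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def fetch_metrics(host, port=9242, timeout=60):
    """Fetch and parse /metrics from a running exporter."""
    with urlopen(f"http://{host}:{port}/metrics", timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical sample text; replace with fetch_metrics("<exporter host>").
    demo = '# HELP example_ops_total Example\nexample_ops_total{store="us-east-1"} 42.0\n'
    print(parse_metrics(demo))
```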

Kubernetes

An example deployment using the Rook operator in a Kubernetes environment is available in the examples/k8s directory.