cloudflare / ebpf_exporter

Prometheus exporter for custom eBPF metrics
MIT License
2.2k stars 240 forks source link

Suspected off-by-one error when bucketing histogram values #488

Open hoel-zr-o opened 2 weeks ago

hoel-zr-o commented 2 weeks ago

Hello!

I think I may have found an off-by-one error in the various histogram macros provided by maps.bpf.h - the tl;dr is that when I record a value in an exp2 histogram, I see that value reflected in the bucket below or equal to that value, rather than the first bucket above to that value.

For example, let's say I have buckets for 1, 2, 4, 8, ..., 1024, and I record the value 268. I would expect the buckets corresponding to le="5.12e-07" and le="1.024e-06" to have their values incremented by one - but I am also seeing the bucket for le="2.56e-07" get incremented.

Minimal Example

Here is a fairly minimal example that demonstrates this by just inserting the values 1-8 into an exp2 histogram (I used tp/sched/sched_process_exec as a proxy for an event that will happen almost immediately upon startup - my BPF knowledge is mostly limited to bpftrace and I didn't see an analogue to its BEGIN block that I could use):

// off-by-one.bpf.c

#include "vmlinux.h"
#include <bpf/bpf_tracing.h>
#include "maps.bpf.h"

#define MAX_BUCKET 3

char LICENSE[] SEC("license") = "BSD";

struct key_t {
    u64 bucket;
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, MAX_BUCKET + 2);
    __type(key, struct key_t);
    __type(value, u64);
} poc_values SEC(".maps");

int have_triggered = 0;

SEC("tp/sched/sched_process_exec")
int arbitrary_tracepoint_whatever()
{
    if(have_triggered) {
        return 0;
    }

    have_triggered = 1;

    for(int i = 1; i <= 8; i++) {
        struct key_t key;

        increment_exp2_histogram(&poc_values, key, i, MAX_BUCKET);
    }

    return 0;
}
# off-by-one.yaml

metrics:
  histograms:
  - name: poc_values
    help: Proof-of-concept
    bucket_type: exp2
    bucket_min: 0
    bucket_max: 3
    labels:
    - name: bucket
      size: 8
      decoders:
      - name: uint

...and the output of doing a curl -s http://localhost:9435/metrics | grep ebpf_exporter_poc_values_bucket against sudo ./ebpf_exporter --config.dir=examples --config.names=off_by_one:

ebpf_exporter_poc_values_bucket{le="1"} 1
ebpf_exporter_poc_values_bucket{le="2"} 3
ebpf_exporter_poc_values_bucket{le="4"} 7
ebpf_exporter_poc_values_bucket{le="8"} 8
ebpf_exporter_poc_values_bucket{le="+Inf"} 8

By contrast, here's a Go program that creates a Prometheus histogram and observes values 1-8:

package main

import (
        "os"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/common/expfmt"
)

func main() {
        h := prometheus.NewHistogram(prometheus.HistogramOpts{
                Name:    "poc_metric",
                Buckets: []float64{1, 2, 4, 8},
        })

        for i := 1; i <= 8; i++ {
                h.Observe(float64(i))
        }

        reg := prometheus.NewRegistry()
        reg.Register(h)

        families, err := reg.Gather()
        if err != nil {
                panic(err)
        }

        for _, family := range families {
                expfmt.MetricFamilyToText(os.Stdout, family)
        }
}

...and the resulting output:

# HELP poc_metric 
# TYPE poc_metric histogram
poc_metric_bucket{le="1"} 1
poc_metric_bucket{le="2"} 2
poc_metric_bucket{le="4"} 4
poc_metric_bucket{le="8"} 8
poc_metric_bucket{le="+Inf"} 8
poc_metric_sum 36
poc_metric_count 8

Suspected Cause

I think what's happening here is that ebpf_exporter is using log2l to determine the bucket index, and since it's performing integer arithmetic, log2l is discarding the fractional part of the result toward zero, so log(3) / log(2) = 1.5849625007211563 ends up as 1, but it should be 2.

bobrik commented 2 weeks ago

Thank you for the thorough analysis. We take log2l from bcc-tools:

I think it would make sense to fix the issue there and then backport it, unless you think it's somehow specific to ebpf_exporter.

hoel-zr-o commented 3 days ago

Well, that's the thing - I don't know I'd say log2l itself is buggy, per se. It's operating on integers, so truncating towards zero might not be the wrong behavior there - eg. 3 / 2 == 1 isn't surprising, so I don't believe if log2l(3) == 1 is. I might be splitting hairs, though, since buggy or not, the BCC project also uses log2l this way, so they would also need to adjust log2l or its callers.

I personally think the right move would be adjust the two log2l callsites in maps.bpf.h to add 1 to key.bucket if log2(increment) would normally have a fractional component (which increment_exp2zero_histogram seems to partially do), but I defer to your judgement here!