PromPHP / prometheus_client_php

Prometheus instrumentation library for PHP applications
https://prometheus.io/docs/concepts/metric_types/
Apache License 2.0
415 stars 91 forks source link

Possible Race Condition in APC::updateHistogram #140

Open skyellin opened 8 months ago

skyellin commented 8 months ago

We have noticed that our application occasionally times out. When this happens, logs nearly always indicate that the script stopped execution at "promphp/prometheus_client_php/src/Prometheus/Storage/APC.php on line 72" (sometimes line 71).

This is the following block inside \Prometheus\Storage\APC::updateHistogram

// Atomically increment the sum
// Taken from https://github.com/prometheus/client_golang/blob/66058aac3a83021948e5fb12f1f408ff556b9037/prometheus/value.go#L91
$done = false;
while (!$done) {
    $old = apcu_fetch($sumKey);
    if ($old !== false) {
        $done = apcu_cas($sumKey, $old, $this->toBinaryRepresentationAsInteger($this->fromBinaryRepresentationAsInteger($old) + $data['value']));
    }
}

My very superficial understanding leads me to believe that apcu_cas may be unlikely to return true if another process (or many processes) are also continuously updating the value stored at $sumKey, since $old may no longer match the persisted value.

Do you see any cause for concern or is my analysis incorrect?