avito-tech / bioyino

High performance and high-precision multithreaded StatsD server
The Unlicense
228 stars 22 forks source link

statsd compatible rate aggregation for counter #83

Open Karsonito opened 8 months ago

Karsonito commented 8 months ago

Hello! I seem to have found an inconsistency in the implementation of rate aggregation for counters. Bioyino calculates rate as "counts per sec", while statsd implies "value per sec" Not to be confused with timer's count_ps in statsd (also called rate in bioyino) - it's different aggregation

Can we make it compatible with statsd? https://github.com/statsd/statsd/blob/7c07eec4e7cebbd376d8313b230cea96c6571423/lib/process_metrics.js#L19

How to reproduce:

statsd

{
, flushInterval: 10000
, deleteCounters: true
, graphite: {
    legacyNamespace: false
  , globalPrefix:    ""
  , prefixCounter:   ""
  }
}

bioyino

[carbon]
interval = 10000

[aggregation]
round-timestamp = "down"
update-count-threshold = 0
aggregates.counter = [ "value", "rate", "updates" ]

[naming.default]
destination = "name"
prefix = ""

[naming.counter]
postfixes = { "value" = "count" }

Samples:

for v in {1..100}; do echo "test.counter:$v|c" | ncat -u 127.0.0.1 51125; echo "test.counter:$v|c" | ncat -u 127.0.0.1 52125; done;

Results:

statsd

ncat -klp 51003 | grep -P "^test"
test.counter.rate 505 1704117415
test.counter.count 5050 1704117415

bioyino

cat -klp 52003 | grep -P "^test"
test.counter.count 5050.0 1704117410
test.counter.rate 10.0 1704117410
test.counter.updates 100.0 1704117410

As we see test.counter.rate uses different logic to calculate.

I see that rate was asked ealier in issue https://github.com/avito-tech/bioyino/issues/37 as

rate as sum / (carbon.interval / 1000)

but implemented only

sample_rate as agg.len() / (carbon.interval / 1000)

Later we discussed rate in implementation of sample rate https://github.com/avito-tech/bioyino/issues/53#issuecomment-805937185

Real events count inside app: 300 Sample rate: 0.1 Sent\received count: 30 Interval: 30 Aggregates: sum = 30/0.1 = 300, rate = sum / 30 = 10

My fault for not checking rate for counter