Closed — pedro-stanaka closed this 2 months ago
Sadly, I think the dogstatsd-ruby implementation is achieving such high rates by way of dropping things on the floor. This monkey patch shows how we can grab the internal dogstatsd-ruby telemetry and print it out while a benchmark is running... and then if I do some local testing I can see some intriguing results:
zsh ❯ SERIES_COUNT=100 ./local-udp-throughput
===== Datadog Client - multi-thread throughput (5 threads) =====
bytes_sent: 9817
bytes_dropped: 70141833
bytes_dropped_queue: 70141833
bytes_dropped_writer: 0
packets_sent: 10
packets_dropped: 3940561
packets_dropped_queue: 3940561
packets_dropped_writer: 0
----------
bytes_sent: 9817
bytes_dropped: 2716252
bytes_dropped_queue: 4113941
bytes_dropped_writer: 0
packets_sent: 4
packets_dropped: 466903
packets_dropped_queue: 546131
packets_dropped_writer: 0
----------
bytes_sent: 11228
bytes_dropped: 5636044
bytes_dropped_queue: 8471648
bytes_dropped_writer: 0
packets_sent: 5
packets_dropped: 951310
packets_dropped_queue: 1109485
packets_dropped_writer: 0
----------
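For reference, a dump in that shape could come from a small printing helper like the one below. This is only a sketch: the field names match the output above, but the real telemetry object is internal to dogstatsd-ruby, so a `Struct` stands in for it here rather than the actual monkey patch.

```ruby
# Sketch: pretty-print a dogstatsd-ruby-style telemetry snapshot.
# The Struct is a stand-in for the client's internal telemetry object;
# field names mirror the dump above, values come from its first block.
FIELDS = %i[
  bytes_sent bytes_dropped bytes_dropped_queue bytes_dropped_writer
  packets_sent packets_dropped packets_dropped_queue packets_dropped_writer
].freeze

def print_telemetry(telemetry)
  FIELDS.each { |f| puts "#{f}: #{telemetry.public_send(f)}" }
  puts '-' * 10
end

Telemetry = Struct.new(*FIELDS)
t = Telemetry.new(9817, 70_141_833, 70_141_833, 0, 10, 3_940_561, 3_940_561, 0)
print_telemetry(t)
```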
I'm not immediately sure what to make of it, because those numbers seem extraordinarily high. Our benchmark might be triggering some really suboptimal behavior in the dogstatsd-ruby client.
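To put the first dump in perspective, the drop ratio can be computed directly from the `packets_sent` and `packets_dropped` counters shown above:

```ruby
# Drop ratio for the first telemetry dump above.
packets_sent    = 10
packets_dropped = 3_940_561
total = packets_sent + packets_dropped
ratio = packets_dropped.to_f / total
puts format('%.4f%% of packets dropped', ratio * 100)
# prints "99.9997% of packets dropped"
```

In other words, essentially everything the benchmark emitted was dropped before it left the queue.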
> by way of dropping things on the floor

Which, to be honest, is a reasonable strategy to protect the system. In general, people aren't in the business of emitting statsd metrics, so if emitting them consumes more resources than is reasonable and impacts the actual business, dropping them on the floor and alerting makes sense.
@casperisfine would you mind taking another look? I'd like to benchmark my new branch with this.
Well, for now I am going forward with this version. Both the iteration and series counts are controlled via env vars, so users and developers can pick whichever combination suits them best.
Summary
Our benchmarks currently focus solely on serialization; we would benefit from tests that also cover higher-cardinality scenarios. In this PR I am introducing a SERIES_COUNT parameter to inject different label combinations into the metrics we produce, and I am also including the time to close/shutdown the client in the processing time, since closing may flush buffers and finish actually pushing metrics.
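As a rough illustration of the SERIES_COUNT idea (the client class and metric names below are hypothetical stand-ins, not the actual benchmark harness): the env var controls how many distinct label combinations get injected, and the client shutdown sits inside the timed section so flushing buffered metrics counts toward processing time.

```ruby
# Illustrative sketch of a SERIES_COUNT-driven benchmark loop.
# StubClient is a stand-in for the real metrics client.
class StubClient
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def increment(_name, tags: {})
    @calls += 1
  end

  # Real clients flush buffered metrics here, which is why close
  # belongs inside the timed section.
  def close; end
end

series_count = Integer(ENV.fetch('SERIES_COUNT', '100'))
iterations   = Integer(ENV.fetch('ITERATIONS', '10000'))

# Pre-build the label combinations so each series is distinct.
labels = Array.new(series_count) { |i| { series: "s#{i}", shard: (i % 4).to_s } }

client = StubClient.new
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
iterations.times { |n| client.increment('requests_total', tags: labels[n % series_count]) }
client.close # shutdown included: time spent draining buffers is measured
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
puts format('%d increments across %d series in %.4fs', client.calls, series_count, elapsed)
```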
Apart from that, I am also comparing our implementation against the official one from Datadog.
The new benchmark results on my machine (Apple M3 Pro (11) @ 4.06 GHz) are: