Closed — martin-sucha closed this 2 years ago
I'm curious about your use case and what brought you to optimizing this part of the client.
We switched away from datadog-go several years ago to another statsd library because datadog-go did not have aggregation support at the time and was spending too much time sending packets. Now that datadog-go has aggregation support, I was checking whether we can switch back, since the other library does not support distribution metrics, which I'd like to use in some places. As part of that experiment, I profiled both versions.
As you can see in the profiler image in the original post, datadog-go accounted for about 2.2% of CPU time in the staging environment, and getContext/getContextAndTags was the majority of that. At the same time, it was obvious from the flamegraph that the function could be optimized fairly easily. Now the profiler shows about 1.6% CPU for datadog-go.
I also tried the Prometheus Go client (with counter vectors only) for comparison, and that is around 1.4% CPU, so the two are much closer now.
Would you mind sharing how many points per second you're sending, what type of metrics ...
In the staging environment where I tested this, about 42k metrics per second in one pod before aggregation (as shown by the datadog.dogstatsd.client.metrics_by_type metric):
Metric type | Rate
---|---
counter | 34k / second
timing | 7k / second
histogram | 800 / second
gauge | 30 / second
set | 0
distribution | 0
It is not necessary to do multiple allocations and copying; a single pass is enough.
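To illustrate the idea, here is a minimal sketch (not datadog-go's actual code) of building a metric context key like `name:tag1,tag2` with a single allocation: the required capacity is computed up front, then the name and tags are appended in one pass, instead of repeatedly concatenating strings and reallocating. The function name `buildContext` is hypothetical.

```go
package main

import "fmt"

// buildContext joins a metric name and its tags into one context key
// (e.g. "requests:env:staging,service:api") using a single allocation.
// This is a hypothetical sketch of the single-pass approach, not the
// actual datadog-go implementation.
func buildContext(name string, tags []string) string {
	// First pass: compute the exact capacity needed.
	n := len(name)
	for _, t := range tags {
		n += 1 + len(t) // one separator (':' or ',') plus the tag
	}
	// Single allocation, then one pass of appends.
	buf := make([]byte, 0, n)
	buf = append(buf, name...)
	for i, t := range tags {
		if i == 0 {
			buf = append(buf, ':')
		} else {
			buf = append(buf, ',')
		}
		buf = append(buf, t...)
	}
	return string(buf)
}

func main() {
	fmt.Println(buildContext("requests", []string{"env:staging", "service:api"}))
}
```

The same effect can be achieved with `strings.Builder` and `Grow`; the key point is sizing the buffer once so the hot path does no intermediate copies.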