Sampling rates are an inefficient mechanism to sample distributions: they require the user to dynamically compute the sampling rate in order to effectively limit the load induced by distributions.
This adds `WithMaxSamplesPerContext(int)`, which limits the number of samples we keep per context to a fixed number, chosen high enough to stay statistically relevant.
The sampling is done using an algorithm called Vitter's Algorithm R, which randomly selects values with linearly decreasing probability. This is a commonly used algorithm in instrumentation libraries (such as codahale). (see http://www.cs.umd.edu/~samir/498/vitter.pdf)
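For reference, a minimal standalone sketch of Algorithm R (not the library's actual implementation; function and variable names are illustrative): the first `k` values fill the reservoir, and the i-th value afterwards is kept with probability `k/i`, which is where the linearly decreasing probability comes from.

```go
package main

import (
	"fmt"
	"math/rand"
)

// reservoirSample keeps a uniform random sample of at most k values
// from a stream using Vitter's Algorithm R: the first k values fill
// the reservoir, then the i-th value (0-indexed, i >= k) replaces a
// random slot with probability k/(i+1).
func reservoirSample(stream []float64, k int) []float64 {
	reservoir := make([]float64, 0, k)
	for i, v := range stream {
		if i < k {
			reservoir = append(reservoir, v)
			continue
		}
		// j is uniform in [0, i]; keep v only if it lands inside the reservoir.
		if j := rand.Intn(i + 1); j < k {
			reservoir[j] = v
		}
	}
	return reservoir
}

func main() {
	stream := make([]float64, 10000)
	for i := range stream {
		stream[i] = float64(i)
	}
	sample := reservoirSample(stream, 100)
	fmt.Println(len(sample)) // always 100, however long the stream is
}
```

The appeal for a statsd client is that memory per context is bounded by `k` and each incoming sample costs O(1), regardless of how many samples the application sends.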
Additionally, this fixes the computation of the `rate` for buffered metrics. This matters because the rate is forwarded to the agent and passed down to the sketches, so that we can still compute the count of events.
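A rough sketch of why the rate matters (hypothetical names; the PR does not show the exact formula, so this only illustrates the idea under the assumption that the rate is the ratio of kept to observed samples):

```go
package main

import "fmt"

// sampleRate is an illustrative helper: when only `kept` out of `seen`
// samples per context survive sampling, forwarding kept/seen lets the
// agent scale the sketch back up to the true event count.
func sampleRate(kept, seen int) float64 {
	if seen <= kept {
		return 1 // nothing was dropped
	}
	return float64(kept) / float64(seen)
}

func main() {
	rate := sampleRate(2500, 10000) // 0.25
	// The agent reconstructs the original event count as kept / rate.
	fmt.Println(float64(2500) / rate) // 10000
}
```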
Here's the result on an application sending ~10,000 samples per second per distribution context:
(charts: Agent CPU, dogstatsd Bytes/sec)
The impact on the application itself: