Closed thomas-oliver-cko closed 2 years ago
Hello,
the dogstatsd package is taking a surprising amount of CPU time
Do you have some numbers to share (CPU usage and number of metrics sent)? How did you measure it? Did you measure the CPU usage, the time spent in the functions or the waiting time? Are you working in a very contraint environment with very low memory, very low cpu?
This wait time is wasted CPU time, and for high performance applications is a waste of resources
Having a single thread that blocks has a very low resource impact and is usually not a problem. One of the reason to use async
is to avoid blocking 1000 threads when performing 1000 HTTP requests. Blocking 1000 threads has a resource impact.
I did not use async
for latency reason. Consider the following scenario where async
is used for Wait
:
Wait
. It releases the thread to the thread poolasync
Wait
still wait for an available thread.
=> The whole DogStatsD pipeline is blocked. The metrics are not sent and the new metrics start to be dropped.Note: When the number of DogStatsd methods called is low, the CPU usage should be closed to 0 (But the waiting time should be very high).
This particular test was run on an Quad core Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz 2.59 GHz
with 16GB RAM. Our tool was handling 100 incoming HTTP Requests Per Second (rps), and performing 300 outgoing HTTP rps. We profiled it using JetBrains dotTrace. In the test, our application takes an average of 23.6% of the CPU. Of the CPU used, 4.83% of CPU time is spent waiting in the dogstatsd client
Are you working in a very contraint environment with very low memory, very low cpu?
We have the option of scaling horizontally, and our applications have enough CPU and Memory allocated to them to run as they are. However, when around 5% CPU time under moderate load is spent waiting, it does make us pause and think.
The application is not sending any metrics and the background thread run Wait. It releases the thread to the thread pool The application starts using a lot of threads and keep them for long. As no thread is available, the async Wait still wait for an available thread. => The whole DogStatsD pipeline is blocked. The metrics are not sent and the new metrics start to be dropped.
In regards to thread starvation, if our application is suddenly under heavy load, we want all resources dedicated to serving customers. That is one use for the concurrent queue, so that if the application is under load, metrics can be buffered into memory for a period of time so they can be sent later. If thread starvation continues and we haven't yet scaled out, I am much happier dropping metrics then I am letting down customers.
Using async await
or some sort of eventing trigger will only cause metrics to be dropped under high load
My two cents:
Using a dedicated thread for sending metrics has pros and cons. Here, this is done to ensure that metrics are still pushed even if the threadpool is overloaded. Whether it's a good or bad thing is a debate of its own.
The CPU load is increased specifically because of the Task.Wait
function, which internally uses synchronization and a non-negligible amount of spinning. Switching to Thread.Sleep
(https://github.com/DataDog/dogstatsd-csharp-client/pull/159) should solve this point in particular.
Thanks @kevingosse. Thread.Sleep
does block a thread, but it does yield CPU to other threads for the duration of the sleep which will certainly fix this problem, see here. I'm happy for this issue to be resolved by #159. We'll run another profile once this change is out and confirm it does what it says on the tin.
@thomas-oliver-cko : I will also try to reproduce the behavior and try to check if #159 solve the issue. What .NET runtime are you using?
@ogaca-dd we are running netstandard 2.1
@thomas-oliver-cko DogStatsD 7.0.0 is out. It includes #159. Do you still have the issue with the latest version?
Closed as there is no recent activity on this issue. Feel free to reopen an issue if you still have the problem.
We have recently profiled the performance of our application and we can see that the dogstatsd package is taking a surprising amount of CPU time. When we looked deeper, we found that the dogstatsd is reserving an entire thread to poll a
ConcurrentQueue
for metrics. This tread polls the queue, and then waits. This wait time is wasted CPU time, and for high performance applications is a waste of resources. The main culprit is:but
Task.Wait()
is called in multiple places within the package. By moving toasync await
, or by using some sort of event based timer, you release that thread back to the thread pool so it can be used by the main application. Can we get some thoughts on if/how this can be solved? We are very interested in seeing this blocking wait removed