cameron314 / concurrentqueue

A fast multi-producer, multi-consumer lock-free concurrent queue for C++11

Very high CPU usage after processing tens of millions of enqueue/dequeue events #271

Closed silviucpp closed 2 years ago

silviucpp commented 2 years ago

Hello,

I'm using evpp, which has an event loop where the queue can be configured to use one of several implementations.

Your library performs much better, but over time the performance gets much worse: CPU usage increases linearly, and we had to restart our server to fix the problem.

After investigating the problem with perf-top, we saw that most of the CPU time comes from the following line:

https://github.com/Qihoo360/evpp/blob/8984ca6b4fa410985ecc4a659fa0c9168dd95489/evpp/event_loop.cc#L316

We compiled evpp using the boost and std::vector queue options and no longer have this problem. evpp uses an old version of your library by default, but I also tried master and the result is the same.

Silviu

silviucpp commented 2 years ago
[Screenshot: perf-top output, 2021-11-24 15:09]
silviucpp commented 2 years ago

I removed all the DLOG_TRACE calls from DoPendingFunctors because they call GetPendingQueueSize, which calls size_approx on the container, but the result is the same. The only remaining call to your library in that function is now try_dequeue.

Honestly, it's a bit better now, but performance still degrades over time.

cameron314 commented 2 years ago

Are you churning through short-lived threads (without using explicit tokens) by any chance?

The queue implementation is actually a collection of sub-queues, one for each producer. If you enqueue without an explicit producer token, a thread-local sub-queue is implicitly created, which is never destroyed. try_dequeue must walk through all these empty zombie sub-queues looking for an element, which can get very expensive in the worst case.

If you have a recent compiler on a major platform (e.g. x64 Linux) you can try uncommenting this #define; if it fixes the problem, then the above hunch is probably correct: https://github.com/cameron314/concurrentqueue/blob/550a9aa97cc64ecf235f13bda0b2c09b3515e47b/concurrentqueue.h#L221
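If editing the header is inconvenient, the same macro can likely be defined from the build instead (a sketch assuming a g++/clang++-style command line and that the header tests the macro with #ifdef; app.cpp is a placeholder file name):

```shell
# Equivalent to uncommenting the #define inside concurrentqueue.h:
g++ -std=c++11 -O2 -DMOODYCAMEL_CPP11_THREAD_LOCAL_SUPPORTED app.cpp -o app
```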

silviucpp commented 2 years ago

Let me test this! On the consumer side it's only one thread (the loop thread); on the producer side, yes, there are short-lived threads as you pointed out.

Also, I forgot to mention that I'm using jemalloc (not sure if it matters). I will test and come back to you.

cameron314 commented 2 years ago

With short-lived threads, I suggest using producer tokens when enqueueing, which would sidestep this problem entirely (the sub-queues created for each producer token are recycled after the token is destroyed).

silviucpp commented 2 years ago

So basically, in the app I sent you:

https://github.com/Qihoo360/evpp/blob/8984ca6b4fa410985ecc4a659fa0c9168dd95489/evpp/event_loop.cc#L236

I have to change:

```cpp
while (!pending_functors_->enqueue(cb)) {}
```

to:

```cpp
moodycamel::ProducerToken ptok(*pending_functors_); // dereference: pending_functors_ is a pointer
while (!pending_functors_->enqueue(ptok, cb)) {}
```

Can you please advise? EventLoop::QueueInLoop is the method that produces the messages, and it can be called from short-lived threads; at least that's the case for me.

cameron314 commented 2 years ago

Creating/destroying a producer token is significantly more expensive than enqueuing a single item. The idea is to reuse the token for multiple operations. Consider declaring the token higher up in the callstack, making the token a static thread-local object, or using a pool of producer tokens safely shared between threads.

silviucpp commented 2 years ago

Ok, thanks a lot for your explanation! I think it would be useful to add a note to the project README about short-lived threads.

Is enabling MOODYCAMEL_CPP11_THREAD_LOCAL_SUPPORTED safe, or is that still unclear? I see others in the issue tracker complaining that it causes crashes.

cameron314 commented 2 years ago

I'll fix the README, which still assumed this was uncommented. Sorry about that!

If it works on your system, it should be safe, but since it's off by default it's not particularly well tested; just keep an eye on it.

silviucpp commented 2 years ago

Thanks! I think you can close this bug; it's now clear where the performance hit was coming from.