Closed by silviucpp 2 years ago
I removed all the DLOG_TRACE calls from DoPendingFunctors, because they call GetPendingQueueSize, which in turn calls size_approx on the container, and the result was the same. So the only remaining call to your library in that function is try_dequeue.
Honestly, it's a bit better now, but performance still degrades over time.
Are you churning through short-lived threads (without using explicit tokens) by any chance?
The queue implementation is actually a collection of sub-queues, one for each producer. If you enqueue without an explicit producer token, a thread-local sub-queue is implicitly created, which is never destroyed. try_dequeue must walk through all these empty zombie sub-queues looking for an element, which can get very expensive in the worst case.
If you have a recent compiler on a major platform (e.g. x64 Linux), you can try uncommenting this #define; if it fixes the problem, then the above hunch is probably correct: https://github.com/cameron314/concurrentqueue/blob/550a9aa97cc64ecf235f13bda0b2c09b3515e47b/concurrentqueue.h#L221
Let me test this! On the consumer side there is only one thread (the loop one), and on the producer side, yes, there are short-lived threads, as you pointed out.
Also, I forgot to mention that I'm using jemalloc (not sure if it matters or not). I will test and come back to you.
With short-lived threads, I suggest using producer tokens when enqueueing, which would sidestep this problem entirely (the sub-queues created for each producer token are recycled after the token is destroyed).
So basically, in the app I sent you, I have to change:
while (!pending_functors_->enqueue(cb)) {}
to:
moodycamel::ProducerToken ptok(*pending_functors_);
while (!pending_functors_->enqueue(ptok, cb)) {}
Can you please advise? EventLoop::QueueInLoop is the method that produces the messages, and it can be called from short-lived threads. At least that is the case for me.
Creating/destroying a producer token is significantly more expensive than enqueuing a single item. The idea is to reuse the token for multiple operations. Consider declaring the token higher up in the callstack, making the token a static thread-local object, or using a pool of producer tokens safely shared between threads.
Ok, thanks a lot for your explanation! I think it would be useful to add a note to the project README about short-lived threads.
Is enabling MOODYCAMEL_CPP11_THREAD_LOCAL_SUPPORTED safe, or is that still unclear? I see others in the issue tracker complaining that it causes crashes.
I'll fix the README which still assumed this was uncommented. Sorry about that!
If it works on your system, it should be safe, but since it's off by default it's not particularly well tested; just keep an eye on it.
Thanks! I think you can close this bug; it's now clear where the performance hit was coming from.
Hello,
I'm using evpp, which has an event loop where the queue can be configured to use one of:
Your library performs much better, but over time the performance gets much worse: CPU usage increases linearly, and we had to restart our server to fix the problem.
After investigating the problem with perf top, we saw that most of the CPU time comes from the following line: https://github.com/Qihoo360/evpp/blob/8984ca6b4fa410985ecc4a659fa0c9168dd95489/evpp/event_loop.cc#L316
We compiled evpp with boost and std::vector and we no longer have this problem. By default evpp uses an old version of your library, but I also tried master and it's the same.
Silviu