cameron314 / concurrentqueue

A fast multi-producer, multi-consumer lock-free concurrent queue for C++11

wait_dequeue_timed performance concerns #286

Closed: teoring closed this issue 2 years ago

teoring commented 2 years ago

My environment is U20, gcc 9.3.0, and the concurrentqueue master branch at one of the latest commits (https://github.com/cameron314/concurrentqueue/commit/38d01f450602513722ccf71a209cdce76bb2758b).

I am profiling my application, which has 2 producers and 9 consumers, with the VTune profiler. In my scenario I poll for packets from the interface: if the last poll returned a packet I call try_dequeue, and if it did not I call wait_dequeue_timed.

Somehow I see that wait_dequeue_timed introduces a big bottleneck. Is there something I can do to tune this? [two screenshots attached]

I see that waitWithPartialSpinning, and the atomic load inside it, is the biggest CPU consumer. Is that expected? It seems a bit strange to me that waiting takes so much CPU.
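For readers following along, the polling pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `StandInQueue` is a hypothetical mutex-based stand-in that mimics the `try_dequeue` and `wait_dequeue_timed` calls named in the question (it is not the library's lock-free implementation), and `poll_once` is a made-up helper name.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a blocking queue. It only mimics the two calls
// discussed in this thread so that the sketch compiles on its own.
template <typename T>
class StandInQueue {
public:
    void enqueue(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(item));
        }
        not_empty_.notify_one();
    }

    // Non-blocking: returns false immediately if the queue is empty.
    bool try_dequeue(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

    // Waits up to `timeout` for an item; returns false on timeout.
    template <typename Rep, typename Period>
    bool wait_dequeue_timed(T& out, std::chrono::duration<Rep, Period> timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!not_empty_.wait_for(lock, timeout, [this] { return !items_.empty(); }))
            return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::deque<T> items_;
};

// One consumer step (hypothetical helper): use the cheap non-blocking call
// while items keep arriving, and fall back to the timed wait after a miss.
template <typename Queue, typename T>
bool poll_once(Queue& q, T& out, bool& had_item_last_poll,
               std::chrono::microseconds timeout) {
    bool got = had_item_last_poll ? q.try_dequeue(out)
                                  : q.wait_dequeue_timed(out, timeout);
    had_item_last_poll = got;
    return got;
}
```

With this shape, every miss sends the next call into the timed wait, so a consumer that is mostly idle spends most of its time inside `wait_dequeue_timed`.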

cameron314 commented 2 years ago

This is normal, since it's a spinning wait (which eventually blocks if no elements are put into the queue while it spins). If you want to reduce the spin count and block more quickly, change the MAX_SEMA_SPINS trait. Setting it too low may impact throughput, however (increased context switches).
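To make the trade-off concrete, here is a minimal sketch of a "spin, then block" timed wait. It is not the library's code: `SpinThenBlockSemaphore` and its `max_spins` parameter are hypothetical names that play the role the MAX_SEMA_SPINS trait plays in the reply above, and the blocking phase uses a plain mutex and condition variable as an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical counting semaphore that spins for a bounded number of
// iterations before falling back to a blocking timed wait.
class SpinThenBlockSemaphore {
public:
    explicit SpinThenBlockSemaphore(int max_spins) : max_spins_(max_spins) {
        assert(max_spins >= 0);
    }

    void signal() {
        {
            // Increment under the mutex so a waiter that is about to block
            // cannot miss the wake-up.
            std::lock_guard<std::mutex> lock(mutex_);
            count_.fetch_add(1, std::memory_order_release);
        }
        cv_.notify_one();
    }

    // Takes one unit if available, without blocking.
    bool try_wait() {
        int old = count_.load(std::memory_order_relaxed);
        while (old > 0) {
            if (count_.compare_exchange_weak(old, old - 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed))
                return true;
        }
        return false;
    }

    bool wait_for(std::chrono::microseconds timeout) {
        // Phase 1: spin. This burns CPU (repeated atomic loads) but avoids
        // a context switch if an item arrives quickly.
        for (int i = 0; i < max_spins_; ++i) {
            if (try_wait()) return true;
        }
        // Phase 2: block until signalled or until the timeout expires.
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return try_wait(); });
    }

private:
    const int max_spins_;
    std::atomic<int> count_{0};
    std::mutex mutex_;
    std::condition_variable cv_;
};
```

A profiler attributes the whole of phase 1 to the waiting function, which matches the hot atomic load reported above. Passing a smaller `max_spins` moves waiters into phase 2 sooner, at the cost of more context switches. In the library itself the corresponding knob is, as far as I know, set by deriving a traits struct from the default traits, overriding MAX_SEMA_SPINS, and passing it as the queue's second template argument; check the header of the version in use before relying on that.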

teoring commented 2 years ago

There is something that bothers me. I am writing a multithreaded capture application based on libpcap, and I see that the sensor doesn't process traffic fast enough (it loses packets) even though CPU usage does not reach 100%.

My consumer thread (the balancer) is idle about half the time, and the same goes for the consumers (the packet-processing threads), yet I am still losing packets.

The first thread, marked with an arrow, is the producer.

image

teoring commented 2 years ago

So VTune and the CPU consumption both hint that there is no data to process (high usage in the spinning wait).

But on second thought, I think this might be related to libpcap itself, not providing packets fast enough.