alex-lt-kong closed this issue 11 months ago.
`this_thread::sleep_for(chrono::microseconds(10))` will sleep for significantly longer than 10 µs.
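To see the overshoot on a particular machine, you can time a short sleep with `steady_clock`. The standard only promises that `sleep_for` blocks for *at least* the requested duration, so the result is bounded below but otherwise OS-dependent. This is a minimal sketch, and the helper names `measure_sleep_us` and `average_sleep_us` are hypothetical:

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: sleeps for `requested` and returns how long the
// thread was actually blocked, in microseconds, measured with steady_clock.
long long measure_sleep_us(std::chrono::microseconds requested) {
    assert(requested.count() >= 0);  // a negative sleep makes no sense here
    const auto start = std::chrono::steady_clock::now();
    std::this_thread::sleep_for(requested);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Averages several measurements to smooth out one-off scheduling noise.
double average_sleep_us(std::chrono::microseconds requested, int samples) {
    assert(samples > 0);
    long long total = 0;
    for (int i = 0; i < samples; ++i) {
        total += measure_sleep_us(requested);
    }
    return static_cast<double>(total) / samples;
}
```

For example, you could call `average_sleep_us(std::chrono::microseconds(10), 1000)` from `main` and print the result. That shows how far a nominal 10 µs sleep overshoots on a given OS. The exact figure depends on the scheduler and timer resolution, so it is worth measuring on each target.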
Oh really? The purpose is that I want the program to be "sort-of" busy-waiting without using up all the CPU resources. If `this_thread::sleep_for(chrono::microseconds(10))` doesn't work on Windows, how should I achieve something similar?
Spin wait.
Note also that `try_dequeue` will not treat producers equally, so the min/max/avg spread will be very different between producers. All the elements from the first producer will be dequeued first, then all from the second, then all from the third.
Finally, don't use `system_clock` for timing; use `steady_clock` or `high_resolution_clock`.
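The producer and consumer run in the same process, so a monotonic clock is enough for latency measurements. Unlike the wall clock, it cannot jump backwards when the system time is adjusted. Of the two suggestions, `steady_clock` is the more portable one: `high_resolution_clock` is allowed to be an alias, and in libstdc++ (GCC) it is an alias for `system_clock`. This is a minimal sketch. The helper names `steady_now_us` and `latency_us` are hypothetical, and it assumes the producer stores the timestamp in each item, as the code in this issue does with `then`:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// steady_clock is required to be monotonic; system_clock makes no such promise.
static_assert(std::chrono::steady_clock::is_steady, "steady_clock must be steady");

// Hypothetical helper: microseconds on the monotonic clock. The epoch is
// unspecified, so values are only meaningful as differences taken within
// the same process.
std::int64_t steady_now_us() {
    return std::chrono::duration_cast<std::chrono::microseconds>(
               std::chrono::steady_clock::now().time_since_epoch())
        .count();
}

// Hypothetical helper: the producer stores steady_now_us() in the item just
// before enqueueing it, and the consumer calls this right after dequeueing it.
std::int64_t latency_us(std::int64_t enqueued_at_us) {
    const std::int64_t now = steady_now_us();
    assert(now >= enqueued_at_us);  // a monotonic clock cannot go backwards
    return now - enqueued_at_us;
}
```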
By "spin wait", do you mean I should just keep looping like this?
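```cpp
// Consumer loop: poll the queue without sleeping until `iter` items have arrived.
while (count < iter)
{
    if (queue.try_dequeue(then))
    {
        // `then` holds the producer's timestamp in microseconds since the epoch.
        auto now = chrono::system_clock::now();
        int64_t epoch_us = chrono::duration_cast<chrono::microseconds>(now.time_since_epoch()).count();
        sum += epoch_us - then;
        ++count;
    }
}
```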
You could also do that, but if you want to wait, then wait, just not with a sleep:
```cpp
auto deadline = std::chrono::steady_clock::now() + std::chrono::microseconds(10);
while (std::chrono::steady_clock::now() < deadline) { /* spin */ }
```
Sleep offers no guarantees on precision.
I don't think it affects the code here, but also bear in mind that thread creation on Windows is at least an order of magnitude slower than on Linux.
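To check the thread-creation overhead on a particular machine, you can time the create-and-join cost directly. This is a minimal sketch. `avg_thread_spawn_join_us` is a hypothetical name, and the figure it returns is only a rough per-thread average:

```cpp
#include <cassert>
#include <chrono>
#include <ratio>
#include <thread>

// Hypothetical micro-benchmark: average cost, in microseconds, of creating
// and joining a thread whose body does nothing.
double avg_thread_spawn_join_us(int iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::thread t([] {});  // empty body: only creation/teardown is measured
        t.join();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Long-running producers pay this cost once per thread. That is consistent with the remark that it probably does not affect the code here.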
Thanks for the information.
One more thing that concerns me: "Note also that `try_dequeue` will not treat producers equally." If my threads are all long-running (say, they keep running for days), will this inequality persist? Is there a way to make producers "equal" without hurting performance? In my use case strict equality is not needed; I just want the latency across different producers to be "similar".
Short answer is no, this queue does not guarantee any sort of fairness.
However, if you use a consumer token, it will be somewhat more fair (rotating between producers at fixed intervals).
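A toy, single-threaded model can show the difference between draining one producer at a time and rotating between producers. It is an illustration under stated assumptions, not ConcurrentQueue's actual implementation. Each producer gets its own FIFO sub-queue. `drain_first` always takes from the first non-empty sub-queue, which is the behavior described above for `try_dequeue` without a token. `rotate_every` switches to the next producer after a fixed quota, loosely mimicking the consumer-token rotation. All names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy model only: one FIFO sub-queue per producer; items are ints.
using SubQueues = std::vector<std::deque<int>>;

// Always take from the first non-empty sub-queue until it is exhausted.
std::vector<int> drain_first(SubQueues q) {
    std::vector<int> out;
    for (auto& sub : q) {
        while (!sub.empty()) {
            out.push_back(sub.front());
            sub.pop_front();
        }
    }
    return out;
}

// Take up to `quota` items from one sub-queue, then move on to the next one.
std::vector<int> rotate_every(SubQueues q, std::size_t quota) {
    assert(quota > 0);
    std::vector<int> out;
    std::size_t remaining = 0;
    for (const auto& sub : q) remaining += sub.size();
    std::size_t idx = 0;
    while (remaining > 0) {
        auto& sub = q[idx];
        for (std::size_t taken = 0; taken < quota && !sub.empty(); ++taken) {
            out.push_back(sub.front());
            sub.pop_front();
            --remaining;
        }
        idx = (idx + 1) % q.size();
    }
    return out;
}
```

Consider three producers holding `{10, 11, 12}`, `{20, 21, 22}` and `{30, 31, 32}`. `drain_first` yields `10 11 12 20 21 22 30 31 32`, while `rotate_every(q, 1)` yields `10 20 30 11 21 31 12 22 32`. In both cases each producer's items come out in the order that producer enqueued them. Only the interleaving between producers changes.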
Let me ask this: a queue is usually a FIFO data structure; does this still apply to an MPMC queue like ConcurrentQueue? When you say `try_dequeue` does not treat producers equally, do you mean:

1. one producer keeps `enqueue()`ing until nothing is waiting to be `enqueue()`ed, and only then can other producers start `enqueue()`ing? Or
2. even while several producers `enqueue()` concurrently, it is still possible that items from one particular producer get `dequeue()`d first?

(The 2nd scenario means the FIFO assumption no longer holds; or, to put it differently, we have a per-producer FIFO, but not an overall FIFO.)
> or to put it differently, we have a per-producer FIFO, but not an overall FIFO
Exactly -- please see the README. If you need global ordering, this is not the right queue.
Thanks for all the help!
I tested the minimal reproducible example on different computers with different compilers (gcc, clang, cl), and the results are pretty consistent: the Linux binary is about 150 times faster than the Windows binary. Is this expected? How can we achieve similar performance on Windows?
The MRE code
Test results
```console
$ clang++ --version
Debian clang version 14.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
$ clang++ test.cpp -o test-cl.out -O3
$ ./test-cl.out
avg. time elapsed (us): 42.5577
All tasks completed.
```

```console
$ clang++ --version
Debian clang version 14.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
$ clang++ test.cpp -o test-cl.out -O3
$ ./test-cl.out
avg. time elapsed (us): 48.0926
All tasks completed.
```