cameron314 / concurrentqueue

A fast multi-producer, multi-consumer lock-free concurrent queue for C++11

RFE: add capacity() method to query ConcurrentQueue and BlockingConcurrentQueue about actual capacity #332

Open f18m opened 1 year ago

f18m commented 1 year ago

Hi, I'm using ConcurrentQueue and BlockingConcurrentQueue with a preallocated queue size and, as described at https://github.com/cameron314/concurrentqueue#preallocation-correctly-using-try_enqueue, I'm using the constructor overload that accepts the minimum number of elements (N) and the maximum number of explicit and implicit producers. In that case the actual capacity of the queue is computed internally. I think it would be useful to have a capacity() method to verify the result of that internal computation, mostly for debugging purposes.

My use case: I'm building a unit test in debug mode (also with gcov code instrumentation, so the code runs much more slowly than the optimized version!). The unit test constructs a BlockingConcurrentQueue using the constructor overload that accepts the minimum number of elements (N) and the maximum number of explicit producers, and I request a minimum size of 16384 items. The try_enqueue() method fails after enqueuing about 4000-4400 items (the exact number varies from run to run and is not deterministic). I would expect to be very far from any internal queue limit. Moreover, in my unit test I have only 1 producer (with IMPLICIT token) and 1 consumer (with EXPLICIT token). Having a capacity() method might help me get to the root of the problem.

Thanks!

cameron314 commented 1 year ago

You're likely hitting a limit from the traits, not element capacity. See https://github.com/cameron314/concurrentqueue/issues/169#issuecomment-546113196. This is also described in the README.

There's no single capacity count in any case. It depends on which producer will be used and will change dramatically depending on what operations are called on the queue in the future.

f18m commented 1 year ago

OK, thanks @cameron314. If I understand the comment at https://github.com/cameron314/concurrentqueue/issues/169#issuecomment-546113196 correctly, then by default the maximum number of elements I can enqueue from the same (implicit) producer using just try_enqueue() (CannotAlloc policy) is IMPLICIT_INITIAL_INDEX_SIZE*BLOCK_SIZE, which means 1024 items.

I'll try increasing IMPLICIT_INITIAL_INDEX_SIZE via the traits, but just like in ticket https://github.com/cameron314/concurrentqueue/issues/169, my problem is that the queue size should be configurable at runtime, and I would like to avoid memory allocations. I guess this is not possible out of the box.

In practice, maybe I can configure the traits to hold the MAX possible queue length (so I tell the user that the queue size is configurable in a range [A;B]). Let's say the user configures the queue at size C with A<C<B. Then before invoking try_enqueue() I might use size_approx() to check whether the queue is bigger than C and fail the enqueue... however, invoking size_approx() very frequently does not sound very performant...
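For reference, the trait override mentioned above would look roughly like this (a sketch, assuming the library's concurrentqueue.h is on the include path; the value 512 and the struct name are arbitrary examples, and the arithmetic in the comment assumes the library's default trait values):

```cpp
#include "concurrentqueue.h"  // moodycamel::ConcurrentQueue (assumed available)

// Custom traits: enlarge the initial implicit block index so a single
// implicit producer can hold more elements before try_enqueue() starts
// failing. With the defaults (BLOCK_SIZE = 32, IMPLICIT_INITIAL_INDEX_SIZE = 32)
// the limit discussed above is 32 * 32 = 1024 elements; with the override
// below it becomes roughly 512 * 32 = 16384. Index sizes must be powers of 2.
struct LargeImplicitIndexTraits : public moodycamel::ConcurrentQueueDefaultTraits
{
    static const size_t IMPLICIT_INITIAL_INDEX_SIZE = 512;
};

moodycamel::ConcurrentQueue<int, LargeImplicitIndexTraits> q(16384);
```

The drawback, as noted above, is that this is fixed at compile time.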

f18m commented 1 year ago

@cameron314 just an idea: do you think it would be possible to add an upper limit to the number of blocks that can be dynamically allocated by an enqueue() operation? I think most of the use cases of the form "I want to preallocate the queue; in other words, I want to use only try_enqueue()" stem from the fact that the user does not want to lose control over the queue size. If it were possible to set a (rough) upper limit on the number of blocks the queue can allocate dynamically, I think that would cover those use cases...

cameron314 commented 1 year ago

You can use the MAX_SUBQUEUE_SIZE trait to indirectly control this, if that helps.
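For anyone following along, that override would look something like this (a sketch, assuming the library's concurrentqueue.h; the cap of 65536 and the struct name are arbitrary examples):

```cpp
#include "concurrentqueue.h"  // moodycamel::ConcurrentQueue (assumed available)

// Traits that cap each subqueue: enqueue operations that would grow a
// producer's subqueue past MAX_SUBQUEUE_SIZE elements fail instead of
// allocating. Note this bounds each producer's subqueue individually,
// not the queue as a whole.
struct CappedTraits : public moodycamel::ConcurrentQueueDefaultTraits
{
    static const size_t MAX_SUBQUEUE_SIZE = 65536;
};

moodycamel::ConcurrentQueue<int, CappedTraits> q;
```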

f18m commented 1 year ago

The problem is that MAX_SUBQUEUE_SIZE is also a trait, so it's a compile-time setting. I came up with another solution to achieve a bounded queue with the size limit set at runtime. Can you check whether the logic below makes sense to you?

* the ConcurrentQueue/BlockingConcurrentQueue is compiled with default traits
* to enqueue, a hybrid approach is taken:
1) fast path: try_enqueue() is used first; if it returns true, stop (packet enqueued successfully). If it fails, go to step 2.
2) check the auxiliary flag "max_size_reached": if it is true, stop (packet CANNOT be enqueued). If it is false, go to step 3.
3) use size_approx() to query the actual queue size: if the size is equal to or larger than the runtime-decided MAX, set max_size_reached=true and stop (packet CANNOT be enqueued). If the size is smaller, go to step 4.
4) use the enqueue() method.

The idea is to avoid invoking size_approx() unnecessarily: since the queue's storage never shrinks once it has grown, the max_size_reached flag starts out FALSE, can turn TRUE only once, and never goes back to FALSE. In particular, once the queue reaches size_approx()>=MAX at any point in time, we stop invoking size_approx() and from then on use ONLY try_enqueue().

Of course, since steps 1-4 are not atomic, there might be some inaccuracy, but I think it should achieve the final goal (a bounded queue with the size limit set at runtime).

What do you think?

cameron314 commented 1 year ago

This indeed would not be very accurate. In particular, size_approx can temporarily return a count larger than the actual number of queued elements.

There's also a race where multiple threads see the queue is not full, yet there's only one slot remaining until MAX, and they all subsequently enqueue at once, surpassing MAX.

Consider instead using an approach based on a LightweightSemaphore to limit the intake of elements, similar to the BlockingConcurrentQueue but in reverse.

f18m commented 1 year ago

This indeed would not be very accurate. In particular, size_approx can temporarily return a count larger than the actual number of queued elements.

But I would expect this kind of "inaccuracy" in size_approx() (which I was expecting, given the API name :) ) to be in the range of one BLOCK_SIZE, or perhaps up to P*BLOCK_SIZE where P = number of producers. Is that a reasonable assumption?

I think that in most use cases the actual need is just to have a bound somewhere, with no need for very high accuracy (a lockless queue is used in high-performance scenarios with millions of items enqueued every second, and in those cases you very rarely know with high accuracy what the max queue size must be). So even if the error were, say, in the [1..P]*BLOCK_SIZE range, it would be absolutely acceptable, at least for me.

Do you think it would make sense to add a reference in the documentation to such a "bounded queue" approach? I found the "preallocated queue / try_enqueue()-only" approach very difficult/tricky to use, despite the existing documentation.

Moreover, the lack of a capacity() method (or perhaps of min_capacity()/max_capacity() methods) did not help... :)

cameron314 commented 1 year ago

I believe the inaccuracy would be driven primarily by the number of threads accessing the queue, plus the latency between the values seen by one thread and the "current" values, and not by the block size per se. The internal indices are only eventually consistent, with no specific bound.

I think using a semaphore-based approach would be better, without imposing much overhead.

f18m commented 1 year ago

OK, I understand; it's just that it might be tricky for somebody not experienced in this area... For now the approach I described in https://github.com/cameron314/concurrentqueue/issues/332#issuecomment-1446032554 is working fine (in a production environment), so I'm sticking with it...

What about the original topic of this ticket, i.e. adding either a capacity_approx() method or capacity_min()/capacity_max() methods to report how large the queue has grown so far?

cameron314 commented 1 year ago

Sorry, not going to add such methods as I believe they'd be more misleading than helpful. The error bounds would be quite large.