cameron314 / concurrentqueue

A fast multi-producer, multi-consumer lock-free concurrent queue for C++11
Other
9.86k stars 1.69k forks source link

Please help optimising single producer (across multiple threads) - single consumer (single thread) #385

Closed varkey98 closed 4 months ago

varkey98 commented 4 months ago

Thanks a lot for your hardwork, have a couple of questions I couldn't after going through the docs. I understand that typically, the most fastest way is using the queue with Producer-Consumer tokens and bulk function variants.

Let me try explaining my problem statement. My usage is of single producer - single consumer. But events can be added to the queue from different threads (total ~250 events per second), but consumed by a single thread itself.

  1. In this case would using the same producer token across multiple threads be thread safe, my understanding is that its not? Also would using the non-token version be better, in case its not thread safe as the threads have to wait for locks?
  2. How can I limit the size of the queue in this case? What I tried was increasing the BLOCK_SIZE to 4096 and then initialising the queue with capacity also 4096 so as to reflect the single consumer case. Not sure if its the correct way to do it and if not what would be the best configuration in the above case?
cameron314 commented 4 months ago

What do you mean by single producer when there's multiple threads enqueueing? Is there some sort of synchronization between them?

250 operations per second is not many at all, I'd be surprised if you see much difference regardless of data structure.

varkey98 commented 4 months ago

Thanks for the quick reply. What I meant was that all of them were getting pushed by the same class instance, which I realised would anyways be the case. Apologies, just realised that each thread will be a producer, I was confusing it with the Kafka queues 😐. Also the number of events per second can go upto 1000 and in extremely rare cases around 4000.

cameron314 commented 4 months ago

In that case, it's multiple producers indeed. Ideally each would have a producer token (possibly thread-local).

Keep in mind that ordering is not maintained across producers, only between elements queued by a single producer.

The queue is not designed to be a fixed size. However, you can impose artificial limits with the MAX_SUBQUEUE_SIZE trait value.

varkey98 commented 4 months ago

In my case maintaining a producer token per thread would be difficult. It's part of a web application and each request will push an event into the queue. Ordering is not an issue.

Would having a consumer token help. Can you give me a brief idea on how these tokens help in optimisation? I thought it's about where to search in memory, but queue is always FIFO right?

cameron314 commented 4 months ago

The queue is implemented as a set of sub-queues, one per producer. The producer token allows 'enqueue' to directly add a sub-queue (a pointer to it is stored in the token). The alternative is for each 'enqueue' operation to look up an implicit sub-queue based on the current thread ID.

Yes, using a consumer token will tend to improve performance too, for other reasons.

varkey98 commented 4 months ago

Thanks, I have one last question, I looked up MAX_SUBQUEUE_SIZE trait, how does it differ from BLOCK_SIZE? Also its written that MAX_SUBQUEUE_SIZE is rounded upto the BLOCK_SIZE. Also, in my case each thread would be a new producer right? Assuming the client doesn't have thread pooling, will setting the above limits help as each thread will create its own sub-queue? Or does it need INITIAL_IMPLICIT_PRODUCER_HASH_SIZE to be used in conjunction?

cameron314 commented 4 months ago

MAX_SUBQUEUE_SIZE limits the size each sub-queue can grow to. Sub-queues are formed of lists of blocks of element slots. BLOCK_SIZE controls how large each internal blocks is.

Generally each thread is mapped to one sub-queue. While you can configure INITIAL_IMPLICIT_PRODUCER_HASH_SIZE, the implicit producer hash will grow as needed with calls to enqueue. However, I do recommend using producer tokens (explicit producers versus implicit producers).

If there's no thread pooling, does that imply short-lived threads? If so, that's quite inefficient.

varkey98 commented 4 months ago

Thanks. Yes, the threads might be shortlived. This is a library used across different applications and I dont have control over how those applications are written. If producer tokens are needed then it probably needs a lock as well

varkey98 commented 4 months ago

Thanks again for your time :)

varkey98 commented 4 months ago

@cameron314, just realised this

While you can configure INITIAL_IMPLICIT_PRODUCER_HASH_SIZE, the implicit producer hash will grow as needed with calls to enqueue.

Does this mean, even if we use try_enqueue, the producer hash map will keep on allocating new subqueue per thread?

cameron314 commented 4 months ago

If producer tokens are needed then it probably needs a lock as well

Use TLS

Does this mean, even if we use try_enqueue, the producer hash map will keep on allocating new subqueue per thread?

No, try_enqueue doesn't allocate.

varkey98 commented 4 months ago

TLS as in? Sorry, only thing I can think of is HTTPS TLS, but not sure how that's related to this

cameron314 commented 4 months ago

Thread local storage

varkey98 commented 4 months ago

Oh okay, thanks. But in case of short lived threads, this won't help right?

cameron314 commented 4 months ago

It will still help. The token will be associated and disassociated with a sub-queue when it's initialized and destroyed.

varkey98 commented 4 months ago

Oh okay, understood. Thanks a lot for the prompt replies