leorg99 opened 11 months ago
Hi @LeorGreenberger
I'm sorry for not getting back to you sooner.
The Kafka consumer protocol only allows for sequential reads (you can basically only pull the next message in a loop); all the parallelization is handled by Silverback, which creates a Channel<T> per partition and pushes the messages accordingly.
That being said, the main benefit of parallelization is that (in the best case) you can concurrently process 1 message per partition. Each channel is read sequentially in a separate "thread" and a Task is scheduled to process each message.
The idea here is that your subscriber will typically be slower than the consumer, and therefore being able to process messages in parallel leads to far better performance (e.g. if you need to write the consumed data to a database or similar).
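To give you an idea of the mechanism, here's a simplified sketch of that fan-out (not the actual Silverback internals; the type and member names are made up for illustration):

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

public record ConsumedMessage(int Partition, string Value);

public class PartitionFanOut
{
    private readonly ConcurrentDictionary<int, Channel<ConsumedMessage>> _channels = new();

    // The Kafka consume loop stays strictly sequential: it just pulls the next message
    // and writes it to the channel of its source partition. WriteAsync blocks when the
    // channel is full, which is where the backpressure comes from.
    public async Task ConsumeLoopAsync(IAsyncEnumerable<ConsumedMessage> consumer)
    {
        await foreach (var message in consumer)
            await GetChannel(message.Partition).Writer.WriteAsync(message);
    }

    private Channel<ConsumedMessage> GetChannel(int partition) =>
        _channels.GetOrAdd(partition, _ =>
        {
            var channel = Channel.CreateBounded<ConsumedMessage>(2); // capacity = BackpressureLimit
            _ = Task.Run(() => ProcessAsync(channel.Reader));        // one reader "thread" per partition
            return channel;
        });

    // Each partition's messages are handled in order, but the partitions run in parallel.
    private static async Task ProcessAsync(ChannelReader<ConsumedMessage> reader)
    {
        await foreach (var message in reader.ReadAllAsync())
            await HandleAsync(message); // e.g. the subscriber writing to the database
    }

    private static Task HandleAsync(ConsumedMessage message) => Task.CompletedTask; // placeholder subscriber
}
```

The key point is that the consume loop stays sequential, while the per-partition readers run concurrently.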
The main difference when processing all partitions together is that all messages are written to the same Channel<T>, regardless of the source partition, and you can therefore only process them sequentially.
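To make it concrete, the mode is selected per endpoint in the configuration, roughly like this (simplified sketch with made-up topic and group names; serialization, error policies, etc. omitted, so check the docs for the full builder API):

```csharp
using Silverback.Messaging.Configuration;

public class EndpointsConfigurator : IEndpointsConfigurator
{
    public void Configure(IEndpointsConfigurationBuilder builder) =>
        builder.AddKafkaEndpoints(endpoints => endpoints
            .Configure(config => config.BootstrapServers = "PLAINTEXT://localhost:9092")
            .AddInbound(endpoint => endpoint
                .ConsumeFrom("topic-a")
                .Configure(config => config.GroupId = "consumer-group-1")
                // One Channel<T> per assigned partition, processed in parallel (the default)
                .ProcessPartitionsIndependently())
            .AddInbound(endpoint => endpoint
                .ConsumeFrom("topic-b")
                .Configure(config => config.GroupId = "consumer-group-1")
                // A single Channel<T> shared by all partitions, processed sequentially
                .ProcessAllPartitionsTogether()));
}
```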
One thing has to be noted: as far as I know, there isn't a way to predict or control which partition the next message will come from. This means that you could be subscribed to 4 partitions but get N consecutive messages from the same partition, and there isn't much I can do about it.
In Silverback there is a setting called BackpressureLimit, which basically defines the capacity of those channels. For example, if the backpressure limit is set to 2 and you receive 10 consecutive messages from the same partition, there isn't much I can do to parallelize.
I could mitigate this issue by pausing the partitions for which the channel is full, but I haven't given that a try so far.
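Roughly, that mitigation could look something like this (an untested sketch using the Confluent.Kafka Pause/Resume calls, just to illustrate the idea; this is not what Silverback does today):

```csharp
using System.Threading.Channels;
using System.Threading.Tasks;
using Confluent.Kafka;

// Untested sketch: pause the partition when its per-partition channel is full.
public static class PausingWriter
{
    public static async Task WriteWithPauseAsync(
        IConsumer<string, string> consumer,
        Channel<ConsumeResult<string, string>> channel,
        ConsumeResult<string, string> result)
    {
        // TryWrite fails when the bounded channel (capacity = BackpressureLimit) is full.
        if (channel.Writer.TryWrite(result))
            return;

        // Stop fetching from this partition until the subscriber catches up...
        consumer.Pause(new[] { result.TopicPartition });

        // ...wait for room in the channel, then resume the partition.
        await channel.Writer.WriteAsync(result);
        consumer.Resume(new[] { result.TopicPartition });
    }
}
```

Pausing only stops fetching from that partition; anything already sitting in the channel would still be processed in order.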
Thank you very much for the informative response! Interestingly enough, my subscribers generally enqueue a job in Hangfire, which allows me to at least monitor what is happening through the Hangfire Dashboard and requeue failures as needed.
Edit: Just thought of something else. When configuring BatchProcessing, does ProcessPartitionsIndependently create a batch/queue per partition, while ProcessAllPartitionsTogether funnels everything into one batch/queue?
Hi Sergio,
I was hoping you could clarify the difference between these two modes. I've read the documentation a few times, but it still isn't clear to me.
Is a Channel<T> created between the consumer and the in-memory bus? My assumption is that the in-memory bus uses one Channel<T> to sequentially produce messages to a subscriber, so what would be the benefit of having a Channel per partition connected to the in-memory bus? Thanks!