fbeltrao closed this 5 years ago.
I imagine the risk of StoreOffset is that if something goes wrong on the async thread and the commit never lands, you may end up replaying a few batches of messages you thought were checkpointed but whose checkpoint was never persisted? In general we are pretty clear that consumers need to expect at-least-once delivery, so I think I'm ok with StoreOffset given the better throughput.
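A toy, in-memory model of that replay risk, assuming StoreOffset-style checkpointing where a stored offset only becomes durable when a periodic background commit runs. The names run_until_crash, replayed_after_restart and commit_every are hypothetical and this is not the librdkafka or Confluent.Kafka API; commit_every=1 roughly stands in for a blocking Commit after every message.

```python
# Hypothetical simulation, not a Kafka client: offsets are stored in memory
# after each message but only persisted every `commit_every` messages.

def run_until_crash(messages, crash_after, commit_every):
    """Process messages until a simulated crash.

    Returns (offsets processed, committed offset at the time of the crash).
    The committed offset is the next offset to read, as in Kafka."""
    processed = []
    stored = None   # latest offset recorded in memory (StoreOffset-like)
    committed = 0   # durable checkpoint; lost progress restarts from here
    for i, offset in enumerate(messages):
        if i == crash_after:
            break  # crash: the in-memory stored offset is lost
        processed.append(offset)
        stored = offset + 1
        if (i + 1) % commit_every == 0:
            committed = stored  # periodic commit persists the stored offset
    return processed, committed


def replayed_after_restart(messages, crash_after, commit_every):
    """Offsets that are delivered a second time after restarting."""
    processed, committed = run_until_crash(messages, crash_after, commit_every)
    # The consumer resumes at `committed`, so anything processed at or
    # after it is redelivered.
    return [o for o in processed if o >= committed]


print(replayed_after_restart(list(range(10)), crash_after=7, commit_every=5))  # [5, 6]
print(replayed_after_restart(list(range(10)), crash_after=7, commit_every=1))  # []
```

With a commit every 5 messages and a crash after 7, offsets 5 and 6 are processed twice; committing after every message leaves nothing to replay, at the cost of a blocking call per message.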
The other alternative is to make it a configurable choice in the host.json config so users can choose the method they want, with one of them as the default.
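If the choice were made configurable, the switch itself is small. A minimal sketch, assuming a hypothetical checkpointStrategy setting under an assumed extensions.kafka section of host.json (neither the key nor its values are taken from the extension's actual configuration), with StoreOffset as the assumed default because the thread leans toward throughput:

```python
import json
from enum import Enum


class CheckpointStrategy(Enum):
    # Hypothetical setting values, chosen for illustration only.
    COMMIT = "commit"              # blocking commit per checkpoint
    STORE_OFFSET = "storeOffset"   # store offset, background commit persists it


# Assumption: throughput-first default, as discussed in this thread.
DEFAULT_STRATEGY = CheckpointStrategy.STORE_OFFSET


def load_strategy(host_json_text):
    """Read the hypothetical checkpointStrategy setting from host.json text."""
    config = json.loads(host_json_text)
    value = config.get("extensions", {}).get("kafka", {}).get("checkpointStrategy")
    if value is None:
        return DEFAULT_STRATEGY
    # Enum lookup by value; raises ValueError for an unknown setting.
    return CheckpointStrategy(value)
```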
> you need to be at-least-once expected, so I think I'm ok with the StoreOffset given the better throughput.
that's my feeling too
> we make it a configurable choice
That's not a bad idea. How complex would this make the code, @fbeltrao?
The code to support both is not complex. I am questioning the value.
What Jeff says is right: consumers should be implemented to handle at-least-once delivery anyway.
I can run a few tests to see whether repeated message processing is noticeable in our e2e tests if we optimize for throughput.
I'd say let's pick one. If customers ask for the other one, or complain about the one we picked, we can always come back and revisit this.
I am happy with "you're expected to ensure your consumer is idempotent because of at-least-once semantics" and going for the higher throughput.
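One way to meet the "your consumer is idempotent" expectation is to skip messages whose identity has already been handled. A minimal sketch, assuming messages are identified by (partition, offset); IdempotentHandler is a hypothetical name, and a real implementation would keep the seen set in durable storage, updated atomically with the side effect, rather than in memory:

```python
class IdempotentHandler:
    """Wraps a side-effecting handler so redelivered messages are skipped."""

    def __init__(self, apply):
        self._apply = apply   # the real side effect, e.g. a database write
        self._seen = set()    # in-memory only: an assumption for this sketch

    def handle(self, partition, offset, value):
        key = (partition, offset)
        if key in self._seen:
            return False      # duplicate delivery: skip the side effect
        self._apply(value)
        self._seen.add(key)
        return True


totals = []
handler = IdempotentHandler(totals.append)
# Offsets 5 and 6 arrive twice, as they would after a replay.
for offset, value in [(5, 10), (6, 20), (5, 10), (6, 20)]:
    handler.handle(0, offset, value)
print(sum(totals))  # 30
```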
Checkpoint saving is currently done using Consumer.Commit, which blocks the thread. An alternative is to use StoreOffset, which records the offset locally and lets librdkafka save the checkpoint asynchronously in the background.
Commit is more accurate, since the checkpoint is persisted before processing continues, while StoreOffset offers better throughput.
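The asynchronous approach can be pictured as a processing thread that only records the latest offset in memory while a background thread persists it on a timer. A toy sketch with Python threads, not librdkafka's implementation; OffsetStore, persist and interval_s are hypothetical names:

```python
import threading


class OffsetStore:
    """Toy StoreOffset-style checkpointing: store in memory, persist later."""

    def __init__(self, persist, interval_s=0.05):
        self._persist = persist          # callable that durably saves an offset
        self._interval_s = interval_s
        self._lock = threading.Lock()
        self._stored = None
        self._last_persisted = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def store_offset(self, next_offset):
        # Cheap for the processing thread: just remember the latest offset.
        with self._lock:
            self._stored = next_offset

    def _flush(self):
        with self._lock:
            offset = self._stored
        if offset is not None and offset != self._last_persisted:
            self._persist(offset)
            self._last_persisted = offset

    def _run(self):
        # Event.wait returns False on timeout and True once stop() sets it.
        while not self._stop.wait(self._interval_s):
            self._flush()

    def stop(self):
        self._stop.set()
        self._thread.join()
        self._flush()  # final flush on clean shutdown


persisted = []
store = OffsetStore(persisted.append, interval_s=0.01)
store.start()
for offset in range(1000):
    store.store_offset(offset + 1)  # next offset to read, as Kafka commits it
store.stop()
print(persisted[-1])  # 1000
```

A blocking commit per message would call persist 1000 times here; with the background flush the processing loop never waits on persistence, which is the throughput gain, and anything stored after the last flush is exactly what gets replayed if the process dies before stop() runs.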
Would love your feedback @jeffhollan, @anirudhgarg and @ryancrawcour