Open danmarcab opened 7 months ago
If you recently changed the ack option, I would first investigate if it is related to that. :)
Sounds like a sensible place to start, thanks
A bit more context, we have 2 pipelines with the exact same configuration, one of them is affected the other one isn't. Of course this could be a coincidence 🤷
@danmarcab is it something that you can reproduce in a small application?
We are running version 0.4.1 in our prod without any problem, and our settings are very similar:
- offset_commit_on_ack: false
- Elixir 1.15.7 (compiled with Erlang/OTP 26)
- We are running more than one Broadway pipeline. What I mean is that our application's children look like this:
children = [
  MyApp.ConsumerBroadway,
  MyApp.ConsumerBroadwayV2,
  MyApp.ConsumerBroadwayV3,
  MyApp.Telemetry
  # Starts a worker by calling: MyApp.Worker.start_link(arg)
  # {MyApp.Worker, arg}
]
I'm asking because I'd like to see the issue in a small project and avoid updating until we can find the root cause.
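For anyone trying to build a small reproduction, here is a minimal sketch of what one such pipeline could look like with offset_commit_on_ack turned off. The hosts, group_id, topics and concurrency values are placeholders (assumptions, not taken from either setup in this thread), and the pipeline has no batchers:

```elixir
defmodule MyApp.ConsumerBroadway do
  use Broadway

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [
        module:
          {BroadwayKafka.Producer,
           [
             # Placeholder connection details, adjust to your cluster
             hosts: [localhost: 9092],
             group_id: "example_group",
             topics: ["example_topic"],
             # The option under discussion: do not send an offset commit
             # request on every ack
             offset_commit_on_ack: false
           ]},
        concurrency: 1
      ],
      processors: [
        # Placeholder concurrency; no batchers are configured
        default: [concurrency: 10]
      ]
    )
  end

  @impl true
  def handle_message(_processor, message, _context) do
    # Real processing would go here; returning the message marks it as successful
    message
  end
end
```

As far as I understand the option, with offset_commit_on_ack: false the commits are left to the periodic commit interval rather than being sent after each acknowledgement, so comparing the same pipeline with true and false may help narrow things down.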
No, we haven't. This only happens occasionally after a few hours in our main app, so it's very hard to reproduce. I'll keep the issue updated when we find anything.
Another data point. We are not using batchers. Mentioning it because the fix from https://github.com/dashbitco/broadway/commit/602931c981d194c6e07e699ab45b6e3747cf44a0 was for the batcher stage.
We don't cancel timers in Broadway Kafka, so I can't see it being an issue. We do use timers in Broadway's rate limiter and it could have the same bug, but there it would raise (and not accumulate messages forever). So it may not be related to this.
Good morning/afternoon/evening all 👋
We are experiencing an issue which shows itself as pretty much the same as https://github.com/dashbitco/broadway_kafka/issues/100 (cannot reopen that one)
We can see some messages stuck in the ack state:
The processor itself keeps processing, it just never acks the messages.
That original issue was fixed by https://github.com/dashbitco/broadway/commit/602931c981d194c6e07e699ab45b6e3747cf44a0, which was an obscure timer issue. I wonder if there is another edge case we are missing.
We are running the latest versions of broadway and broadway_kafka.
A couple of things that are different in our code/env:
- offset_commit_on_ack is set to false (was true previously)
Any ideas of where to look?