slashmili closed this issue 2 years ago.
The root cause is definitely that they are pulling from the same partition. I would investigate directly in Brod whether that's the expected behaviour and, if so, how to change it.
While I was digging into brod, I started looking into BroadwayKafka and poking around with the producer process. When I run :sys.get_status on the producer, I get this huge state: https://gist.github.com/slashmili/93b1fd245e65b630bb875ebed8935f10
It seems that the producer's buffer is full of messages from different partitions, but for some reason it is not handing them to the processors.
I'm using broadway 1.0.3 and broadway_kafka 0.3.4.
OK, that explains it. GenStage's PartitionDispatcher expects events to be evenly distributed across partitions, but here a large number of events are going to a single partition. Can you please try out #90?
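The routing effect behind this can be pictured with a small simulation. This is a hypothetical sketch, not the actual code of GenStage's PartitionDispatcher or BroadwayKafka: it assumes each event carries its Kafka partition and is routed to a processor with a simple rem/2 mapping, and the module name PartitionSkewSketch is invented for illustration. It only models which processor receives each event, not the dispatcher's buffering or demand handling.

```elixir
defmodule PartitionSkewSketch do
  # Hypothetical model: each event is sent to processor index
  # rem(partition, processors), so one partition always maps to one processor.
  def route(events, processors) do
    Enum.group_by(events, fn %{partition: p} -> rem(p, processors) end)
  end

  # Number of events each processor index would receive under route/2.
  def load(events, processors) do
    events
    |> route(processors)
    |> Map.new(fn {proc, evs} -> {proc, length(evs)} end)
  end
end

# Backlog sitting entirely on partition 0.
skewed = for offset <- 1..1000, do: %{partition: 0, offset: offset}
IO.inspect(PartitionSkewSketch.load(skewed, 4), label: "skewed")

# The same amount of backlog spread over partitions 0..3.
spread = for offset <- 1..1000, do: %{partition: rem(offset, 4), offset: offset}
IO.inspect(PartitionSkewSketch.load(spread, 4), label: "spread")
```

In this model, a backlog that sits entirely on partition 0 lands on a single processor no matter how many processors are configured, while the same backlog spread over four partitions is split evenly across all four.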
Yup! It's putting the other processes to work.
I've never been so happy to see red signs!
Thanks a lot @josevalim!
@slashmili which versions of https://github.com/dashbitco/broadway_dashboard and https://hexdocs.pm/phoenix_live_dashboard/Phoenix.LiveDashboard.html are you using? I set mine up as just:
live_dashboard "/dashboard",
  metrics: TelemetrySupervisor,
  additional_pages: [
    broadway: BroadwayDashboard
  ]
end
But whenever I go to that tab, I always get:
@amacciola I ran these tests in this sample app: https://github.com/slashmili/tmp-phoenix-live-dashboard-with-broadway-pluging
In production we use PLDS and connect the nodes.
If you can run the sample app and it works, you might be able to find what's wrong in your app.
@slashmili have you ever had it working in any environment other than locally without PLDS?
@amacciola No, I don't have an embedded live dashboard in non-dev environments.
Hi, I have a case where, even though I set concurrency in my processors config, only one of the processors gets the work all the time.
My setup
The Broadway module looks like this:
Issue
When I start the IEx session, I see that the connection is established.
I've left the terminal open for 30 minutes and the messages are always delivered to the same PID<0.582.0> process. In addition, it always pulls from partition 0. I also double-checked this using LiveDashboard: proc_0 is always busy and the others are not receiving any tasks.
Just in case it helps, these are my consumer group details for this topic:
As you can see, our consumer is lagging behind the end offset, which sometimes happens in our production environment. To reproduce that locally, you can do:
Expect to see
I'd expect to see two things:
1. The producer distributes the messages to all processors.
Could you please help me understand whether:
a. This is expected behaviour.
b. This is not expected behaviour and is a bug.
c. There is a problem in my Broadway config.