[Bug]: Backlog Does Not Track Consumer Lag for DataFlow Python Kafka Pipeline

pof-declaneaston commented 1 year ago

What happened?

Hello,

I have built a DataFlow Python pipeline which I am using to consume from and produce to Kafka. When I run my pipeline I notice that the backlog metric in the DataFlow job UI stays at or near 0 seemingly forever. This is incorrect since I can see in my Kafka metrics that the consumer group's latency is constantly increasing, the DataFlow job is not keeping up with the consumer. Since the backlog is staying low the job never scales up and never catches up with the topic. I believe Java pipelines using KafkaIO correctly take consumer latency into account to track backlog.

My pipeline is running v2.44.0. I would provide something to repro but I don't know how you would access a Kafka cluster to repro with.

Thanks a lot, Declan

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

[X] Component: Python SDK
[ ] Component: Java SDK
[ ] Component: Go SDK
[ ] Component: Typescript SDK
[X] Component: IO connector
[ ] Component: Beam examples
[ ] Component: Beam playground
[ ] Component: Beam katas
[ ] Component: Website
[ ] Component: Spark Runner
[ ] Component: Flink Runner
[ ] Component: Samza Runner
[ ] Component: Twister2 Runner
[ ] Component: Hazelcast Jet Runner
[X] Component: Google Cloud Dataflow Runner

pof-declaneaston commented 1 year ago

I tried running the pipeline with num_workers = 3, which worked well for a little while but eventually the runner scaled it back down to 1 worker and the system is unable to keepup.

pof-declaneaston commented 1 year ago

I realized that the autoscaler can actually be disabled so I have a work around for the issue but definitely would prefer a cleaner solution.

apache / beam