apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Bug]: Cannot read from Kafka due to short poll timeout of consumer in KafkaIO #30870

Open xianhualiu opened 6 months ago

xianhualiu commented 6 months ago

What happened?

The default Kafka consumer poll timeout is set to 1 second. This works when the consumer receives messages from the Kafka broker within that second, for example when the client and the broker are in the same region. If the response takes longer than 1 second, the consumer does not retrieve any messages. One customer reported extremely low throughput of processed messages for cross-region reads, because most responses took longer than 1 second.
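To make the failure mode concrete, here is a toy model of the behavior described above. It is not KafkaIO code: the function name and the latency values are made up for illustration, and it simply assumes that a poll yields records only when the broker response arrives within the poll timeout.

```python
from typing import Iterable


def count_successful_polls(latencies_s: Iterable[float], poll_timeout_s: float) -> int:
    """Count simulated polls whose response arrived within the timeout.

    Toy model (assumption): a response slower than the timeout yields
    no records for that poll.
    """
    return sum(1 for latency in latencies_s if latency <= poll_timeout_s)


# Illustrative response times in seconds, not measurements from the report.
same_region = [0.05, 0.10, 0.20, 0.15]
cross_region = [1.2, 0.9, 1.5, 1.8]

DEFAULT_TIMEOUT_S = 1.0  # the default poll timeout described in this issue

print(count_successful_polls(same_region, DEFAULT_TIMEOUT_S))   # all four polls succeed
print(count_successful_polls(cross_region, DEFAULT_TIMEOUT_S))  # only the 0.9 s response succeeds
print(count_successful_polls(cross_region, 2.0))                # a longer timeout recovers all four
```

Under these assumed latencies, the same workload that reads fine within a region retrieves almost nothing across regions until the timeout is raised.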

As a solution, the Kafka consumer poll timeout should be configurable, so that customers can adjust it to their needs.
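One possible shape for such a setting is sketched below. `KafkaReadSettings` and `with_poll_timeout` are hypothetical names invented for this example and are not part of the Beam API; the sketch only assumes that the current 1-second value stays the default and that callers can override it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class KafkaReadSettings:
    """Hypothetical settings object; not an actual Beam class."""

    # Keep the 1-second value from this issue as the default.
    poll_timeout_s: float = 1.0

    def __post_init__(self) -> None:
        # Reject values that could never allow a poll to complete.
        if self.poll_timeout_s <= 0:
            raise ValueError("poll_timeout_s must be positive")

    def with_poll_timeout(self, seconds: float) -> "KafkaReadSettings":
        """Return a copy with a different poll timeout (validated again)."""
        return replace(self, poll_timeout_s=seconds)


default_settings = KafkaReadSettings()
cross_region_settings = default_settings.with_poll_timeout(5.0)
```

Keeping the existing value as the default would leave same-region pipelines unchanged, while cross-region readers could opt in to a longer timeout.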

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

xianhualiu commented 6 months ago

.take-issue

jbsabbagh commented 6 months ago

This also affects the Python SDK.