apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0
7.89k stars 4.27k forks source link

[Bug]: KafkaIO maintains duplicate caches for record size and offset estimators caches #33097

Closed sjvanrossum closed 2 weeks ago

sjvanrossum commented 2 weeks ago

What happened?

Offset consumers are created and cached for every KafkaSourceDescriptor processed per ReadFromKafkaDoFn instance. Caching multiple open Kafka consumers for the same KafkaSourceDescriptor is expensive both on the client and server side and may exhaust connection limits on some Kafka clusters.

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

sjvanrossum commented 2 weeks ago

Fix provided in #32928.

damccorm commented 2 weeks ago

@sjvanrossum it looks like automation added this to the 2.61.0 release, is it actually a release blocker?

sjvanrossum commented 2 weeks ago

@damccorm it's not a hard blocker, but since #32928 was approved last week (pending merge of #32921, now merged) I figured I'd flag it on the milestone for your awareness in case you'd like to see it included in 2.61.0.

damccorm commented 2 weeks ago

Closing since https://github.com/apache/beam/pull/32928 is merged