apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Bug]: KafkaIO metrics write per-partition values to per-split metrics #33096

Closed: sjvanrossum closed this issue 1 week ago

sjvanrossum commented 1 week ago

What happened?

#31137 overwrites the per-split metric backlog_bytes.${SPLIT} with a per-partition value rather than the accumulated value for the split. #31281 introduces a Map to store metrics for all past and current splits (1 partition each) of the ReadFromKafkaDoFn instance, and may repeatedly overwrite the entries of non-current splits with stale values. The map used to store these values is not thread-safe, so it may trigger a ConcurrentModificationException when GetSize and other SDF methods read and write it concurrently. Finally, the per-split caches kept by the instance are keyed on TopicPartition, which is not unique among all splits because a split may override the bootstrap server.
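To illustrate the last two points, here is a minimal sketch, not the change made in #32921. It assumes a hypothetical `SplitKey` that qualifies the topic and partition with the split's bootstrap servers, and a hypothetical `SplitBacklogSketch` holder backed by a `ConcurrentHashMap`. It has no Kafka or Beam dependency; every class and method name in it is invented for illustration.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical holder for per-split backlog values; not part of KafkaIO. */
public class SplitBacklogSketch {

  /** Hypothetical key: a topic partition qualified by the split's bootstrap servers. */
  public static final class SplitKey {
    private final String bootstrapServers;
    private final String topic;
    private final int partition;

    public SplitKey(String bootstrapServers, String topic, int partition) {
      this.bootstrapServers = Objects.requireNonNull(bootstrapServers);
      this.topic = Objects.requireNonNull(topic);
      this.partition = partition;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof SplitKey)) {
        return false;
      }
      SplitKey other = (SplitKey) o;
      return partition == other.partition
          && topic.equals(other.topic)
          && bootstrapServers.equals(other.bootstrapServers);
    }

    @Override
    public int hashCode() {
      return Objects.hash(bootstrapServers, topic, partition);
    }
  }

  // ConcurrentHashMap allows concurrent reads and writes, and its iterators
  // are weakly consistent, so they never throw ConcurrentModificationException.
  private final Map<SplitKey, Long> backlogBytes = new ConcurrentHashMap<>();

  /** Records the latest backlog for one split. */
  public void reportBacklog(SplitKey key, long bytes) {
    backlogBytes.put(key, bytes);
  }

  /** Assumption: a finished split is dropped so its stale value is not reported again. */
  public void removeSplit(SplitKey key) {
    backlogBytes.remove(key);
  }

  /** Returns the backlog for one split, or 0 if the split is unknown. */
  public long backlogFor(SplitKey key) {
    return backlogBytes.getOrDefault(key, 0L);
  }

  /** Sums the backlog over all splits currently tracked. */
  public long totalBacklog() {
    long total = 0L;
    for (long bytes : backlogBytes.values()) {
      total += bytes;
    }
    return total;
  }

  public static void main(String[] args) {
    SplitBacklogSketch metrics = new SplitBacklogSketch();
    // Same topic and partition, different clusters: two distinct entries.
    SplitKey clusterA = new SplitKey("broker-a:9092", "events", 0);
    SplitKey clusterB = new SplitKey("broker-b:9092", "events", 0);
    metrics.reportBacklog(clusterA, 100L);
    metrics.reportBacklog(clusterB, 200L);
    System.out.println(metrics.totalBacklog());
    metrics.removeSplit(clusterA);
    System.out.println(metrics.totalBacklog());
  }
}
```

With a key of topic and partition alone, the two splits in `main` would collide and one value would silently replace the other; including the bootstrap servers in the key keeps them apart.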

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

sjvanrossum commented 1 week ago

Fix provided in #32921.

damccorm commented 1 week ago

@sjvanrossum it looks like automation added this to the 2.61.0 release. Is it actually a release blocker?

Abacn commented 1 week ago

Closed after #32921 was merged.