Closed by scwhittle 1 month ago
Possible work-arounds:
- use the `use_unbounded_sdf_wrapper` experiment
- use `KafkaIO.readFromDescriptors` instead of `KafkaIO.read`/`KafkaIO.readBytes`
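As a sketch, the first workaround would be applied when launching the pipeline; the flag name is from the issue, and the exact way to pass experiments depends on the runner and SDK:

```
--experiments=use_unbounded_sdf_wrapper
```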
Upon further investigation, the incorrect options are set on the ReadSourceDescriptors transform but are not used for determining redistribute and allowed duplicates of the read elements, as that logic uses the original kafkaRead here.
The effect of the misconfiguration is that if commitOffsets is enabled, the commit is not performed due to the logic here.
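To make the failure mode concrete, here is a minimal, hypothetical sketch of the pattern described above (illustrative names only, not Beam's actual code): the descriptor-based transform is built with the wrong options, so a user-enabled commit-offsets flag is silently lost, while decisions that consult the original read configuration remain unaffected.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReadOptions:
    commit_offsets: bool = False
    redistribute: bool = False

def build_descriptor_transform(user_options: ReadOptions) -> ReadOptions:
    # BUG (sketch of the pre-2.58.1 pattern): the wrapper transform is
    # constructed with default options instead of the user's options.
    return ReadOptions()

user = ReadOptions(commit_offsets=True)
wrapped = build_descriptor_transform(user)

# Redistribute/allowed-duplicates decisions consult the original `user`
# options, so they are unaffected; the commit decision consults `wrapped`,
# so the offset commit is silently skipped.
assert user.commit_offsets is True
assert wrapped.commit_offsets is False
```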
Fixed in 2.58.1
What happened?
Introduced in https://github.com/apache/beam/commit/9cbdda1b4e52452728cc9da2fa8498d0ace5ed7b, so this affects the 2.58 Beam release. The Dataflow v2 runner uses this version for Kafka by default.
Since this can introduce duplicates and is unexpected, marking as P1 and considering releasing a patched SDK version to address it.
Issue Priority
Priority: 1 (data loss / total loss of function)
Issue Components