apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0
7.81k stars 4.23k forks source link

[Bug]: KafkaIO Batch Write Failing due to producer cannot commit messages within timeout #24963

Open Abacn opened 1 year ago

Abacn commented 1 year ago

What happened?

Found by Kafka performance test failing after #24879 in. This is because we removed shuffle=appliance there and then read from shuffle becomes faster. However, the producer is unable to digests data within timeout (can be seen by the flooding warning log of send failed : 'Expiring 148 record(s) for beam-sdf-0:130675 ms has passed since batch creation'). However, the message itself does not contain a timestamp and should be tolerant to throttling.

The error happens because producer.send is asynchronous. It adds a system timestamp when .send gets called and returns immediately. We need a mechanism to prevent overwhelming Kafka producer.

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

Abacn commented 1 year ago

Under umbrella issue #22303

aromanenko-dev commented 1 year ago

Kind ping. What is status of this issue?

Abacn commented 1 year ago

@aromanenko-dev unfortunately I do not have a good idea for short term fix. Throttling detection is considered as a long term solution, that is the IO connector has can detect throttling and make runner aware, then runner can prevent scaling up and possibly scaling down.

aromanenko-dev commented 1 year ago

@Abacn Yes, throttling detection is a general problem across all IO connectors and runners. So, should we close this one then?