apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0
7.81k stars 4.23k forks source link

[Bug]: Potential performance regression in KafkaIO and schema registry #26262

Open aromanenko-dev opened 1 year ago

aromanenko-dev commented 1 year ago

What needs to happen?

From email thread:

"I am trying to understand the effect of schema registry on our pipeline's performance. In order to do sowe created a very simple pipeline that reads from kafka, runs a simple transformation of adding new field and writes of kafka. the messages are in avro format

I ran this pipeline with 3 different options on same configuration : 1 kafka partition, 1 task manager, 1 slot, 1 parallelism:

KafkaIO.<String, T>read()
        .withBootstrapServers(bootstrapServers)
        .withTopic(topic)
        .withConsumerConfigUpdates(Map.ofEntries(
                Map.entry("schema.registry.url", registryURL),
                Map.entry(ConsumerConfig.GROUP_ID_CONFIG, consumerGroup+ UUID.randomUUID()),
        ))
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializerAndCoder((Class) io.confluent.kafka.serializers.KafkaAvroDeserializer.class, AvroCoder.of(avroClass));

I have made the suggested change and used ConfluentSchemaRegistryDeserializerProvider the results are slightly better.. average of 8000 msg/sec "

We need to investigate and find out the cause of this performance issue.

Issue Priority

Priority: 2 (default / most normal work should be filed as P2)

Issue Components

sigalite commented 11 months ago

any update on this issue?