LGouellec / kafka-streams-dotnet

.NET Stream Processing Library for Apache Kafka 🚀
https://lgouellec.github.io/kafka-streams-dotnet/
MIT License
452 stars 73 forks source link

Default replication factor #291

Closed kevin-mcmanus closed 7 months ago

kevin-mcmanus commented 8 months ago

Description

According to the documentation the replication factor for a change log topic is 1.

This looks to be correct for change log topics but differs from the output of a call to .To(), which appears to assume the default value as configured by the broker.

I didn't spot in the documentation an option to change the replication factor for non change log topics.

Should there be configuration options for both change logs and standard streams? My own preference is not to specify the option in code and to fall back to the default of the broker. Is this currently an option for change log topics?

How to reproduce

Not a very useful example but does highlight the question... Using the latest GA NuGet packages.

using Streamiz.Kafka.Net;
using Streamiz.Kafka.Net.SerDes;
using Streamiz.Kafka.Net.Stream;

var config = new StreamConfig<StringSerDes, StringSerDes>();
config.ApplicationId = "test-app";
config.BootstrapServers = "xxx";

StreamBuilder builder = new StreamBuilder();

builder.Stream<string, string>("test-input")
    .GroupByKey()
    .Reduce((x,y) => string.Concat(x,y))
    .ToStream("test-stream")
    .To("test-output");

Topology t = builder.Build();
KafkaStream stream = new KafkaStream(t, config);

Console.CancelKeyPress += (o, e) => {
    stream.Dispose();
};

await stream.StartAsync();
LGouellec commented 8 months ago

Hi @kevin-mcmanus ,

Streamiz just create the internal topics before starting the processing, that's why the replication.factor config is only applied for changelog or repartition topic.

Regarding the .To(..), this is not Streamiz which create the topic but directly the broker when the application try to produce the first message inside a none-existing topic Kafka has a configuration to allow the auto creation of topic on the server if you consume or produce into a topic unknown. https://kafka.apache.org/documentation/#brokerconfigs_auto.create.topics.enable

This is probably your behavior and I don't change anything because a common practice is to create the input and output topics before starting your streaming application. Changelog topic is the responsability of Streamiz, so make sense that Streamiz create this topic.

But in Kafka Streams JAVA, the replication.factor can be set at -1 , -1 meaning use broker default replication factor. https://kafka.apache.org/documentation/#streamsconfigs_replication.factor It could be an improvement.