glazkovalex / Rebus.Kafka

Apache Kafka transport for Rebus
MIT License

Usage of Confluent 1.5.3 causes topic creation issues #7

Closed mkoziel2000 closed 3 years ago

mkoziel2000 commented 3 years ago

It seems that using Confluent 1.5.3+ causes the auto topic creation capability to fail. This seems to result in stalling the Rebus.Kafka connector to the point where calling Publish() through Rebus will eventually result in a large delay followed by an error condition.

I've tried setting the AutoCreateTopicsEnabled flag in the Confluent.Consumer structure to true. I also verified that the Kafka brokers accept auto creation of topics. It just plain doesn't work. Many people are complaining about this very problem in version 1.5.3 and higher.

Recompiling Rebus.Kafka with version 1.4.4 of Confluent's library seems to straighten everything out. According to Confluent, they turned off auto topic creation by default in 1.5.3, but I have not found any way to override that and turn it back on.

The lack of auto topic creation is not a problem for production deployments, since there we want to create and adjust topics manually. But when it comes to developer workflows on isolated dev boxes, forcing devs to create topics seems like a step too far. The software should just do that as a reaction to working with Kafka.

I'm not sure whether the solution is to go back to 1.4.4 in the NuGet references of Rebus.Kafka, or just to track this as a known issue for people to be aware of until Confluent puts out a release that doesn't exhibit this problem.

glazkovalex commented 3 years ago

Dear @mkoziel2000, I do not quite understand what exactly does not work, and I would very much like to understand in which case the problem arises. I agree with you that a few years ago the harm of auto-creating topics in production was widely discussed. Here, for example, is "The Side Effect of Fetching Kafka Topic Metadata". I also agree that Confluent Cloud (the cloud service, not Apache Kafka installed on your own server) has removed the ability to enable auto.create.topics.enable on a number of its offerings, leaving this option only on Dedicated clusters. However, this transport is mainly focused on use with your own instance of Apache Kafka, not someone's cloud service with limited capabilities. The transport tests use a script that starts a private instance of Apache Kafka based on the latest "spotify/kafka" image, and the tests run against that instance. To run the tests, Docker Desktop must be installed.

Please tell me whether the Rebus.Kafka transport tests pass for you. If they do, what transport problem do you propose to solve?

Describe this case, preferably in the form of an additional test for Rebus.Kafka.

mkoziel2000 commented 3 years ago

Thanks for the heads up about running the unit tests to establish a baseline against spotify/kafka. The unit tests do run successfully on that version of Kafka. I am also using a bare-metal version of Kafka. In my case, it's the version supported by Strimzi (https://strimzi.io/), since we ultimately want to manage this in our Kubernetes environment along with all our other containers. With that version, I get blocked by the topic creation error when I run the same unit tests. My understanding is that Strimzi also uses the Apache version and not Confluent's. I guess this shifts the problem to the platform this connector interacts with, rather than it being a connector issue per se. However, falling back to the older Confluent NuGet library (1.4.4) does clear it all up when I run the unit tests, so I may just go down the road of a forked version of Rebus.Kafka to get past this issue for now.

I have not tried any of this with the Confluent managed provider. Am I to understand that this connector has not been tested with Confluent's cloud offering, or that its behavior there is unknown? Going the MSP route with Confluent is something we are heavily considering, so if there isn't any insight on that environment, I guess I'll have to add it to my due diligence. Thanks,

glazkovalex commented 3 years ago

Strimzi! Now I understand. I prefer to host only stateless services in Kubernetes. I did not place Apache Kafka in Kubernetes. I think for small and medium-sized Apache Kafka clusters that do not need high fault tolerance, Strimzi in Kubernetes will be a convenient solution. If the image of "spotify/kafka" works, and the image from Strimzi does not work, then Confluent.Kafka has not completely deteriorated :) Most likely, auto.create.topics.enable is disabled by default inside the image from Strimzi, just like in Confluent Cloud.

Although I did not place Apache Kafka in Kubernetes, I Googled about Kubernetes operators and Strimzi in particular. The article "Apache Kafka on Kubernetes with Strimzi – Part 1: Creating and Deploying a Strimzi Kafka Cluster" says that auto.create.topics.enable is disabled by default on the server, but you can enable it by setting it in the config: auto.create.topics.enable: "true"

Then the author writes: "As you may have notices, we have set the auto.create.topics.enable parameter to “true” inside our Kafka resource. So there’s no need to create a topic manually and the Topic Operator creates the KafkaTopic resource for us. But it doesn’t hurt to manage the topics ourselves and also it’s a best practice."

Try enabling auto.create.topics.enable: "true" in the Kafka deployment file. Maybe it will work...
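For reference, a minimal sketch of where that broker setting would sit in a Strimzi Kafka custom resource. The cluster name is a placeholder, the apiVersion depends on the installed Strimzi release, and the other required fields of the resource (listeners, storage, and so on) are omitted here, so this is a fragment rather than a complete deployment file:

```yaml
# Fragment only: required fields such as listeners and storage are omitted.
apiVersion: kafka.strimzi.io/v1beta2   # assumption: adjust to your Strimzi version
kind: Kafka
metadata:
  name: my-cluster                     # placeholder name
spec:
  kafka:
    config:
      # Broker-side switch; the client must also be allowed to trigger auto creation.
      auto.create.topics.enable: "true"
```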

mkoziel2000 commented 3 years ago

I already played around with that property a bunch in Strimzi. I don't seem to experience this problem with other clients (such as Debezium connectors or the standard Kafka consumer CLI). Topics get created on the fly just fine. It just seems to be this project, with whatever Kafka APIs it uses in conjunction with the 1.5.3+ Confluent driver. The behavior suggests there might be one or two more parameters that need to be set as part of the consumer setup. But not knowing the Confluent API, I can't tell for sure without going down the road of coding up my own Kafka connector to figure it out. This is definitely strange now that I've seen it work with a different Kafka instance.

glazkovalex commented 3 years ago

"already played around with that property a bunch in Strimzi"

Thank you for the information. I will think about the possibility of using another client in the Rebus.Kafka transport, but a transport with an alternative client will not be available soon. For now, you can either use the older Rebus.Kafka package, version 1.4.3, which works well with Strimzi, or fork the project and build a transport with the right set of packages.

glazkovalex commented 3 years ago

V 1.6.3 (1.04.2021): In the summer of 2020, librdkafka was updated to v1.5.0, which introduced a change that many users of the Rebus.Kafka transport did not expect:

"Consumer will no longer trigger auto creation of topics, allow.auto.create.topics=true may be used to re-enable the old deprecated functionality."

At the request of the transport users, I restored the previous transport behavior as the default. The Rebus.Kafka transport now automatically creates topics by default, as before. However, I do not recommend using allow.auto.create.topics=true in production! To disable allow.auto.create.topics, pass your ConsumerConfig or ConsumerAndBehaviorConfig configuration to the transport with AllowAutoCreateTopics = false.
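A minimal sketch of that opt-out, assuming the Confluent.Kafka ConsumerConfig class. The broker address and group id are placeholder values, and the exact transport overload that accepts the config may differ between Rebus.Kafka versions, so that step is only indicated in a comment:

```csharp
using Confluent.Kafka;

// Placeholder values; replace with your own broker list and consumer group.
var consumerConfig = new ConsumerConfig
{
    BootstrapServers = "localhost:9092",
    GroupId = "my-consumer-group",
    // Maps to the librdkafka property allow.auto.create.topics.
    AllowAutoCreateTopics = false
};

// Pass consumerConfig to the Rebus.Kafka transport configuration,
// as described in the comment above.
```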

glazkovalex commented 11 months ago

@mkoziel2000, in version 2.0.0 the transport forcibly creates missing topics if Consumer.Config.AllowAutoCreateTopics == true. However, I do not recommend using allow.auto.create.topics=true in production!
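For anyone who prefers to create topics explicitly on a dev box, here is a rough sketch using the Confluent.Kafka admin client. It is not necessarily how the transport does it internally; the TopicBootstrap class and EnsureTopicAsync method are hypothetical names, and the partition and replication values only suit a single-broker setup:

```csharp
using System.Linq;
using System.Threading.Tasks;
using Confluent.Kafka;
using Confluent.Kafka.Admin;

public static class TopicBootstrap
{
    // Hypothetical helper, not part of Rebus.Kafka.
    public static async Task EnsureTopicAsync(string bootstrapServers, string topic)
    {
        var config = new AdminClientConfig { BootstrapServers = bootstrapServers };
        using var admin = new AdminClientBuilder(config).Build();
        try
        {
            await admin.CreateTopicsAsync(new[]
            {
                // One partition and replication factor 1: single-broker dev box only.
                new TopicSpecification { Name = topic, NumPartitions = 1, ReplicationFactor = 1 }
            });
        }
        catch (CreateTopicsException e)
            when (e.Results.All(r => r.Error.Code == ErrorCode.TopicAlreadyExists))
        {
            // The topic already exists, which is the desired end state.
        }
    }
}
```

Calling `await TopicBootstrap.EnsureTopicAsync("localhost:9092", "my-topic");` before starting the bus would then make sure the topic is present.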

Also in version 2.0.0, the tests have been migrated to the current and popular Apache Kafka Docker image "confluentinc/cp-kafka:7.0.1".