Closed secondsun closed 3 years ago
The Kafka reference inside those beans is updated when I got a corresponding event in the controller so I would assume it will be the right one https://github.com/bf2fc6cc711aee1a0c2a/mk-agent/blob/main/agent-operator/src/main/java/org/bf2/operator/controllers/ManagedKafkaController.java#L75 or are you thinking about a different solution?
It is a race condition. If two requests come in at the same time then the second may overwrite the first Kafka reference between the time the first sets the reference and checks the status.
I got it now. Being application scope means that the bean will be just one so even if the controller observes creation/update for different ManagedKafka
resources, the KafkaCluster
bean in the KafkaInstance
will be just one so the race condition on two requests at the same time. I was using KafkaInstance
to track the 3 pieces which make Kafka (brokers, canary and admin server) even because the overall status depends on the three status. Your comment let me think that I should 1) create different KafkaInstance
(s) to track in the controller but it would make the controller more stateful and we have got a problem with restarts 2) reading every time from the API server to get status of stuff and not tracking inside the controller. Any thoughts?
1) create different KafkaInstance(s) to track in the controller but it would make the controller more stateful and we have got a problem with restarts
@ppatierno are you asking if a controller can have state?
well we avoided to have it in the Strimzi operator creating the kind of state for each reconcile from the ectd stored resource or what we observed via watches. I would say to do the same here but right now it's wrong. It has a kind of state and even shared across the different "requests". I am going to change it now.
@secondsun this https://github.com/bf2fc6cc711aee1a0c2a/kas-fleetshard/pull/51 should fix the issue. Beans are still application scoped but they have no "status" and the informer cache is used to get latest Kafka (or Deployment) resource when needed.
I was looking at the operand implementations, and I do not think they should be application scoped beans. In cases of https://github.com/bf2fc6cc711aee1a0c2a/mk-agent/blob/main/agent-operator/src/main/java/org/bf2/operator/operands/KafkaCluster.java#L33 the kafka reference may change between multiple calls to
createOrUpdate
and thusisError
and similar methods will not be guaranteed to be referencing the same instance.