bf2fc6cc711aee1a0c2a / kas-fleetshard

The kas-fleetshard-operator is responsible for provisioning and managing instances of kafka on a cluster. The kas-fleetshard-synchronizer synchronizes the state of a fleet shard with the kas-fleet-manager.
Apache License 2.0
7 stars 20 forks source link

Operands Shouldn't be application scoped #49

Closed secondsun closed 3 years ago

secondsun commented 3 years ago

I was looking at the operand implementations, and I do not think they should be application scoped beans. In cases of https://github.com/bf2fc6cc711aee1a0c2a/mk-agent/blob/main/agent-operator/src/main/java/org/bf2/operator/operands/KafkaCluster.java#L33 the kafka reference may change between multiple calls to createOrUpdate and thus isError and similar methods will not be guaranteed to be referencing the same instance.

ppatierno commented 3 years ago

The Kafka reference inside those beans is updated when I got a corresponding event in the controller so I would assume it will be the right one https://github.com/bf2fc6cc711aee1a0c2a/mk-agent/blob/main/agent-operator/src/main/java/org/bf2/operator/controllers/ManagedKafkaController.java#L75 or are you thinking about a different solution?

secondsun commented 3 years ago

It is a race condition. If two requests come in at the same time then the second may overwrite the first Kafka reference between the time the first sets the reference and checks the status.

ppatierno commented 3 years ago

I got it now. Being application scope means that the bean will be just one so even if the controller observes creation/update for different ManagedKafka resources, the KafkaCluster bean in the KafkaInstance will be just one so the race condition on two requests at the same time. I was using KafkaInstance to track the 3 pieces which make Kafka (brokers, canary and admin server) even because the overall status depends on the three status. Your comment let me think that I should 1) create different KafkaInstance(s) to track in the controller but it would make the controller more stateful and we have got a problem with restarts 2) reading every time from the API server to get status of stuff and not tracking inside the controller. Any thoughts?

rareddy commented 3 years ago

1) create different KafkaInstance(s) to track in the controller but it would make the controller more stateful and we have got a problem with restarts

@ppatierno are you asking if a controller can have state?

ppatierno commented 3 years ago

well we avoided to have it in the Strimzi operator creating the kind of state for each reconcile from the ectd stored resource or what we observed via watches. I would say to do the same here but right now it's wrong. It has a kind of state and even shared across the different "requests". I am going to change it now.

ppatierno commented 3 years ago

@secondsun this https://github.com/bf2fc6cc711aee1a0c2a/kas-fleetshard/pull/51 should fix the issue. Beans are still application scoped but they have no "status" and the informer cache is used to get latest Kafka (or Deployment) resource when needed.