Open marshtompsxd opened 1 year ago
Scheduler (not shown in the above figure), which maintains a queue of all the reconcile requests. The scheduler deduplicates reconcile requests: if a reconcile request for CR object o arrives while another reconcile request for the same CR object already exists in the queue, the request that is scheduled later is discarded and the other one remains in (or gets added to) the queue.
Interesting. So if two notifications for an object o happen for versions o1 followed by o2 while the reconcile for o1 has not yet happened, then reconcile() will only be called with the argument o1 and not o2?
Yes. But the best practice is to always first call k8s_client.Get(o.key) when entering reconcile() so that you don't make decisions based on the older version of o (tho we know this doesn't completely prevent staleness issues...)
Indeed. Even calling Get(o.key) still leaves open the possibility for time-of-check/time-of-use bugs.
We plan to build the controller implementation on top of the kube-rs library: https://github.com/kube-rs/kube.
To implement a controller using kube-rs, the developer needs to provide three things: a reconcile function, triggering conditions, and an error policy.
Reconcile function
async fn reconcile(cr: Arc<CR>, ctx: Arc<Ctx>) -> Result<Action, Error>
Each controller program can have multiple reconcile functions, and each reconcile function manages only one CR (custom resource) type. The controller monitors all the instances (objects) of the CR type and calls the reconcile function to make sure the actual cluster state matches the desired state described in the CR object. The reconcile function is called only when the cluster state related to some CR object changes (see the triggering conditions below).
The first argument of the reconcile function points to the related CR object, and the second usually points to some context data provided by the developer (e.g., a client for talking to the Kubernetes API). Note that the reconcile function does not know exactly which event triggered it. It only knows that the cluster state related to the CR object has just changed, so it needs to take some action to reconcile the cluster state again. This pattern is also called level-triggering.
The reconcile function usually scans the objects related to (or, owned by) the triggering CR object, checks whether they match the desired state, and issues corrective updates if not.
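The scan-compare-correct loop described above can be sketched with plain standard-library types. This is only an illustration, not kube-rs code: `Cr`, `Update`, and `plan_updates` are hypothetical names, objects are modeled as name-to-spec string pairs, and deleting objects that the CR no longer asks for is an assumption rather than something the text above states.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a CR object: maps each owned object's
/// name to the spec the CR wants it to have.
pub struct Cr {
    pub desired: BTreeMap<String, String>,
}

/// Hypothetical corrective updates a reconcile pass could issue.
#[derive(Debug, PartialEq)]
pub enum Update {
    Create(String),
    Replace(String),
    Delete(String),
}

/// Level-triggered planning: look only at the current actual state and
/// the desired state, never at the event that caused the call.
pub fn plan_updates(cr: &Cr, actual: &BTreeMap<String, String>) -> Vec<Update> {
    let mut updates = Vec::new();
    // Objects that are missing or whose spec differs from the desired one.
    for (name, want) in &cr.desired {
        match actual.get(name) {
            None => updates.push(Update::Create(name.clone())),
            Some(have) if have != want => updates.push(Update::Replace(name.clone())),
            Some(_) => {}
        }
    }
    // Assumption: objects the CR no longer describes are removed.
    for name in actual.keys() {
        if !cr.desired.contains_key(name) {
            updates.push(Update::Delete(name.clone()));
        }
    }
    updates
}
```

Because the plan depends only on the two states, running it twice against an unchanged cluster yields the same updates, which is what makes requeueing safe.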
The reconcile function returns a Result object which holds either an Action or an Error. There are two types of Action: (1) requeue the reconcile for the same object, which means reconcile will be called with the same argument in X seconds, or (2) wait for the next triggering event. When an error is returned, the developer-specified error policy decides the next action, which usually requeues the reconcile.
Triggering conditions
The triggering conditions decide when the reconcile function is called. There are four types of triggering conditions:
The first condition is compulsory and the others are optional. Most controllers will choose condition 2 for all the objects created by the controller.
Error policy
The error policy is usually simple: just requeue the reconcile.
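How the two kinds of Action and the error policy fit together can be sketched as follows. The types here are local stand-ins written for this example, not the kube-rs types: the `Action` enum, `ReconcileError`, the simplified `error_policy` signature, and the 5-second delay are all assumptions.

```rust
use std::time::Duration;

/// Local stand-in for the framework's Action (an assumption, not the
/// kube-rs type itself).
#[derive(Debug, PartialEq)]
pub enum Action {
    /// Call reconcile again for the same object after the given delay.
    Requeue(Duration),
    /// Do nothing until the next triggering event arrives.
    AwaitChange,
}

/// Hypothetical error type returned by a failing reconcile.
#[derive(Debug)]
pub struct ReconcileError(pub String);

/// Hypothetical error policy: always requeue after a fixed delay.
pub fn error_policy(_err: &ReconcileError) -> Action {
    Action::Requeue(Duration::from_secs(5))
}

/// Decide what happens after one reconcile call: a successful result
/// carries its own Action, an error is handed to the error policy.
pub fn next_step(result: Result<Action, ReconcileError>) -> Action {
    match result {
        Ok(action) => action,
        Err(err) => error_policy(&err),
    }
}
```

A policy that ignores the error value, as here, matches the "just requeue" case; a policy that inspects the error could choose a longer delay for failures that are unlikely to clear quickly.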
Important components of kube-rs framework
The kube-rs framework does a lot of work in the background to make it easy to write controller programs. The figure below shows the interaction between kube-rs, the reconcile function, and the Kubernetes API:
At a high level, kube-rs (1) sets up watchers that keep monitoring changes to different types of objects in the Kubernetes API, (2) filters out irrelevant events according to the triggering conditions, and (3) invokes the reconcile function with the related CR object. kube-rs has several important components:
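Scheduler (not shown in the above figure), which maintains a queue of all the reconcile requests. The scheduler deduplicates reconcile requests: if a reconcile request for CR object o arrives while another reconcile request for the same CR object already exists in the queue, the request that is scheduled later is discarded and the other one remains in (or gets added to) the queue.
Watcher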
Watcher mainly does two things: (1) issues an initial list to the Kubernetes API for a particular type of resource and (2) continuously watches all subsequent changes to that resource type from the Kubernetes API. Each controller sets up a watcher for each resource type it needs to watch (including the custom resource type).
More concretely, each watcher starts a future stream by running a state machine in step_trampolined: it starts in the empty state and issues a list for a particular resource type to the Kubernetes API; after the list succeeds, it starts watching for any changes to objects of this resource type. Every object read by the list or watch is sent to the future stream.
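The list-then-watch behavior can be modeled as a two-state machine. The sketch below is a synchronous, std-only approximation and does not reproduce the real step_trampolined code: `State`, `FakeApi`, `step`, and `run_watcher` are hypothetical names, objects are plain strings, and list failures and watch restarts are left out.

```rust
use std::collections::VecDeque;

/// Hypothetical watcher states, loosely following the description above.
#[derive(Debug, PartialEq)]
pub enum State {
    /// Nothing has been listed yet.
    Empty,
    /// The initial list succeeded; now following changes.
    Watching,
}

/// Fake API server: an initial listing plus a queue of later changes.
pub struct FakeApi {
    pub listed: Vec<String>,
    pub changes: VecDeque<String>,
}

/// One step of the state machine: returns the next state and the
/// objects to push into the output stream.
pub fn step(state: State, api: &mut FakeApi) -> (State, Vec<String>) {
    match state {
        // Initial list: emit every existing object, then start watching.
        State::Empty => (State::Watching, std::mem::take(&mut api.listed)),
        // Watch: emit at most one pending change per step.
        State::Watching => {
            let next: Vec<String> = api.changes.pop_front().into_iter().collect();
            (State::Watching, next)
        }
    }
}

/// Drive the machine for a fixed number of steps and collect the stream.
pub fn run_watcher(api: &mut FakeApi, steps: usize) -> Vec<String> {
    let mut state = State::Empty;
    let mut stream = Vec::new();
    for _ in 0..steps {
        let (next, objects) = step(state, api);
        stream.extend(objects);
        state = next;
    }
    stream
}
```

The consumer of the stream sees listed objects and changed objects in the same form, which is consistent with the reconcile function not knowing which event triggered it.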
Every call to new to construct a controller, or to owns to set up more triggers, sets up a watcher for the corresponding type of resource objects. All the watchers of a controller are grouped into one trigger_selector. Later, when run is called, an applier is constructed and the trigger_selector (as a queue) feeds the stream of objects to the runner and scheduler, which later trigger the reconcile function with the resource object.
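The scheduler's deduplication rule (of two requests for the same object, the one scheduled later is discarded) can be sketched as a map keyed by object. This is a simplified model under stated assumptions, not the kube-rs scheduler: `Scheduler`, `schedule`, and `pop_due` are hypothetical names, time is an integer tick, and a request carries only the object key.

```rust
use std::collections::BTreeMap;

/// Hypothetical deduplicating scheduler: at most one pending reconcile
/// request per object key.
pub struct Scheduler {
    /// Object key -> tick at which the reconcile should run.
    pending: BTreeMap<String, u64>,
}

impl Scheduler {
    pub fn new() -> Self {
        Scheduler { pending: BTreeMap::new() }
    }

    /// Offer a request. Returns true if it is now the pending request
    /// for this key, false if it was discarded as a duplicate.
    pub fn schedule(&mut self, key: &str, run_at: u64) -> bool {
        match self.pending.get(key).copied() {
            // A request scheduled no later already exists: discard the new one.
            Some(existing) if existing <= run_at => false,
            // No request yet, or the new one runs earlier: it wins.
            _ => {
                self.pending.insert(key.to_string(), run_at);
                true
            }
        }
    }

    /// Remove and return the keys whose requests are due at `now`.
    pub fn pop_due(&mut self, now: u64) -> Vec<String> {
        let due: Vec<String> = self
            .pending
            .iter()
            .filter(|(_, t)| **t <= now)
            .map(|(k, _)| k.clone())
            .collect();
        for key in &due {
            self.pending.remove(key);
        }
        due
    }
}
```

For example, scheduling key "o" at tick 10 and again at tick 20 leaves a single request at tick 10, while a third request at tick 5 replaces it, so reconcile runs once for "o" at the earliest requested time.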