Multicluster-Controller

Multicluster-controller is a Go library for building Kubernetes controllers that need to watch resources in multiple clusters. It uses the best parts of controller-runtime (the library powering kubebuilder and now operator-sdk) and replaces its API (the manager, controller, reconcile, and handler packages) to support multicluster operations.

Why? Check out Admiralty's blog post introducing multicluster-controller.

How it Works
Getting Started
Configuration
Usage with Custom Resources
API Reference

How it Works

Here is a minimal multicluster controller that watches pods in two clusters. On pod events, it simply logs the pod's cluster name, namespace, and name. In a way, the only thing controlled by this controller is the standard output, but it illustrates a basic scaffold:

package main

import (
    "context"
    "log"

    "k8s.io/api/core/v1"
    "k8s.io/sample-controller/pkg/signals"

    "admiralty.io/multicluster-controller/pkg/cluster"
    "admiralty.io/multicluster-controller/pkg/controller"
    "admiralty.io/multicluster-controller/pkg/manager"
    "admiralty.io/multicluster-controller/pkg/reconcile"
    "admiralty.io/multicluster-service-account/pkg/config"
)

func main() {
    stopCh := signals.SetupSignalHandler()
    ctx, cancel := context.WithCancel(context.Background())
    go func() {
        <-stopCh
        cancel()
    }()

    co := controller.New(&reconciler{}, controller.Options{})

    contexts := [2]string{"cluster1", "cluster2"}
    for _, kubeCtx := range contexts {
        cfg, _, err := config.NamedConfigAndNamespace(kubeCtx)
        if err != nil {
            log.Fatal(err)
        }
        cl := cluster.New(kubeCtx, cfg, cluster.Options{})
        if err := co.WatchResourceReconcileObject(ctx, cl, &v1.Pod{}, controller.WatchOptions{}); err != nil {
            log.Fatal(err)
        }
    }

    m := manager.New()
    m.AddController(co)

    if err := m.Start(stopCh); err != nil {
        log.Fatal(err)
    }
}

type reconciler struct{}

func (r *reconciler) Reconcile(req reconcile.Request) (reconcile.Result, error) {
    log.Printf("%s / %s / %s", req.Context, req.Namespace, req.Name)
    return reconcile.Result{}, nil
}

Clusters have arbitrary names. Indeed, Kubernetes clusters are unaware of their names at the moment—apimachinery's ObjectMeta struct has a ClusterName field, but it "is not set anywhere right now and apiserver is going to ignore it if set in create or update request."
Clusters are configured using regular client-go rest.Config structs. They can be created, for example, from kubeconfig files or service account imports. We recommend using the config package of multicluster-service-account in either case.
A Cluster struct is created for each kubeconfig context and/or service account import. Clusters hold references to cluster-scoped dependencies: clients, caches, etc. (In controller-runtime, the Manager holds a unique set of those.)
A Controller struct is created, and configured to watch the Pod resource in each cluster. Internally, on each pod event, a reconcile Request, which consists of the cluster name, namespace, and name of the pod, is added to the Controller's workqueue.
Requests are to be processed asynchronously by the Controller's Reconciler, whose level-based logic is provided by the user (e.g., create controlled objects, call other services).
Finally, a Manager is created, and the Controller is added to it. In multicluster-controller, the Manager's only responsibilities are to start the Clusters' caches, wait for them to sync, then start the Controllers. (The Manager knows about the caches from the Controllers.)

Getting Started

A good way to get started with multicluster-controller is to run the helloworld example, which is more or less the controller presented above in How it Works. The other examples illustrate an actual reconciliation logic and the use of a custom resource. Look at their source code, change them to your needs, and refer to the API documentation as you go.

0. Requirements

You need at least two clusters and a kubeconfig file configured with two contexts, one for each of the clusters. If you already have two clusters/contexts set up, note the context names. In this guide, we use "cluster1" and "cluster2" as context names. (If your kubeconfig file contains more contexts/clusters/users, that's fine, they'll be ignored.)

Important: if your kubeconfig uses token-based authentication (e.g., GKE by default, or Azure with AD integration), make sure a valid (non-expired) token is cached before you continue. To refresh the tokens, run simple commands like:

kubectl cluster-info --context cluster1
kubectl cluster-info --context cluster2

Note: In production, you wouldn't use your user kubeconfig. Instead, we recommend multicluster-service-account.

If running the manager out-of-cluster, both clusters must be accessible from your machine; in-cluster, assuming you run the manager in cluster1, cluster2 must be accessible from cluster1, or if you run the manager in a third cluster, cluster1 and cluster2 must be accessible from cluster3.

(Optional) Creating Two Clusters on Google Kubernetes Engine

Assuming the gcloud CLI is installed, you're logged in, a default compute zone and project are set, and the Kubernetes Engine API is enabled in the project, here's a small script to create two clusters and rename their corresponding kubeconfig contexts "cluster1" and cluster2":

set -e
PROJECT=$(gcloud config get-value project)
REGION=$(gcloud config get-value compute/zone)
for NAME in cluster1 cluster2; do
  gcloud container clusters create $NAME
  gcloud container clusters get-credentials $NAME
  CONTEXT=gke_$PROJECT"_"$REGION"_"$NAME
  sed -i -e "s/$CONTEXT/$NAME/g" ~/.kube/config
  kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole cluster-admin \
    --user $(gcloud config get-value account)
  kubectl cluster-info # caches a token in kubeconfig
done

1. Running the Manager

You can run the manager either out-of-cluster or in-cluster.

Out-Of-Cluster

Build and run the manager from source:

go get admiralty.io/multicluster-controller
cd $GOPATH/src/admiralty.io/multicluster-controller
go run examples/helloworld/main.go --contexts cluster1,cluster2

Run some other pod from a second terminal, for example:

kubectl run nginx --image=nginx

In-Cluster

Save your kubeconfig file as a secret:

kubectl create secret generic kubeconfig \
  --from-file=config=$HOME/.kube/config

Then run a manager pod with the kubeconfig file mounted as a volume, and the KUBECONFIG environment variable set to its path:

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: helloworld
spec:
  containers:
  - env:
    - name: KUBECONFIG
      value: /root/.kube/config
    image: quay.io/admiralty/multicluster-controller-example-helloworld
    name: manager
    args: ["--contexts", "cluster1,cluster2"]
    volumeMounts:
    - mountPath: /root/.kube
      name: kubeconfig
      readOnly: true
  volumes:
  - name: kubeconfig
    secret:
      secretName: kubeconfig
EOF

Run some other pod and check the logs:

kubectl run nginx --image=nginx
kubectl logs helloworld

If you cannot trust the pre-built image, you can build your own from source:

go get admiralty.io/multicluster-controller
cd $GOPATH/src/admiralty.io/multicluster-controller
docker build \
  --file examples/Dockerfile \
  --build-arg target=admiralty.io/multicluster-controller/examples/helloworld \
  --tag $IMAGE .

2. Understanding the Output

Here is a sample output, showing the system pods when the manager starts, followed by three lines for the nginx pod:

2018/10/11 18:53:52 cluster2 / kube-system / kube-dns-5dcfcbf5fb-89ngc
2018/10/11 18:53:52 cluster2 / kube-system / kube-proxy-gke-cluster4-default-pool-cd1af1fa-z5pn
...
2018/10/11 18:53:52 cluster1 / kube-system / kube-dns-autoscaler-69c5cbdcdd-bjn5x
2018/10/11 18:53:52 cluster1 / kube-system / fluentd-gcp-v2.0.17-q8g8x
...
2018/10/11 18:54:28 cluster2 / default / nginx-8586cf59-q59nb
2018/10/11 18:54:28 cluster2 / default / nginx-8586cf59-q59nb
2018/10/11 18:54:34 cluster2 / default / nginx-8586cf59-q59nb

When the cache synced, one reconcile request per pod was added to the controller's work queue. They were all different and all were processed. On the other hand, the nginx pod generated six events when it was created: Scheduled, SuccessfulMountVolume, Pulling, Pulled, Created, and Started; see for yourself by running:

kubectl describe pod nginx | tail -n 10

However, only three reconcile requests were processed. Indeed, the requests were all equal (same context, namespace, and name), so while the controller was processing one of them, several others were added to and grouped by the work queue before the controller could process another one (pod events can follow each other very quickly). That's normal and it illustrates the asynchronous and level-based characteristics of the controller pattern.

3. Further Examples

deploymentcopy

The deploymentcopy example filters events, watching only the default namespace. Also, it implements an actual reconciliation loop, following the common pattern illustrated in the figure below, where the controller object is an original Deployment in cluster1, and the controlled onject is a copy in cluster2.

controller logic

~~Note: Cross-cluster garbage collection is still in the works, so we must delete the controlled object when the controller has disappeared.~~ Cross-cluster garbage collection has been extracted into a reusable pattern, but we still need to update the examples.

cross-cluster garbage collection with finalizers

To run deploymentcopy out-of-cluster:

go run examples/deploymentcopy/cmd/manager/main.go cluster1 cluster2

podghost

The podghost example's reconciliation logic is similar to deploymentcopy's, but it creates PodGhost objects from Pods, where PodGhost is a custom resource (see below, Usage with Custom Resources).

The PodGhost custom resource definition (CRD) must be created in "cluster2" before running the manager:

kubectl create -f examples/podghost/kustomize/crd.yaml \
  --context cluster2

Then, out-of-cluster:

go run examples/podghost/cmd/manager/main.go cluster1 cluster2

Usage with Custom Resources

The Kubernetes controller pattern is often used in conjunction with custom resources. To use multicluster-controller with a custom resource, we need two things:

a custom resource definition (CRD), and
API code for the custom resource.

Custom Resource Definition

Reminder: CustomResourceDefinition is itself a Kubernetes resource. A custom resource's schema can be defined by creating a CRD object, which specifies an API group (e.g., multicluster.admiralty.io), version (e.g., v1alpha1), kind (e.g., PodGhost), OpenAPI validation rules, among other things.

There's nothing special about multicluster-controller in this regard. Just don't forget to create your CRDs in all of the clusters that need them.

Custom API Code

You need to define a struct for the custom resource (e.g., PodGhost), a corresponding List struct (e.g., PodGhostList), with proper json field tags, and DeepCopy methods. The structs must be registered with the scheme used for the Clusters at run time. Setting up those things manually is cumbersome and error-prone.

Luckily, there are tools to help us. You could copy-paste and modify the structs from sample-controller and use code-generator to generate the DeepCopy methods. You can also leverage the scaffolding of operator-sdk or kubebuilder.

In the end, don't forget to register the structs with the scheme, as in this snippet from the podghost example:

if err := apis.AddToScheme(ghostCluster.GetScheme()); err != nil {
  return nil, err
}

Using operator-sdk 0.0.7

operator-sdk new foo \
  --api-version multicluster.admiralty.io/v1alpha1 \
  --kind Foo

You would then rewrite cmd/foo/main.go and pkg/apis/stub/handler.go for multicluster-scheduler. Note that operator-sdk new creates a lot of other files that you may or may not need.

Using kubebuilder 1.0.4

We only care about the kubebuilder create api subcommand of kubebuilder, but unfortunately it requires files created by kubebuilder init, namely:

hack/boilerplate.go.txt, the copyright and license notice to use as a header in generated files,
and PROJECT, which contains metadata such as kubebuilder's major version number, the custom API's domain, and the package's import path.

You can either create only those two files (option 1) or run kubebuilder init and delete a bunch of files you don't need (option 2).

Option 1

echo '/*
Copyright 2018 The Multicluster-Controller Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/' > hack/boilerplate.go.txt

echo 'version: "1"
domain: admiralty.io
repo: admiralty.io/foo' > PROJECT

kubebuilder create api \
  --group multicluster \
  --version v1alpha1 \
  --kind Foo \
  --controller=false \
  --make=false

go generate ./pkg/apis # runs k8s.io/code-generator/cmd/deepcopy-gen/main.go

Option 2

kubebuilder init \
  --domain admiralty.io \
  --owner "The Multicluster-Controller Authors"

kubebuilder create api \
  --group multicluster \
  --version v1alpha1 \
  --kind Foo \
  --controller=false
  # calls make, which calls go generate

rm pkg/controller/controller.go
# and rewrite cmd/manager/main.go

API Reference

https://godoc.org/admiralty.io/multicluster-controller/

go get admiralty.io/multicluster-controller
godoc -http=:6060

then http://localhost:6060/pkg/admiralty.io/multicluster-controller/

admiraltyio / multicluster-controller

readme