Explore: KubeFed - Githubissues

cloudfoundry / cf-crd-explorations

Apache License 2.0


Explore: KubeFed #73

Closed gcapizzi closed 3 years ago

gcapizzi commented 3 years ago

As we try to design a mapping for orgs and spaces (see #69), we are trying to understand how to manage organisations that might not fit in a single cluster. Our current prototype consists of a controller connecting to multiple clusters, reconciling Namespaces, Roles and RoleBindings. This requires the controller to know about all the clusters that are part of the foundation and how to connect to them.

The Kubernetes SIG Multicluster is trying to solve a similar problem: providing a way to federate multiple clusters and reconcile resources across them. They have created a product called KubeFed:

Kubernetes Cluster Federation (KubeFed for short) allows you to coordinate the configuration of multiple Kubernetes clusters from a single set of APIs in a hosting cluster. KubeFed aims to provide mechanisms for expressing which clusters should have their configuration managed and what that configuration should be. The mechanisms that KubeFed provides are intentionally low-level, and intended to be foundational for more complex multicluster use cases such as deploying multi-geo applications and disaster recovery.

Let's take a look at KubeFed and see if it might be useful for our use case. Ideally we would be able to implement orgs and spaces for individual clusters, and then use KubeFed to extend foundations on multiple clusters.

mnitchev commented 3 years ago

We played a bit with kubefed and found that it allows you to create resources across multiple clusters, with several options:

you can federate a resource and propagate it across all clusters (the same resource will be created on each cluster)
federated resources can be any Kubernetes resource, including CRDs, but the type needs to be enabled first
you can generate a federated resource from the yaml of any resource and then apply it with kubectl
you can only create federated resources in federated namespaces
federated resources can specify which clusters they need to be propagated to. This is done either by naming the specific clusters or with a label selector
resources like deployments can use the ReplicaSchedulingPreference to distribute load across multiple clusters. One ReplicaSchedulingPreference is needed for every deployment. It also looks like it overrides the deployment's replica count.

We think that this tool could be used with eirini to distribute workloads across multiple clusters and also to ensure isolation. For example, an organization can be isolated on a dedicated cluster, while the rest of the organizations can be placed in a shared cluster (or several). We should explore the implications of federated clusters for our authorization (RBAC and OPA) concepts. NOTE: the kubefed product is currently in beta.

danail-branekov commented 3 years ago

KubeFed terms glossary: https://github.com/kubernetes-sigs/kubefed/blob/master/docs/concepts.md#kubefed-concepts

Our test setup consists of two kind clusters:

kubefed-1 (where kubefed is installed), which we refer to as the "host" cluster
kubefed-2, which joins the "host" cluster and which we refer to as a "member" cluster

ClusterRoles have a federated representation, FederatedClusterRole. We can easily deploy our regular rbac.yaml as federated cluster roles via kubefedctl federate --filename cf-roles-model/rbac.yaml | kubectl apply -f -:
- On the "host" cluster we have both FederatedClusterRoles and ClusterRoles created
- On the "member" cluster we have only the ClusterRoles that are mirrored from the "host" ClusterRole
- Changes in a FederatedClusterRole are propagated automatically to the related ClusterRoles on both the "host" and "member" clusters
Existing ClusterRoles can be promoted to federated ones via kubefedctl federate clusterrole cf-admin

We can similarly federate arbitrary objects (even the CF ones, such as App), for example RoleBindings:

kubectl apply -f alice-space-developer-rolebinding.yml to create a role binding on the "host" cluster
kubefedctl federate rolebinding alice-space-developer -n federate-me --enable-type to federate the role binding. Doing that makes the role binding also appear on the "member" cluster

Each FederatedXXX object has a placement field that specifies on which clusters the underlying non-federated object should appear. Options are:

The clusters are enumerated by name
The clusters are selected by a cluster label selector
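
The two placement options can be illustrated with a small sketch. This is not KubeFed code: the function name select_clusters and the cluster labels are hypothetical, and the precedence rules (an explicit clusters list wins over clusterSelector, and an absent placement selects nothing) reflect our reading of the KubeFed user guide, so they should be checked against the version in use.

```python
def select_clusters(placement, clusters):
    """Return the sorted names of clusters a federated object would land on.

    placement: dict shaped like spec.placement of a FederatedXXX object.
    clusters:  dict mapping cluster name -> labels on its KubeFedCluster object.
    """
    # Assumption: an explicit list of names takes precedence over the selector,
    # and an empty list means "no clusters".
    if "clusters" in placement:
        wanted = {entry["name"] for entry in placement["clusters"]}
        return sorted(name for name in clusters if name in wanted)

    selector = placement.get("clusterSelector")
    if selector is None:
        # Assumption: no placement information at all selects nothing.
        return []

    # An empty matchLabels matches every cluster.
    match_labels = selector.get("matchLabels", {})
    return sorted(
        name
        for name, labels in clusters.items()
        if all(labels.get(key) == value for key, value in match_labels.items())
    )


# Hypothetical labels on the two kind clusters from the test setup.
known_clusters = {
    "kubefed-1": {"role": "host"},
    "kubefed-2": {"role": "member"},
}

by_name = select_clusters({"clusters": [{"name": "kubefed-2"}]}, known_clusters)
by_label = select_clusters(
    {"clusterSelector": {"matchLabels": {"role": "member"}}}, known_clusters
)
print(by_name, by_label)
```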

danail-branekov commented 3 years ago

Parking for now: we know how kubefed works, and now we need to find out whether it would be useful for us

georgethebeatle commented 3 years ago

Here are the results of some experiments we ran today. We played with the idea of having sub-eirini level federation vs having super-eirini level federation:

Can eirini federate statefulsets? (sub-eirini federation)
- No. As written in the docs the replicaschedulingpreference reconciler would only reconcile Deployments and ReplicaSets: https://github.com/kubernetes-sigs/kubefed/blob/master/docs/userguide.md#replicaschedulingpreference
- It is possible to federate statefulsets without using the replicaschedulingpreference, but it results in each cluster having its own instance with index 0, which might be the reason why replicaschedulingpreference does not support statefulsets.
- This means that federation on the sub-eirini level is infeasible unless eirini switches to deployments
Can we somehow use replicaschedulingpreference for LRPs/Tasks? (super-eirini federation)
- Can we switch federation on and off by introducing a reconciler that turns an LRP into a FederatedLRP?
  - This way neither the shim nor eirini will know about federation, but there will be a simple reconciler that just federates Tasks and LRPs. A problem with this approach might be that the app will momentarily appear on the federation host cluster before being scheduled on its destination cluster(s) which might be a security issue
- How are we going to work around the fact that LRP has an "instances" field while the kubefed reconcilers know how to put "replicas" on the object referred to by a replica scheduling preference?
  - Either migrate to replicas, or write our own "eirini scheduling preference" reconciler that heavily reuses the replicaschedulingpreference reconciler
- Is replicaschedulingpreference going to work with a modified LRP that has "replicas" instead of "instances"?
  - Yes. However, the LRP also needs to have a selector (as in Deployments), since the replicaschedulingpreference expects this. Once we did that, we were able to federate an LRP with a preference of 6 instances and have them distributed across 2 clusters
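
To make the "6 instances across 2 clusters" result concrete, here is a minimal sketch of splitting a total replica count across clusters by weight. It is a simplified largest-remainder split written for illustration only, not KubeFed's actual scheduling logic; the function name distribute and the cluster names are hypothetical.

```python
def distribute(total, weights):
    """Split `total` replicas across clusters proportionally to `weights`.

    weights: dict mapping cluster name -> non-negative integer weight.
    Leftover replicas go to the clusters with the largest fractional
    remainder; ties are broken by cluster name.
    """
    weight_sum = sum(weights.values())
    if weight_sum == 0:
        return {name: 0 for name in weights}

    # Integer share for every cluster, rounded down.
    shares = {name: total * w // weight_sum for name, w in weights.items()}
    leftover = total - sum(shares.values())

    # Hand out what is left, one replica at a time.
    order = sorted(weights, key=lambda n: (-(total * weights[n] % weight_sum), n))
    for name in order[:leftover]:
        shares[name] += 1
    return shares


even_split = distribute(6, {"cluster-2": 1, "cluster-3": 1})
uneven_split = distribute(7, {"cluster-2": 1, "cluster-3": 1})
print(even_split, uneven_split)
```

With equal weights the six instances end up as three per cluster; with an odd total one cluster receives the extra instance.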

danail-branekov commented 3 years ago

"This way neither the shim nor eirini will know about federation, but there will be a simple reconciler that just federates Tasks and LRPs. A problem with this approach might be that the app will momentarily appear on the federation host cluster before being scheduled on its destination cluster(s) which might be a security issue"

We can mitigate this by changing Eirini's creation interface to return objects (that are not pushed to k8s). Then the federation bit could be just a wrapper (injected only when the federation switch is on) that transforms the statefulset/deployment object into a federated one and then applies it to k8s.

This is what kubefedctl federate --filename some-deployment.yml does, see here for reference. The federate command just transforms the yaml; pushing the object to k8s is taken care of by an upstream component.
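
A rough sketch of such a wrapper is shown below. It only approximates what the federate transformation produces: the function name federate is our own, the choice of which metadata fields stay in the template is a simplification, and the default placement (an empty label selector, meaning all clusters) is an assumption based on the output we have seen from kubefedctl.

```python
import copy
import json


def federate(obj, placement=None):
    """Wrap a plain Kubernetes object (as a dict) into its Federated<Kind> form.

    Nothing is sent to a cluster; the caller decides what to do with the result.
    """
    template = copy.deepcopy(obj)  # never mutate the caller's object
    template.pop("apiVersion", None)
    kind = template.pop("kind")
    template.pop("status", None)  # runtime state does not belong in a template
    meta = template.pop("metadata", {})

    # Simplification: only labels and annotations are carried into the template.
    template_meta = {k: v for k, v in meta.items() if k in ("labels", "annotations")}
    if template_meta:
        template["metadata"] = template_meta

    federated_meta = {"name": meta["name"]}
    if "namespace" in meta:
        federated_meta["namespace"] = meta["namespace"]

    if placement is None:
        # Assumed default: an empty selector, which matches every cluster.
        placement = {"clusterSelector": {"matchLabels": {}}}

    return {
        "apiVersion": "types.kubefed.io/v1beta1",
        "kind": "Federated" + kind,
        "metadata": federated_meta,
        "spec": {"template": template, "placement": placement},
    }


deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "my-app", "namespace": "federate-me", "labels": {"app": "my-app"}},
    "spec": {"replicas": 2},
    "status": {"readyReplicas": 2},
}
federated = federate(deployment)
print(json.dumps(federated, indent=2))
```

Because the function returns a value instead of applying it, the same code path works with federation switched on or off: the caller either applies the original object or the wrapped one.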

georgethebeatle commented 3 years ago

We created a kubefed multi cluster prototype that features a federation of 3 clusters as follows:

cluster 1 is the host of the federation
a modified eirini is running on all clusters
there is a new LRP reconciler running only on the host cluster that would federate each LRP that is being created and would respect its isolationSegment label
there is one "private" isolation segment that consists of clusters 2 and 3 (the kubefedcluster objects on the host cluster are labeled with "isolationSegment": "private")
when an LRP appears on the host it is federated on the clusters that are labeled with its "isolationSegment" label
in our case the LRP object is removed from cluster 1 and appears on clusters 2 and 3
the federating reconciler also creates a replicaschedulingpreference, setting the totalReplicas to the replica count of the LRP
Eventually the statefulsets are evenly distributed on the clusters constituting the isolation segment (in our example - 2 and 3)
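
The decision the federating reconciler makes for each LRP can be sketched as a pure function. This is an illustration, not the prototype's code: the function name federation_plan, the targetKind value FederatedLRP, and the shape of the example LRP (with a replicas field, as in the modified LRP discussed earlier) are assumptions. The ReplicaSchedulingPreference carries no per-cluster weights, which as far as we understand makes kubefed spread replicas evenly.

```python
def federation_plan(lrp):
    """Derive the placement and scheduling preference for one LRP (as a dict)."""
    meta = lrp["metadata"]
    segment = meta["labels"]["isolationSegment"]

    # Only clusters whose KubeFedCluster object carries the same label are targeted.
    placement = {"clusterSelector": {"matchLabels": {"isolationSegment": segment}}}

    # The preference shares name and namespace with the federated object it governs.
    preference = {
        "apiVersion": "scheduling.kubefed.io/v1alpha1",
        "kind": "ReplicaSchedulingPreference",
        "metadata": {"name": meta["name"], "namespace": meta["namespace"]},
        "spec": {
            "targetKind": "FederatedLRP",  # hypothetical federated type for LRPs
            "totalReplicas": lrp["spec"]["replicas"],
        },
    }
    return placement, preference


# Hypothetical LRP, trimmed to the fields the sketch needs.
example_lrp = {
    "kind": "LRP",
    "metadata": {
        "name": "my-app",
        "namespace": "my-space",
        "labels": {"isolationSegment": "private"},
    },
    "spec": {"replicas": 6},
}
placement, preference = federation_plan(example_lrp)
print(placement)
print(preference["spec"])
```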

One of the major goals of the prototype is to abstract federation away from CF components as much as possible. The ideal scenario is to have a single switch to turn federation on.

If you want to run the prototype you need the following branches:

While this prototype demonstrates how isolation segments might be implemented with kubefed, it has several flaws:

In the example above all instances of the app are briefly being run on the host cluster (because eirini reconciles the LRP before the federation takes place) which means that the isolation segment is being violated
In the example above the user creates an LRP on cluster1 which later on disappears, which is confusing and undesirable

We have updated the multicluster proposal with our latest findings and are closing this story for now.