cloudfoundry / cf-for-k8s

The open source deployment manifest for Cloud Foundry on Kubernetes
Apache License 2.0

Certificate-based Identity for Apps Proposal (the feature formerly known as Route Integrity) #368

Open tcdowney opened 4 years ago

tcdowney commented 4 years ago

Background

Cloud Foundry for VMs uses TLS certificates to prevent misrouting -- a feature we colloquially refer to as Route Integrity. This protects against the case where the Ingress Gateway’s internal routing table is out of date by first validating the destination’s identity. If there is a certificate mismatch, the request will fail and traffic will not be delivered to the wrong destination.

Istio can do this as part of mesh mutual TLS (mTLS), but only if each app has a unique certificate. Istio uses the service account that is assigned to a Pod to identify a workload and uses this to determine the SAN on the workload's X.509 certificate that it uses for mTLS. So, in short, we need to have a distinct service account per Cloud Foundry app.
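
To illustrate why a shared service account defeats this check, here is a minimal sketch of how the identity is derived. Istio encodes the workload identity as a SPIFFE URI of the form spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>. The "cf-app-" naming scheme and the GUID values are hypothetical, chosen only for illustration; the proposal does not specify how the per-app service accounts would be named.

```python
def spiffe_id(trust_domain: str, namespace: str, service_account: str) -> str:
    """Build the SPIFFE URI that Istio places in the workload certificate's SAN."""
    return f"spiffe://{trust_domain}/ns/{namespace}/sa/{service_account}"


def per_app_service_account(app_guid: str) -> str:
    """Hypothetical naming scheme for a per-app service account (an assumption)."""
    return f"cf-app-{app_guid}"


# Two apps sharing one service account are indistinguishable to mTLS.
shared_a = spiffe_id("cluster.local", "cf-workloads", "eirini")
shared_b = spiffe_id("cluster.local", "cf-workloads", "eirini")

# With one service account per app, each app gets its own identity.
unique_a = spiffe_id("cluster.local", "cf-workloads", per_app_service_account("guid-a"))
unique_b = spiffe_id("cluster.local", "cf-workloads", per_app_service_account("guid-b"))
```

Here shared_a and shared_b are the same string, so a stale route to the wrong app would still pass certificate validation, while unique_a and unique_b differ.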

Proposal

We will write a mutating webhook that will bind an app-unique service account to each app StatefulSet, and a controller that owns the creation of the service accounts and associated resources. This design draws inspiration from the Knative Binding pattern (viewing requires joining the knative-users Google group). More specifically, this includes building two parts: a mutating admission webhook and a ServiceAccount/RoleBinding reconciling controller (we need to bind to the existing Eirini workload role and potentially others in the future).

The webhook will be responsible for altering the app StatefulSets and app task Jobs to inject our new per-app service account. The controller will be responsible for reconciling the new per-app service accounts injected by the webhook, as well as binding the appropriate roles to the new service account.
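
As a rough sketch of the webhook half, the mutation could be expressed as a JSON Patch returned in an admission.k8s.io/v1 AdmissionReview response. Both StatefulSets and Jobs keep their pod template under spec.template, so one patch path covers both. The label key used to find the app GUID and the "cf-app-" prefix are assumptions for illustration, not something the proposal specifies.

```python
import base64
import json

# Assumed label key carrying the app GUID; not confirmed by the proposal.
APP_GUID_LABEL = "cloudfoundry.org/app_guid"


def service_account_patch(obj: dict) -> list:
    """Return a JSON Patch that sets serviceAccountName on the pod template."""
    guid = obj["metadata"]["labels"][APP_GUID_LABEL]
    return [{
        # Per RFC 6902, "add" on an existing object member replaces its value.
        "op": "add",
        "path": "/spec/template/spec/serviceAccountName",
        "value": f"cf-app-{guid}",  # hypothetical naming scheme
    }]


def admission_response(review: dict) -> dict:
    """Build the AdmissionReview response for an incoming review request."""
    request = review["request"]
    patch = service_account_patch(request["object"])
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": request["uid"],  # must echo the request uid
            "allowed": True,
            "patchType": "JSONPatch",
            # The API server expects the patch as base64-encoded JSON.
            "patch": base64.b64encode(json.dumps(patch).encode()).decode(),
        },
    }
```

A real webhook would also need to serve this over HTTPS and decide what to do when the label is missing; this sketch leaves both out.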

The webhook and controller will be configured via a ConfigMap. It will define which “PodSpec”-able resources (things with a template that declares a pod.spec) our webhook will mutate. It will also tell the controller which “PodSpec”-able resources to watch so that it can create the necessary ServiceAccounts. Additionally, it declares what (Cluster)RoleBindings should be created for the new ServiceAccounts.
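
To make the controller's side concrete, here is a minimal sketch of the objects it might reconcile for one service account, given lists of role names from the configuration. It assumes that each role yields a namespaced RoleBinding and each cluster role yields a ClusterRoleBinding; the proposal leaves that mapping open. The function name and the binding naming scheme are hypothetical.

```python
RBAC_GROUP = "rbac.authorization.k8s.io"


def desired_resources(sa_name: str, namespace: str,
                      roles: list, cluster_roles: list) -> list:
    """Return the manifests (as dicts) the controller would create or update."""
    subject = {"kind": "ServiceAccount", "name": sa_name, "namespace": namespace}

    resources = [{
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": sa_name, "namespace": namespace},
    }]

    for role in roles:
        resources.append({
            "apiVersion": f"{RBAC_GROUP}/v1",
            "kind": "RoleBinding",
            # Hypothetical binding name: "<service account>-<role>".
            "metadata": {"name": f"{sa_name}-{role}", "namespace": namespace},
            "subjects": [subject],
            "roleRef": {"apiGroup": RBAC_GROUP, "kind": "Role", "name": role},
        })

    for role in cluster_roles:
        resources.append({
            "apiVersion": f"{RBAC_GROUP}/v1",
            "kind": "ClusterRoleBinding",
            # ClusterRoleBindings are cluster-scoped, so no namespace here.
            "metadata": {"name": f"{sa_name}-{role}"},
            "subjects": [subject],
            "roleRef": {"apiGroup": RBAC_GROUP, "kind": "ClusterRole", "name": role},
        })

    return resources
```

Each new app therefore adds one ServiceAccount plus one binding per configured role.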

Implications

Scaling Concerns

In terms of scaling this solution, there is a risk of write amplification on etcd. Creating a new service account results in a number of second-order effects that put additional load on the platform -- things like JWT token creation/rotation. These increase the amount of data stored in etcd as well as the number of writes. We will have to deploy this with our scale tests to really find out whether it will cause problems at CF4K8s GA scale.

Example ConfigMap

The configuration for our components will look something like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: service-account-controller-config
  namespace: cf-system
data:
  service-account-controller.yaml: |
    selectors:
    - matchLabels:
        cloudfoundry.org/source_type: APP
      namespace: cf-workloads
    - matchLabels:
        cloudfoundry.org/source_type: TASK
      namespace: cf-workloads
    - matchLabels:
        # Staging tasks are not actually pod-specable
        # kpack does not give control of the pod template
        # If we want to support identity for staging tasks we may have to mutate the pod and investigate how this affects kpack
        cloudfoundry.org/source_type: STG
      namespace: cf-workloads-staging
    roles:
    # https://github.com/cloudfoundry-incubator/eirini-release/blob/master/deploy/workloads/workloads-rbac.yml
    - "eirini-workloads-app-role"
    - "foo-role"
    cluster_roles:
    - "bar-role"
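
One possible reading of the selectors above is that a resource is picked up when it lives in the selector's namespace and carries every label listed under matchLabels. The sketch below assumes those semantics and mirrors only the APP and TASK selectors as plain Python data; the function name is hypothetical.

```python
# The APP and TASK selectors from the example, written as Python data.
SELECTORS = [
    {"matchLabels": {"cloudfoundry.org/source_type": "APP"},
     "namespace": "cf-workloads"},
    {"matchLabels": {"cloudfoundry.org/source_type": "TASK"},
     "namespace": "cf-workloads"},
]


def is_selected(selectors: list, namespace: str, labels: dict) -> bool:
    """Return True if any selector matches the resource's namespace and labels."""
    for selector in selectors:
        if selector["namespace"] != namespace:
            continue
        wanted = selector["matchLabels"]
        # Every label in matchLabels must be present with the same value.
        if all(labels.get(key) == value for key, value in wanted.items()):
            return True
    return False
```

Under these assumptions, an APP StatefulSet in cf-workloads is selected, while the same labels in another namespace are ignored.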

Describe alternatives you've considered

The majority of this exploration was completed as part of #170777302. You can read our Google doc to see the discussion and some of the alternatives we considered.

Additional context

cc/ @cloudfoundry/cf-networking @cloudfoundry/eirini

cf-gitbot commented 4 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/174282729

The labels on this github issue will be updated when the story is started.

shalako commented 4 years ago

As the design discussion has taken place in the Google doc linked above, this issue represents an agreed-upon design proposal, which will be implemented as soon as priorities permit.

evan2645 commented 4 years ago

👋 noob question here: what is the motivation for tying app identity to a k8s service account? My assumption is that the integration is simpler that way?

One observation I've made is that tightly coupling the notion of a service account to app identity often leads to interop challenges when communicating with off-cluster services where service account isn't a thing. The SA assumption tends to run deep, into authorization systems etc.

tcdowney commented 4 years ago

> 👋 noob question here: what is the motivation for tying app identity to a k8s service account? My assumption is that the integration is simpler that way?

@evan2645 yeah it's mainly because we're using Istio and service accounts are how it is assigning identity (relevant docs).

I've moved to a different team since sharing this out here, so I'll defer to @ndhanushkodi and @rosenhouse for the team's latest thinking on this.