PHACDataHub / infra-core

GCP infrastructure configurations using flux, crossplane and backstage
MIT License

Template for provisioning GKE + Flux #18

Open vedantthapa opened 10 months ago

vedantthapa commented 10 months ago

A Cluster resource from the crossplane gcp provider can be used in conjunction with XProject and XNetwork resources to initialize a new project space with the desired networking setup and a GKE autopilot cluster.
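For reference, the Cluster piece of such a composition might look roughly like this. This is a sketch only, assuming the Upbound GCP provider's container.gcp.upbound.io API; the name, location, and network paths mirror the example at the end of this thread:

apiVersion: container.gcp.upbound.io/v1beta1
kind: Cluster
metadata:
  name: alpha
spec:
  forProvider:
    location: northamerica-northeast1
    enableAutopilot: true
    network: projects/phsp-fb3a2b560a617fbf/global/networks/alpha-vpc
    subnetwork: projects/phsp-fb3a2b560a617fbf/regions/northamerica-northeast1/subnetworks/alpha-vpc
  writeConnectionSecretToRef:
    name: alpha-kubeconfig            # provider-populated kubeConfig secret (see discussion below)
    namespace: crossplane-system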

Flux can be installed on the remote cluster by leveraging the kubeConfig exported by the previously created Cluster resource, and using provider-kubernetes to apply the management repo's flux sync manifest, which carries a kubeConfig reference.

This implies that the remote cluster's flux deployment will be in sync with (managed by) the management cluster, and the application team would be responsible for configuring GitOps on their repo, i.e., a new GitRepository resource pointing to their repo. An upside to this is that we don't need to worry about access or bootstrapping deploy keys on the client / application repo.
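For illustration, the flux sync manifest on the management cluster could target the remote cluster through a kubeConfig reference like this. A minimal sketch; the path, source, and secret names are illustrative:

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: alpha-flux-install
  namespace: flux-system
spec:
  interval: 10m
  path: ./clusters/alpha/flux-system   # flux install manifests in the management repo
  prune: true
  sourceRef:
    kind: GitRepository
    name: management-repo
  kubeConfig:
    secretRef:
      name: alpha-kubeconfig           # kubeConfig of the remote GKE cluster
      key: kubeconfig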

In addition to this, the cluster can be added to the fleet-monitoring project to provide centralized monitoring of all clusters.

This would probably include a template on both the crossplane and backstage sides.

vedantthapa commented 10 months ago

For flux to apply a manifest on a remote cluster, a k8s secret with the kubeConfig of the remote cluster is required.

The kubeConfig secret returned by the Cluster resource via crossplane's gcp provider doesn't, by default, contain user config data. One way to approach this is by issuing client certificates during cluster creation, which would take care of authn, and then using provider-kubernetes to configure authz for the CN of the certificate. However, producing client certificates goes against GKE's security guidelines.

Another way is to "construct" a kubeConfig by using status.atProvider.endpoint and status.atProvider.masterAuth.clusterCaCertificate from the Cluster resource and passing these values at runtime to a k8s secret template which looks something like this (see point 5). Notice that this kubeConfig definition uses ExecConfig and the gke-gcloud-auth-plugin (this is also how your local setup is configured; try cat ~/.kube/config). An upside to this is that it follows one of Google's recommended ways of authenticating to the API server. However, flux requires the kubeConfig ref to be self-contained and not rely on binaries, environment, or credential files (see the note here).

We could configure the gke-gcloud-auth-plugin on the kustomize-controller, but that means either every project shares a common service account from the management project to control reconciliation of the flux installation manifest (which breaks project boundaries), or we run multiple kustomize-controllers, each with its own gke-gcloud-auth-plugin configuration, to reconcile the flux installation manifests (which would be too expensive).
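For context, such a constructed kubeConfig would look roughly like this. A sketch only; the cluster name is illustrative, and the endpoint and CA values would be patched in from the Cluster resource's status at composition time:

apiVersion: v1
kind: Config
clusters:
- name: alpha
  cluster:
    server: https://<status.atProvider.endpoint>
    certificate-authority-data: <status.atProvider.masterAuth.clusterCaCertificate>
contexts:
- name: alpha
  context:
    cluster: alpha
    user: alpha
current-context: alpha
users:
- name: alpha
  user:
    exec:                                                 # ExecConfig: relies on a local binary,
      apiVersion: client.authentication.k8s.io/v1beta1    # so flux can't consume it directly
      command: gke-gcloud-auth-plugin
      provideClusterInfo: true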

To get around the concerns mentioned above, we can use a hybrid of the two approaches: use provider-kubernetes to create a new k8s service account, add a rolebinding, generate a token against that service account, write it back to the management cluster, and finally pass it to the kubeConfig token field in the k8s secret template.
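A rough sketch of the token-minting piece of this hybrid, assuming provider-kubernetes' Object resource (the API version, names, and namespaces are illustrative; the service account and rolebinding would be created the same way):

apiVersion: kubernetes.crossplane.io/v1alpha2
kind: Object
metadata:
  name: alpha-flux-reconciler-token
spec:
  providerConfigRef:
    name: alpha                                # ProviderConfig pointing at the remote cluster
  forProvider:
    manifest:
      apiVersion: v1
      kind: Secret
      metadata:
        name: flux-reconciler-token
        namespace: flux-system
        annotations:
          kubernetes.io/service-account.name: flux-reconciler   # SA created by a sibling Object
      type: kubernetes.io/service-account-token                 # asks k8s to mint a long-lived token

The token populated into this secret could then be read back (e.g. via the Object's connection details) and substituted into the users[].user.token field of the kubeConfig secret on the management cluster, making it self-contained for flux.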

vedantthapa commented 10 months ago

An alternative is to do away with bootstrapping flux via the method described above and instead use provider-helm with the community-contributed flux charts, which comes with the black-box nature of helm.
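For comparison, the provider-helm route might look roughly like this. A sketch only, assuming the community fluxcd-community/flux2 chart; the ProviderConfig name is illustrative:

apiVersion: helm.crossplane.io/v1beta1
kind: Release
metadata:
  name: alpha-flux
spec:
  providerConfigRef:
    name: alpha                                # ProviderConfig targeting the remote cluster
  forProvider:
    chart:
      name: flux2
      repository: https://fluxcd-community.github.io/helm-charts
    namespace: flux-system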

vedantthapa commented 10 months ago

So after some internal discussions, it seems it'd be more useful to have flux reconcile against the client repository, as opposed to the previously proposed approaches of having it reconcile to a central place. The reason for this is that anything beyond GCP resources should be considered application-specific tooling. Having multiple flux instances from different remote clusters reconciling to a single flux repo increases the maintenance complexity for our team. Plus, it'd be difficult to customize the flux deployment on a remote cluster if there's ever a need for that; for example, one project might need the helm-controller whereas others won't.

The solution then is to have each flux instance reconcile to its own client repo. However, as noted previously, that means configuring deploy keys. One way to approach this is with a crossplane composition that creates the remote k8s cluster and then uses provider-kubernetes to execute a one-time job that configures the deploy keys and bootstraps flux.

Fundamentally, this approach is similar to ANZ's Google Next demo. However, instead of github actions we'd use a kubernetes job within a crossplane composition, due to the security concerns around github's location. Moreover, the demo uses token auth, i.e., one highly privileged token would need access to all the repositories in the org and would be stored as a k8s secret in each cluster that uses this template.

On the other hand, a k8s job that's part of the crossplane composition can run a shell script that:

  1. Generates SSH keys
  2. Authenticates to github via an org-wide token and adds the deploy keys.
  3. Authenticates to the remote cluster and bootstraps flux with the flux bootstrap command

The job runs on the management cluster; therefore, the highly privileged org-wide token only lives in the management cluster.
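A rough sketch of such a job, reusing the cluster and repo values from the example below. The image, token secret key, branch, and path are illustrative assumptions, and error handling is omitted:

apiVersion: batch/v1
kind: Job
metadata:
  name: alpha-flux-bootstrap
  namespace: crossplane-system
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: bootstrap
        image: ghcr.io/example/flux-bootstrap-toolbox:latest   # illustrative; needs ssh-keygen, curl, gcloud, flux
        envFrom:
        - secretRef:
            name: github-org-token                             # assumes a GITHUB_TOKEN key; lives only on the management cluster
        command: ["/bin/sh", "-c"]
        args:
        - |
          set -eu
          # 1. Generate an SSH deploy key pair
          ssh-keygen -t ed25519 -N "" -f /tmp/identity
          # 2. Add the public key as a deploy key on the client repo via the GitHub API
          curl -sf -X POST \
            -H "Authorization: Bearer ${GITHUB_TOKEN}" \
            -d "{\"title\":\"flux-alpha\",\"key\":\"$(cat /tmp/identity.pub)\",\"read_only\":false}" \
            https://api.github.com/repos/PHACDataHub/cpho-phase2/keys
          # 3. Authenticate to the remote cluster and bootstrap flux
          gcloud container clusters get-credentials alpha \
            --project phsp-fb3a2b560a617fbf --region northamerica-northeast1
          flux bootstrap git \
            --url=ssh://git@github.com/PHACDataHub/cpho-phase2 \
            --branch=main \
            --path=clusters/alpha \
            --private-key-file=/tmp/identity \
            --silent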

The end result might look something like this:

apiVersion: dip.phac.gc.ca/v1beta1
kind: XFluxGKE
metadata:
  name: alpha
spec:
  name: alpha # cluster name on gcp
  projectId: phsp-fb3a2b560a617fbf # project id where the cluster would be created
  xnetwork: # network config for the cluster
    networkId: projects/phsp-fb3a2b560a617fbf/global/networks/alpha-vpc
    subnetworkId: projects/phsp-fb3a2b560a617fbf/regions/northamerica-northeast1/subnetworks/alpha-vpc
  repoName: cpho-phase2 # repo name that resolves to ssh://git@github.com/PHACDataHub/<repoName>