giantswarm / roadmap

Giant Swarm Product Roadmap
https://github.com/orgs/giantswarm/projects/273

Implement Releases RFC #3466

Closed nprokopic closed 3 months ago

nprokopic commented 4 months ago
### Tasks
- [x] Review and merge https://github.com/giantswarm/cluster/tree/releases-poc
- [x] Review and merge https://github.com/giantswarm/cluster-aws/tree/releases-poc
- [x] Review and merge https://github.com/giantswarm/app-admission-controller/tree/capi-releases-poc
- [ ] Adapt testing pipelines (Tinkerers) https://github.com/giantswarm/roadmap/issues/3473
- [ ] https://github.com/giantswarm/roadmap/issues/3475
- [x] Draft v25 release (Phoenix) ([demo one](https://github.com/giantswarm/releases/pull/1262))
- [ ] Adapt Renovate to new release ([mc-bootstrap](https://github.com/giantswarm/mc-bootstrap/pull/896) for example)
- [ ] https://github.com/giantswarm/management-cluster-bases/pull/146
nprokopic commented 4 months ago

1. What do we want here?

We want to be able to create, test and deliver releases quickly and efficiently, and we want each team to own the entire workflow, from development through testing to releasing.

Excerpt from the RFC:

We deploy workload clusters by gluing together multiple components. Some of those are:

  • Provider-independent Cluster API resources (e.g. Cluster, MachineDeployment/MachinePool, etc.),
  • Provider-specific Cluster API resources (e.g. AWSCluster, AzureCluster, VSphereCluster, VCDCluster),
  • CPI implementation (aka provider-specific cloud controller manager),
  • CNI (e.g. Cilium),
  • CSI,
  • upstream apps that we package,
  • our apps that we develop and package,
  • provider-independent and provider-specific default configuration of apps,
  • configuration of the operating system and different node components, such as systemd, containerd, etc.

Multiple teams and multiple people are continuously working on all of the above, and it is essential that all of them have a smooth and frictionless development, testing and release experience, so that we can increase deployment frequency, reduce lead time for changes and reduce the change failure rate. For this to work, we need to be able to develop, test and release almost every change independently of almost all other changes. The team that has worked on a change should be able to release it fully independently, without any intervention from the provider-integration or provider-independent KaaS teams.

Here is what the releases look like: https://github.com/giantswarm/releases/pull/1262.

They are immutable, as in vintage. Every new version of any app or component requires a new release.

v25 with Kubernetes v1.25

apiVersion: release.giantswarm.io/v1alpha1
kind: Release
metadata:
  name: v25.0.0-alpha.1
spec:
  apps:
  - name: aws-ebs-csi-driver
    version: 2.30.1
    dependsOn:
    - cloud-provider-aws
  - name: aws-pod-identity-webhook
    version: 1.14.2
    dependsOn:
    - cert-manager
  - name: capi-node-labeler
    version: 0.5.0
  - name: cert-exporter
    version: 2.9.0
    dependsOn:
    - kyverno
  - name: cert-manager
    version: 3.7.5
    dependsOn:
    - prometheus-operator-crd
  - name: chart-operator-extensions
    version: 1.1.2
    dependsOn:
    - prometheus-operator-crd
  - name: cilium
    version: 0.24.0
  - name: cilium-crossplane-resources
    version: 0.1.0
  - name: cilium-servicemonitors
    version: 0.1.2
    dependsOn:
    - prometheus-operator-crd
  - name: cloud-provider-aws
    version: 1.25.14-gs2
    dependsOn:
    - vertical-pod-autoscaler-crd
  - name: cluster-autoscaler
    version: 1.27.3-gs9
    dependsOn:
    - kyverno
  - name: coredns
    version: 1.21.0
  - name: external-dns
    version: 3.1.0
    dependsOn:
    - prometheus-operator-crd
  - name: metrics-server
    version: 2.4.2
    dependsOn:
    - kyverno
  - name: net-exporter
    version: 1.19.0
    dependsOn:
    - prometheus-operator-crd
  - name: network-policies
    version: 0.1.0
    catalog: cluster
  - name: node-exporter
    version: 1.19.0
    dependsOn:
    - kyverno
  - name: vertical-pod-autoscaler
    version: 5.1.0
    dependsOn:
    - prometheus-operator-crd
  - name: vertical-pod-autoscaler-crd
    version: 3.0.0
  - name: etcd-k8s-res-count-exporter
    version: 1.10.0
    dependsOn:
    - kyverno
  - name: observability-bundle
    version: 1.3.4
    dependsOn:
    - coredns
  - name: k8s-dns-node-cache
    version: 2.6.1
    dependsOn:
    - kyverno
  - name: security-bundle
    version: 1.6.5
    catalog: giantswarm
    dependsOn:
    - prometheus-operator-crd
  - name: teleport-kube-agent
    version: 0.9.0
  components:
  - name: cluster-aws
    version: 0.76.1-b76af2c26f4224ffb0d718e940e232fac05c89a0
  - name: flatcar
    version: 3815.2.2
  - name: flatcar-variant
    version: 1.0.0
  - name: kubernetes
    version: 1.25.16
  date: "2024-05-18T12:57:50Z"
  state: active
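
The `dependsOn` entries above form a dependency graph between apps. The source does not say how this field is evaluated, so the following is only a minimal sketch, assuming `dependsOn` means "install after". It uses a hand-copied subset of the v25.0.0-alpha.1 apps; `install_order` is a hypothetical helper, not part of any Giant Swarm tool. Names that appear only as dependencies and are not listed as apps in the subset (for example `prometheus-operator-crd`) are treated as external and left out of the result.

```python
from graphlib import TopologicalSorter

# Subset of the v25.0.0-alpha.1 apps: app name -> its dependsOn list.
DEPENDS_ON = {
    "aws-ebs-csi-driver": ["cloud-provider-aws"],
    "aws-pod-identity-webhook": ["cert-manager"],
    "cert-manager": ["prometheus-operator-crd"],
    "cloud-provider-aws": ["vertical-pod-autoscaler-crd"],
    "vertical-pod-autoscaler-crd": [],
    "coredns": [],
    "observability-bundle": ["coredns"],
}


def install_order(depends_on):
    """Return the listed apps so that each one comes after its dependencies.

    Assumption: dependsOn means "install after". graphlib raises
    CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(depends_on)
    # static_order() also yields nodes that only appear as dependencies;
    # keep just the apps that are actually listed.
    return [name for name in sorter.static_order() if name in depends_on]


if __name__ == "__main__":
    for name in install_order(DEPENDS_ON):
        print(name)
```

Several valid orders exist, so the exact output can differ; only the relative order of an app and its dependencies is guaranteed.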

v26 with Kubernetes v1.26

apiVersion: release.giantswarm.io/v1alpha1
kind: Release
metadata:
  name: v26.0.0-alpha.1
spec:
  apps:
  - name: aws-ebs-csi-driver
    version: 2.30.1
    dependsOn:
    - cloud-provider-aws
  - name: aws-pod-identity-webhook
    version: 1.14.2
    dependsOn:
    - cert-manager
  - name: capi-node-labeler
    version: 0.5.0
  - name: cert-exporter
    version: 2.9.0
    dependsOn:
    - kyverno
  - name: cert-manager
    version: 3.7.5
    dependsOn:
    - prometheus-operator-crd
  - name: chart-operator-extensions
    version: 1.1.2
    dependsOn:
    - prometheus-operator-crd
  - name: cilium
    version: 0.24.0
  - name: cilium-crossplane-resources
    version: 0.1.0
  - name: cilium-servicemonitors
    version: 0.1.2
    dependsOn:
    - prometheus-operator-crd
  - name: cloud-provider-aws
    version: 1.26.11-gs.alpha.1
    dependsOn:
    - vertical-pod-autoscaler-crd
  - name: cluster-autoscaler
    version: 1.27.3-gs9
    dependsOn:
    - kyverno
  - name: coredns
    version: 1.21.0
  - name: external-dns
    version: 3.1.0
    dependsOn:
    - prometheus-operator-crd
  - name: metrics-server
    version: 2.4.2
    dependsOn:
    - kyverno
  - name: net-exporter
    version: 1.19.0
    dependsOn:
    - prometheus-operator-crd
  - name: network-policies
    version: 0.1.0
    catalog: cluster
  - name: node-exporter
    version: 1.19.0
    dependsOn:
    - kyverno
  - name: vertical-pod-autoscaler
    version: 5.2.2
    dependsOn:
    - prometheus-operator-crd
  - name: vertical-pod-autoscaler-crd
    version: 3.1.0
  - name: etcd-k8s-res-count-exporter
    version: 1.10.0
    dependsOn:
    - kyverno
  - name: observability-bundle
    version: 1.3.4
    dependsOn:
    - coredns
  - name: k8s-dns-node-cache
    version: 2.6.2
    dependsOn:
    - kyverno
  - name: security-bundle
    version: 1.6.5
    catalog: giantswarm
    dependsOn:
    - prometheus-operator-crd
  - name: teleport-kube-agent
    version: 0.9.0
  components:
  - name: cluster-aws
    version: 0.76.1-b76af2c26f4224ffb0d718e940e232fac05c89a0
  - name: flatcar
    version: 3815.2.2
  - name: flatcar-variant
    version: 1.0.0
  - name: kubernetes
    version: 1.26.15
  date: "2024-05-18T12:57:50Z"
  state: active
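
Because releases are immutable, the difference between two releases is simply the set of names whose versions differ. As a rough illustration, the sketch below compares a hand-copied subset of the v25.0.0-alpha.1 and v26.0.0-alpha.1 versions shown above; `diff_versions` is a hypothetical helper, and the dictionaries mix apps and components for brevity.

```python
# Subset of versions copied from the two Release manifests above.
V25 = {
    "cloud-provider-aws": "1.25.14-gs2",
    "vertical-pod-autoscaler": "5.1.0",
    "vertical-pod-autoscaler-crd": "3.0.0",
    "k8s-dns-node-cache": "2.6.1",
    "cilium": "0.24.0",
    "flatcar": "3815.2.2",
    "kubernetes": "1.25.16",
}
V26 = {
    "cloud-provider-aws": "1.26.11-gs.alpha.1",
    "vertical-pod-autoscaler": "5.2.2",
    "vertical-pod-autoscaler-crd": "3.1.0",
    "k8s-dns-node-cache": "2.6.2",
    "cilium": "0.24.0",
    "flatcar": "3815.2.2",
    "kubernetes": "1.26.15",
}


def diff_versions(old, new):
    """Map each name whose version differs to an (old, new) tuple.

    A name missing on one side is reported with None for that side.
    """
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


if __name__ == "__main__":
    for name, (before, after) in diff_versions(V25, V26).items():
        print(f"{name}: {before} -> {after}")
```

For the two manifests above, the names whose versions differ are `cloud-provider-aws`, `vertical-pod-autoscaler`, `vertical-pod-autoscaler-crd`, `k8s-dns-node-cache` and `kubernetes`.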

Releases are added to the releases repo via PRs. After a PR is merged, a CI job pushes the releases to the provider-specific app collections, so they end up on the MCs, where we can see them:

kubectl get release
NAME              KUBERNETES VERSION   STATE    AGE   READY   INUSE
v25.0.0-alpha.1   1.25.16              active   10d
v26.0.0-alpha.1   1.26.15              active   10d
v27.0.0-alpha.1   1.27.14              active   10d
nprokopic commented 4 months ago

2. How do we use this?

2.1. Deploying a cluster

The cluster-$provider app manifest looks like this:

---
apiVersion: v1
data:
  values: |
    global:
      release:
        version: 25.0.0-alpha.1
      connectivity:
        availabilityZoneUsageLimit: 3
      metadata:
        description: Releases POC v25
        name: v25nik02
        organization: nikola
        annotations:
          alpha.giantswarm.io/ignore-cluster-deletion: "true"
      nodePools:
        nodepool0:
          instanceType: m5.xlarge
          maxSize: 10
          minSize: 3
          rootVolumeSizeGB: 8
kind: ConfigMap
metadata:
  creationTimestamp: null
  labels:
    giantswarm.io/cluster: v25nik02
  name: v25nik02-userconfig
  namespace: org-nikola
---
apiVersion: application.giantswarm.io/v1alpha1
kind: App
metadata:
  labels:
    app-operator.giantswarm.io/version: 0.0.0
  name: v25nik02
  namespace: org-nikola
spec:
  catalog: cluster-test
  kubeConfig:
    inCluster: true
  name: cluster-aws
  namespace: org-nikola
  userConfig:
    configMap:
      name: v25nik02-userconfig
      namespace: org-nikola
  version: ""

The release version is set in the user Helm values; see this part of the ConfigMap above:

apiVersion: v1
data:
  values: |
    global:
      release:
        version: 25.0.0-alpha.1
# ...

App version is left empty:

apiVersion: application.giantswarm.io/v1alpha1
kind: App
# ...
spec:
  version: ""
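
The two manifests above differ between clusters only in a few values, so they can be templated. The following is a minimal sketch, not an official tool: `render_manifests` is a hypothetical helper, the `cluster-test` catalog is simply the one used in this POC, and the node pool and connectivity settings are omitted for brevity. Note that the Release CR is named `v25.0.0-alpha.1` while the Helm value in the example is `25.0.0-alpha.1`, so the sketch strips a leading `v`.

```python
def render_manifests(cluster, organization, release_name, provider="aws"):
    """Render the user-values ConfigMap and the App CR for a cluster.

    Assumptions: the namespace is org-<organization> and the catalog is
    cluster-test, as in the POC example; App.spec.version stays empty.
    """
    namespace = f"org-{organization}"
    # Release CR names carry a "v" prefix; the Helm value does not.
    version = release_name[1:] if release_name.startswith("v") else release_name
    return f'''\
---
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    giantswarm.io/cluster: {cluster}
  name: {cluster}-userconfig
  namespace: {namespace}
data:
  values: |
    global:
      release:
        version: {version}
      metadata:
        name: {cluster}
        organization: {organization}
---
apiVersion: application.giantswarm.io/v1alpha1
kind: App
metadata:
  labels:
    app-operator.giantswarm.io/version: 0.0.0
  name: {cluster}
  namespace: {namespace}
spec:
  catalog: cluster-test
  kubeConfig:
    inCluster: true
  name: cluster-{provider}
  namespace: {namespace}
  userConfig:
    configMap:
      name: {cluster}-userconfig
      namespace: {namespace}
  version: ""
'''


if __name__ == "__main__":
    print(render_manifests("v25nik02", "nikola", "v25.0.0-alpha.1"))
```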

2.2. Upgrading a cluster

A cluster upgrade is done by updating the release version, e.g. from this:

---
apiVersion: v1
data:
  values: |
    global:
      release:
        version: 25.0.0-alpha.1
# ...

To this:

---
apiVersion: v1
data:
  values: |
    global:
      release:
        version: 26.0.0-alpha.1
# ...

This triggers re-rendering of the cluster-$provider app, and the versions from the new release get applied.
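
Since the upgrade is a one-line change in the values, it is easy to script. Here is a minimal sketch that rewrites `global.release.version` in the values text; `set_release_version` is a hypothetical helper, and the regular expression assumes that `version:` directly follows `release:` as in the example above. A real tool would more likely parse the YAML properly.

```python
import re

# Matches "release:" followed on the next line by "version: <value>".
_RELEASE_VERSION = re.compile(
    r"^([ \t]*release:[ \t]*\n[ \t]*version:[ \t]*)(\S+)",
    re.MULTILINE,
)


def set_release_version(values, new_version):
    """Return the values text with the release version replaced."""
    updated, count = _RELEASE_VERSION.subn(
        lambda match: match.group(1) + new_version, values, count=1
    )
    if count != 1:
        raise ValueError("release.version not found in values")
    return updated


if __name__ == "__main__":
    values = (
        "global:\n"
        "  release:\n"
        "    version: 25.0.0-alpha.1\n"
        "  connectivity:\n"
        "    availabilityZoneUsageLimit: 3\n"
    )
    print(set_release_version(values, "26.0.0-alpha.1"))
```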

2.3. Upgrade from "standalone" cluster-aws (without Release CRs) to a new release

The upgrade is done by clearing App.spec.version (leaving it empty) and setting the release version in the Helm values.

So change from this:

apiVersion: application.giantswarm.io/v1alpha1
kind: App
# ...
spec:
  version: 0.76.1

to this:

---
apiVersion: v1
data:
  values: |
    global:
      release:
        version: 25.0.0-alpha.1
# ...
---
apiVersion: application.giantswarm.io/v1alpha1
kind: App
# ...
spec:
  version: ""
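
The migration boils down to two field changes. As a sketch, with the App CR and the Helm values represented as plain Python dictionaries (an assumption made for illustration; `migrate_to_release` is a hypothetical helper):

```python
import copy


def migrate_to_release(app, values, release_version):
    """Return copies of the App and the values switched to a release.

    App.spec.version is emptied and global.release.version is set;
    the inputs are left unmodified.
    """
    app = copy.deepcopy(app)
    values = copy.deepcopy(values)
    app.setdefault("spec", {})["version"] = ""
    values.setdefault("global", {}).setdefault("release", {})["version"] = release_version
    return app, values


if __name__ == "__main__":
    standalone_app = {"kind": "App", "spec": {"name": "cluster-aws", "version": "0.76.1"}}
    standalone_values = {"global": {"metadata": {"name": "v25nik02"}}}
    new_app, new_values = migrate_to_release(
        standalone_app, standalone_values, "25.0.0-alpha.1"
    )
    print(new_app)
    print(new_values)
```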
nprokopic commented 3 months ago

Closing this in favour of https://github.com/giantswarm/roadmap/issues/3473 and https://github.com/giantswarm/roadmap/issues/3475