giantswarm / roadmap

Giant Swarm Product Roadmap
https://github.com/orgs/giantswarm/projects/273
Apache License 2.0

Propose a schema of universal values.yaml options to be used and enforced across all apps #3185

Closed piontec closed 4 months ago

piontec commented 9 months ago

We need to come up with a defined schema for configuration values that are shared across multiple/all apps. A good example of the problem we have right now is the configuration of the image registry URL, which is now configured in at least 6 different ways (ticket).

piontec commented 9 months ago

Assumptions

Proposed schema

Legend:

# This version was invalidated, see below comments for updated version
marians commented 6 months ago

Regarding JSON schema dereferencing: I think we should build this into schemalint. Despite the name, schemalint already has a helper command normalize. I am pretty sure the code for this is already in place in the tool. I could well imagine another command deref just for that purpose.

Apart from that, I have looked for dedicated tools, but found nothing.

uvegla commented 6 months ago
piontec commented 6 months ago
  • How about namespacing all our values under giantswarm? Not all charts are managed via sub-charts, and this way we can avoid all the conflicts / confusion between stuff coming from upstream and how we define things.

I was thinking about that, and even started like that. Still, it felt wrong: this is a generic setting, not something GS-specific (the value of registry is, but the key name is not). Even worse, it started to create confusion: you get something like image from upstream, but your chart actually uses (or at least should use) giantswarm.images - and suddenly it's hard to tell which image is really used in the templates, and you have to check the source to find out. That's why I decided not to prefix with giantswarm.

  • I get images.registry because ideally it should be the same for all. How about supporting it per image instead? It would still make it easy for automation to update all of them later, but gives more flexibility. Also an option: if a per-image one is not defined, then use the one under images. For that though I prefer one way of doing things instead of implementing a fallback in all charts.

But if we make it per-image, then it will be very hard to override the global value. Maybe let's just add an optional override per image and use the top-level one as the default. Then only charts which need that override (a very rare case) will implement it (and carefully, as it won't be handled by the global top-level value). WDYT?
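The override-with-fallback idea can be sketched in a Helm template. This is a sketch only; the `main` component key and the exact values layout are assumptions based on the discussion, not an agreed schema:

```yaml
# Assumed layout:
#
#   global:
#     images:
#       registry: gsoci.azurecr.io
#       main:
#         image: giantswarm/example   # hypothetical image name
#         tag: "1.0.0"
#         # registry: gsociprivate.azurecr.io   # optional per-image override
#
{{- $images := .Values.global.images }}
{{- $img := $images.main }}
image: "{{ $img.registry | default $images.registry }}/{{ $img.image }}:{{ $img.tag }}"
```

Charts that never need the override simply omit the per-image `registry` key, and the global value applies everywhere.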

piontec commented 6 months ago

Problems and issues:

piontec commented 6 months ago

v20240429-1

Schema

### All keys here are placed under "global", so they are available to sub-charts as well
global:
  # ###
  # Mandatory well-known top level keys - have to be present and have this structure
  # We use them to drive region/MC/WC specific settings for multiple charts
  # from a single source of configuration.
  # ###
  images:
    registry: [gsoci.azurecr.io]
    "<imagePullSecrets>":
      - [SecretName]
    "[main]":
      image: giantswarm/[image]
      tag: [TAG]
      "<pullPolicy>": [IfNotPresent]
      "<registry>":
        [gsociprivate.azurecr.io] # do that only if you want to override the default;
        # this won't be managed by external global config settings
    "[alpine]":
      image: giantswarm/alpine
      tag: "3.18"
      "<pullPolicy>": [IfNotPresent]

  # ###
  # Optional well-known top level keys - they don't have to be present, but if they are,
  # they have to have this structure.
  # We use them to drive region/MC/WC specific settings for multiple charts
  # from a single source of configuration.
  # ###

  podSecurityStandards:
    enforced: false

  # ###
  # Optional keys - they are not used to enforce common settings, but to keep most popular settings
  # in sync, so we have consistency when working on charts. These values won't be set for multiple charts
  # at the same time, like from CCRs, but we still want to have the settings consistent, if used.
  # ###

  verticalPodAutoscaler:
    enabled: true

  podDisruptionBudget:
    enabled: false

  crds:
    install: true

  # defined for the main pod (default), then for each pod with different requirements by pod's name
  resources:
    default:
      "<requests>":
        cpu: 500m
        memory: 512Mi
      "<limits>":
        cpu: 1000m
        memory: 1024Mi
    "[alpine]":
      "<requests>":
        cpu: 500m
        memory: 512Mi
      "<limits>":
        cpu: 1000m
        memory: 1024Mi

  # defined for the main pod (default), then for each pod with different requirements by pod's name
  tolerations:
    default: []
    "[alpine]": []
  nodeSelector:
    default: {}
    "[alpine]": {}
  affinity:
    default: {}
    "[alpine]": {}

  podSecurityContext:
    default:
      runAsNonRoot: true
      runAsUser: 1000
      runAsGroup: 1000
      seccompProfile:
        type: RuntimeDefault
      fsGroup: 1000
      fsGroupChangePolicy: "OnRootMismatch"
    "[alpine]":
      runAsNonRoot: false
  containerSecurityContext:
    default:
      allowPrivilegeEscalation: false
      runAsNonRoot: true
      runAsUser: 1000
      runAsGroup: 1000
      seccompProfile:
        type: RuntimeDefault
    "[alpine]":
      runAsNonRoot: false

Examples

---
#### Simple pod with 1 container and security policies
global:
  images:
    registry: gsoci.azurecr.io
    zot:
      image: giantswarm/zot-linux-amd64
      tag: "2.3.4"

  podSecurityContext:
    default:
      runAsNonRoot: true
      runAsUser: 1000
      runAsGroup: 1000
      seccompProfile:
        type: RuntimeDefault
      fsGroup: 1000
      fsGroupChangePolicy: "OnRootMismatch"
  containerSecurityContext:
    default:
      runAsNonRoot: true
      runAsUser: 1000
      runAsGroup: 1000
      allowPrivilegeEscalation: false

---
#### 1 pod with 2 containers, one of them coming from a private registry, and extra options
global:
  images:
    registry: gsoci.azurecr.io
    # there's only 1 pod and pull secrets are defined on the pod level, so we can use the default
    imagePullSecrets:
      - gsociprivate-pull-secret
    zot:
      image: giantswarm/zot-linux-amd64
      tag: "2.3.4"
    secret-injector:
      image: giantswarm/super-secret-injector
      tag: "3.18"
      registry: gsociprivate.azurecr.io

  resources:
    default:
      requests:
        cpu: 2000m
        memory: 1024Mi
      limits:
        cpu: 2000m
        memory: 1024Mi
    alpine:
      requests:
        cpu: 500m
        memory: 128Mi
      limits:
        cpu: 1000m
        memory: 128Mi

  podSecurityContext:
    default:
      runAsNonRoot: true
      fsGroup: 1000
      fsGroupChangePolicy: "OnRootMismatch"
  containerSecurityContext:
    default:
      runAsNonRoot: true
      runAsGroup: 2000
    alpine:
      runAsNonRoot: true
      allowPrivilegeEscalation: false

---
#### Sub-charts with many pods, containers and extra options
global:
  images:
    registry: gsoci.azurecr.io
    zot:
      image: giantswarm/zot-linux-amd64
      tag: "2.3.4"
    secret-injector:
      image: giantswarm/super-secret-injector
      tag: "3.18"
      registry: gsociprivate.azurecr.io
      # only some (not all) pods need a pull secret to get this image
      imagePullSecrets:
        - gsociprivate-pull-secret
    postgres:
      image: giantswarm/postgres
      tag: "10.11.2"
    minio:
      image: giantswarm/minio
      tag: "3.4.5"

  verticalPodAutoscaler:
    enabled: true
  podSecurityStandards:
    enforced: true
  podDisruptionBudget:
    enabled: true
  crds:
    install: true

  # default tolerations for all pods; "postgres" overrides them with its own
  tolerations:
    default:
      - key: "node-role.kubernetes.io/control-plane"
        operator: "Exists"
        effect: "NoSchedule"
    "postgres":
      - key: "storage-backend/superfast"
        operator: "Exists"
        effect: "NoSchedule"
uvegla commented 5 months ago

Just a minor thing, but the readability / usability of the values file takes precedence I think. On nodes where there are component names like [main], [alpine], etc. it might be simpler to validate / programmatically update them if they are separated under a key, however it may be named. Otherwise it is not straightforward to know which nodes are component nodes that are supposed to have e.g. image and tag, if other object-type fields are possible. Let's say:

global:
  images:
    registry: gsoci.azurecr.io
    kustomize-controller:
      image: giantswarm/kustomize-controller
      tag: v1.0.1
    source-controller:
      image: giantswarm/fluxcd-source-controller
      tag: v1.0.1
    more:
      properties:
        a: 1
        b: 2

From a validation tool's standpoint: is more a container that should have image and tag?

JosephSalisbury commented 5 months ago

yo @giantswarm/team-turtles we had some discussion in SIG Architecture Sync about whether it would make sense for cluster charts to align with this schema - we reckoned it probably didn't make sense, but thought we'd ping you anyway to get your opinion <3 <3 <3 <3

piontec commented 5 months ago

@uvegla can you please give an example here? I'm not sure what you mean?

uvegla commented 5 months ago

@piontec Fixed the indentation. Meaning: if we want a tool to validate that certain nodes under images have image and tag properties set, how do you know which ones should have them, when the component names can be anything like kustomize-controller or source-controller, but more in the above example is not a component?
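For illustration, a hedged JSON Schema fragment (in YAML form; the `images` key name is an assumption) showing how nesting the component entries under a dedicated key makes this check expressible:

```yaml
# Assumed layout: component entries nested under a dedicated "images" key.
# With that nesting, "every entry is an image entry" becomes a simple rule:
images:
  type: object
  additionalProperties:
    type: object
    required: [image, tag]
    properties:
      image: { type: string }
      tag: { type: string }
      registry: { type: string }     # optional per-image override
      pullPolicy: { type: string }
```

Without the dedicated key, the schema cannot distinguish component nodes from arbitrary sibling objects like `more` in the example above.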

piontec commented 4 months ago

v20240604-1

Schema

### All keys here are placed under "global", so they are available to sub-charts as well
global:
  # ###
  # Mandatory well-known top level keys - have to be present and have this structure
  # We use them to drive region/MC/WC specific settings for multiple charts
  # from a single source of configuration.
  # ###
  images_info:
    registry: [gsoci.azurecr.io]
    "<imagePullSecrets>":
      - [SecretName]
    images:
      "[main]":
        image: giantswarm/[image]
        tag: [TAG]
        "<pullPolicy>": [IfNotPresent]
        "<registry>": [gsociprivate.azurecr.io] # only if you want to override the 'images_info' default;
        # this value won't be managed by external global config settings (i.e. catalog config maps)
        "<imagePullSecrets>": # only if you want to override the 'images_info' default;
          - [gsociprivate-pull-secret]

      "[alpine]":
        image: giantswarm/alpine
        tag: "3.18"
        "<pullPolicy>": [IfNotPresent]

  # ###
  # Optional well-known top level keys - they don't have to be present, but if they are,
  # they have to have this structure.
  # We use them to drive region/MC/WC specific settings for multiple charts
  # from a single source of configuration.
  # ###

  podSecurityStandards:
    enforced: false

  # ###
  # Optional keys - they are not used to enforce common settings, but to keep most popular settings
  # in sync, so we have consistency when working on charts. These values won't be set for multiple charts
  # at the same time, like from CCRs, but we still want to have the settings consistent, if used.
  # ###

  verticalPodAutoscaler:
    enabled: true

  podDisruptionBudget:
    enabled: false

  crds:
    install: true

  # defined for the main pod (default), then for each pod with different requirements by pod's name
  resources:
    default:
      "<requests>":
        cpu: 500m
        memory: 512Mi
      "<limits>":
        cpu: 1000m
        memory: 1024Mi
    "[alpine]":
      "<requests>":
        cpu: 500m
        memory: 512Mi
      "<limits>":
        cpu: 1000m
        memory: 1024Mi

  # defined for the main pod (default), then for each pod with different requirements by pod's name
  tolerations:
    default: []
    "[alpine]": []
  nodeSelector:
    default: {}
    "[alpine]": {}
  affinity:
    default: {}
    "[alpine]": {}

  podSecurityContext:
    default:
      runAsNonRoot: true
      runAsUser: 1000
      runAsGroup: 1000
      seccompProfile:
        type: RuntimeDefault
      fsGroup: 1000
      fsGroupChangePolicy: "OnRootMismatch"
    "[alpine]":
      runAsNonRoot: false
  containerSecurityContext:
    default:
      allowPrivilegeEscalation: false
      runAsNonRoot: true
      runAsUser: 1000
      runAsGroup: 1000
      seccompProfile:
        type: RuntimeDefault
    "[alpine]":
      runAsNonRoot: false
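One way the v20240604-1 layout could be consumed in a chart; a sketch only, with assumed helper names (`app.image`, `app.imagePullSecrets`) that are not part of the proposal:

```yaml
{{/* Hypothetical helpers; call with (dict "root" . "name" "zot") */}}
{{- define "app.image" -}}
{{- $info := .root.Values.global.images_info -}}
{{- $img := index $info.images .name -}}
{{- printf "%s/%s:%s" ($img.registry | default $info.registry) $img.image $img.tag -}}
{{- end -}}

{{- define "app.imagePullSecrets" -}}
{{- $info := .root.Values.global.images_info -}}
{{- $img := index $info.images .name -}}
{{- range ($img.imagePullSecrets | default $info.imagePullSecrets | default list) }}
- name: {{ . }}
{{- end }}
{{- end -}}
```

In a pod spec this might read `image: {{ include "app.image" (dict "root" . "name" "zot") }}`, with both the registry and the pull secrets falling back from the per-image entry to the images_info defaults.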

Examples

---
#### Simple pod with 1 container and security policies
global:
  images_info:
    registry: gsoci.azurecr.io
    images:
      zot:
        image: giantswarm/zot-linux-amd64
        tag: "2.3.4"

  podSecurityContext:
    default:
      runAsNonRoot: true
      runAsUser: 1000
      runAsGroup: 1000
      seccompProfile:
        type: RuntimeDefault
      fsGroup: 1000
      fsGroupChangePolicy: "OnRootMismatch"
  containerSecurityContext:
    default:
      runAsNonRoot: true
      runAsUser: 1000
      runAsGroup: 1000
      allowPrivilegeEscalation: false

---
#### 1 pod with 2 containers, one of them coming from a private registry, and extra options
global:
  images_info:
    registry: gsoci.azurecr.io
    # there's only 1 pod and pull secrets are defined on the pod level, so we can use the default
    imagePullSecrets:
      - gsociprivate-pull-secret
    images:
      zot:
        image: giantswarm/zot-linux-amd64
        tag: "2.3.4"
      secret-injector:
        image: giantswarm/super-secret-injector
        tag: "3.18"
        registry: gsociprivate.azurecr.io

  resources:
    default:
      requests:
        cpu: 2000m
        memory: 1024Mi
      limits:
        cpu: 2000m
        memory: 1024Mi
    alpine:
      requests:
        cpu: 500m
        memory: 128Mi
      limits:
        cpu: 1000m
        memory: 128Mi

  podSecurityContext:
    default:
      runAsNonRoot: true
      fsGroup: 1000
      fsGroupChangePolicy: "OnRootMismatch"
  containerSecurityContext:
    default:
      runAsNonRoot: true
      runAsGroup: 2000
    alpine:
      runAsNonRoot: true
      allowPrivilegeEscalation: false

---
#### Sub-charts with many pods, containers and extra options
global:
  images_info:
    registry: gsoci.azurecr.io
    images:
      zot:
        image: giantswarm/zot-linux-amd64
        tag: "2.3.4"
      secret-injector:
        image: giantswarm/super-secret-injector
        tag: "3.18"
        registry: gsociprivate.azurecr.io
        # only some (not all) pods need a pull secret to get this image
        imagePullSecrets:
          - gsociprivate-pull-secret
      postgres:
        image: giantswarm/postgres
        tag: "10.11.2"
      minio:
        image: giantswarm/minio
        tag: "3.4.5"

  verticalPodAutoscaler:
    enabled: true
  podSecurityStandards:
    enforced: true
  podDisruptionBudget:
    enabled: true
  crds:
    install: true

  # default tolerations for all pods; "postgres" overrides them with its own
  tolerations:
    default:
      - key: "node-role.kubernetes.io/control-plane"
        operator: "Exists"
        effect: "NoSchedule"
    "postgres":
      - key: "storage-backend/superfast"
        operator: "Exists"
        effect: "NoSchedule"
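The default-plus-per-pod pattern used above for tolerations (and for nodeSelector, affinity, resources, and the security contexts) could be consumed with a small lookup helper; the helper name is an assumption:

```yaml
{{/* Hypothetical helper: tolerations for a named pod, falling back to
     the "default" entry when no pod-specific entry exists */}}
{{- define "app.tolerations" -}}
{{- $all := .root.Values.global.tolerations -}}
{{- toYaml (index $all .name | default $all.default) -}}
{{- end -}}
```

A pod template would then render, for instance:

```yaml
tolerations:
  {{- include "app.tolerations" (dict "root" . "name" "postgres") | nindent 2 }}
```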
mproffitt commented 4 months ago

Closing as done