giantswarm / roadmap

Giant Swarm Product Roadmap
https://github.com/orgs/giantswarm/projects/273
Apache License 2.0
3 stars 0 forks source link

Disable '"additionalProperties": false' in `values.yaml` schema for cluster apps. #3187

Open piontec opened 5 months ago

piontec commented 5 months ago

As we learned, some of our apps (so far we've found only some cluster apps) have in their values.schema.json file the property "additionalProperties": false, (example. This means, that if at any time someone will add a new global value to one of our catalog config maps or to the config repo and this value will be pushed to such an app, it won't install, as helm's validation (which can't be disabled if the schema file is present) will fail.

It seems that currently the recommended way is to allow additionalProperties and validate (and show a warning) before passing the config to helm.

CC @marians

erkanerol commented 5 months ago

In cluster-vsphere and cluster-cloud-director, we don't have additionalProperties:false in the root level but it seems we have it for some section of the schema such as internal.

Should we remove it completely from all levels? I am not fully sure.

AndiDog commented 5 months ago

This means we do no more checking of fields and customers + our engineers can easily make typos and mistakes. The schema becomes partially useless. People don't normally look at warnings, and CIs don't look at warnings at all. We also shouldn't add new, special validations for this case because then we're not using standard Helm commands anymore which may hinder our migration to pure Helm in the future.

Hence, here are some alternative suggestions:

erkanerol commented 5 months ago

When we set additionalProperties=false, we will not be able to catch typos or wrong keys if they are not marked as required.

When we keep additionalProperties=true, we have to upgrade all cluster-<provider> apps to the latest before releasing a change in the GitOps or an operator.

I believe the latter is worse than the former one. For example, when you need to add a new parameter to the configmap created by cluster-apps-operator, you don't want to have to upgrade all clusters in advance. That would slow us down a lot.

nprokopic commented 4 months ago

TL;DR We should make a clear distinction between configuration of "regular" apps (that deploy controllers/operators) and cluster-\<provider> (and default-apps-\<provider>) apps. They are used for different things and "global" configs (changed "globally") have quite different impacts. Universal config for "regular" apps (that deploy controllers/operators) should IMO not be universal for and shared with cluster-\<provider> (and default-apps-\<provider>) apps as well.

any time someone will add a new global value to one of our catalog config maps or to the config repo

When it comes to config being added to catalog config maps and to config repo, we should really make a clear distinction between "regular" apps (that deploy controllers/operators) and cluster-\<provider> apps that contain CAPI+CAPx+HelmRelease+App resources (and default-apps-\<provider> apps that go hand in hand with cluster-\<provider> apps and will be soon merged into cluster-\<provider> apps).

"Regular" apps, those that deploy controllers/operators:

On the other hand, cluster- apps (default-apps-\<provider> apps will get merged into cluster-):

Now when it comes to cluster-\<provider> apps schema:

This means we do no more checking of fields and customers + our engineers can easily make typos and mistakes. The schema becomes partially useless. People don't normally look at warnings, and CIs don't look at warnings at all. We also shouldn't add new, special validations for this case because then we're not using standard Helm commands anymore which may hinder our migration to pure Helm in the future.

💯 what Andreas wrote. JSON schema is natively integrated into Helm and this is very helpful. It gives us a strong validation and it's our 1st line of defence from incorrectly configured workload clusters. With JSON schema we can write simple Helm templates, because we can rely on the JSON schema validation and assume that Helm values that we are using in the template is just correct. Templates become easier to understand and maintain that way.

AFAIK cluster-\<provider> apps (and default-apps-\<provider>) get values injected from these places only:

So the only really useful/used config that gets injected into cluster-\<provider> apps and default-apps-\<provider> are base domain and management cluster name that get injected from cluster catalog config map, and both are immutable. So even this is more about defaulting some immutable value (for the sake of convenience I guess), and not about making something configurable.

When we keep additionalProperties=true, we have to upgrade all cluster- apps to the latest before releasing a change in the GitOps or an operator.

@erkanerol In these cases, which IMO should be very rare, AFAIK in cluster-apps-operator you can always check for cluster-\<provider> and default-apps-\<provider> version, and set the required values based on those versions, which IMO we should be even doing now, so we don't mutate existing workload clusters config with an MC component change. This problem existing here is IMO an architecture smell.

T-Kukawka commented 4 months ago

i am having hard to to conclude what shall we do here?

nprokopic commented 2 months ago

i am having hard to to conclude what shall we do here?

When it comes to cluster-\<provider> apps, IMO we should not do anything :)

To quote my TL;DR from above:

TL;DR We should make a clear distinction between configuration of "regular" apps (that deploy controllers/operators) and cluster-\<provider> (and default-apps-\<provider>) apps. They are used for different things and "global" configs (changed "globally") have quite different impacts. Universal config for "regular" apps (that deploy controllers/operators) should IMO not be universal for and shared with cluster-\<provider> (and default-apps-\<provider>) apps as well.