cloudfoundry-incubator / kubecf

Cloud Foundry on Kubernetes
Apache License 2.0
115 stars 62 forks source link

Ops file changes not being applied #1183

Open mook-as opened 4 years ago

mook-as commented 4 years ago

This bug is probably already fixed.

Describe the bug Upon applying an ops file (to update anti-affinity rules for pods), the changes were not being propagated to QuarksStatefulSets correctly.

This may have already been fixed; filing to schedule reproduction and documentation if necessary.

To Reproduce We had a deployment of KubeCF with pod anti-affinity rules set for diego-cell:

ops file ```yaml operations: inline: - type: replace path: /instance_groups/name=diego-cell/env?/bosh/agent/settings/affinity value: podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: - podAffinityTerm: labelSelector: matchExpressions: - key: quarks.cloudfoundry.org/quarks-statefulset-name operator: In values: - diego-cell topologyKey: kubernetes.io/hostname weight: 100 ```

We changed it to requiredDuringSchedulingIgnoredDuringExecution (with appropriate fixes to drop the podAffinityTerm part).

Upon applying the changes, the ops files were merged into the config map (and the with-ops secret), but the change was not propagated into the QuarksStatefulSet.

Expected behavior The changes apply to the QuarksStatefulSet / StatefulSet / pods.

Environment

Additional context Was unable to reproduce this in master; it's possible that one of the operator bumps has fixed this already. This needs to be reproduced and documented if it's an actual issue.

gaktive commented 4 years ago

There may be a broader story here needed whereby if invalid parameters are provided to the operator, we need to see validation of the input. Aside from manual changes to values.yaml, an invalid variable can be passed to the operator and it will silently fail.

manno commented 4 years ago

Currently validation for the BOSH manifest is mostly about checking if all dependencies are present and if the with-ops manifest can be created: https://github.com/cloudfoundry-incubator/quarks-operator/blob/master/pkg/kube/controllers/boshdeployment/validating_webhook.go#L119

(Apparently there was once a problem with the update block, because only that is also checked.)

If we want to check for typos and unknown fields, I think the right way to go is to extend the existing https://github.com/SUSE/bosh-deployment-schema

Sadly that is quite dated and building a complete schema seems complicated, as we need to support all of BOSH and our extra fields (bpm, quarks). Then we have to include (copy) all the JSON schemas for k8s core objects like pods.

Finally there are string references (instance group names) which we'd need to check manually.