elastic / cloud-on-k8s

Elastic Cloud on Kubernetes
Other
53 stars 707 forks source link

podTemplate on ElasticSearch CRD is not validated #2256

Closed prometherion closed 4 years ago

prometherion commented 4 years ago

Bug Report

What did you do?

Wrongly read the documentation (my bad) and tried to create a cluster as follows:

apiVersion: elasticsearch.k8s.elastic.co/v1beta1
kind: Elasticsearch
metadata:
  name: redacted
spec:
  version: 7.5.0
  nodeSets:
  - name: default
    count: 3
    podTemplate:
      tolerations:
      - key: sandbox.gke.io/runtime
        operator: Equal
        value: gvisor
    config:
      node.master: true
      node.data: true
      node.ingest: true
      node.store.allow_mmap: false

What did you expect to see?

A validation exception but, obviously, this cannot happen since there is no validating webhook.

What did you see instead? Under which circumstances?

The cluster has been created, even tho without tolerations.

Environment

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T19:18:23Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.11-gke.14", GitCommit:"56d89863d1033f9668ddd6e1c1aea81cd846ef88", GitTreeState:"clean", BuildDate:"2019-11-07T19:12:22Z", GoVersion:"go1.12.11b4", Compiler:"gc", Platform:"linux/amd64"}

Not relevant.

prometherion commented 4 years ago

I see there's just a webhook regarding certificates and I know that this could be interpreted like a feature request: however, I saw that there are some issues with the podTemplate interpolation like #2207.

JFI are you using any specific framework like OperatorSDK?

sebgl commented 4 years ago

JFI are you using any specific framework like OperatorSDK?

We rely on Kubebuilder & controller-runtime. Which are now also the foundations of OperatorSDK.

podTemplate on ElasticSearch CRD is not validated

This is indeed a problem :(

Next release of the operator (1.0.0 GA) will include OpenAPI validation automatically enforced by the apiserver. This should do a basic validation of fields in the spec.

It also includes a validating webhook to cover more complicated cases such as an Elasticsearch cluster with no master nodes, for example.

This would however not be enough to catch the example above in the podTemplate. We discard the podTemplate validation from our CRD. The reason is kubectl apply -f <operator.yaml> fails because the resource stored in annotations becomes too big.

Support for structural schemas and pruning in CRDs should help detect cases where a field is misplaced, but I think would still silently ignore a misplaced field.

The validating webhook implementation available in controller-runtime does not easily give us access to the underlying raw JSON of an object: we get an already parsed version of the resource.

I still hope we can find a way to catch this while still having CRDs compatible with Kubernetes 1.11 and above.

prometherion commented 4 years ago

Just an idea: what about, instead, of implementing a map of JSONPatch6902 rules in order to edit the podTemplate as KinD is performing?

sebgl commented 4 years ago

@prometherion I'm not sure I understand how this would work in our case and how the user woud use it. Can you detail a bit more what you have in mind and give some examples?

(We currently use patchesJson6902 with kustomize to patch our CRDs)

prometherion commented 4 years ago

YMMV:

apiVersion: elasticsearch.k8s.elastic.co/v1beta1
kind: Elasticsearch
metadata:
  name: redacted
spec:
  version: 7.5.0
  nodeSets:
  - name: default
    count: 3
  podTemplateJsonPatch6902:
  - op: add
    path: /spec/tolerations
    value:
    - key: sandbox.gke.io/runtime
      operator: Equal
      value: gvisor

Expected result will be the deploy of a StatefulSet with the given toleration block.

EDIT: this would require to apply the JSON patches every time a reconciliation loop is executed but this, from my end-user perspective, gives me more control on how to edit the Pod template of the StatefulSet resource.

david-kow commented 4 years ago

@prometherion FYI, current master (targeting 1.1.0) includes a validation webhook validation that can catch these issues if kubectl apply is used, as below.

k apply -f redacted.yaml
Error from server (Elasticsearch.elasticsearch.k8s.elastic.co "redacted" is invalid: tolerations: Invalid value: "tolerations": tolerations field found in the kubectl.kubernetes.io/last-applied-configuration annotation is unknown): error when creating "redacted.yaml": admission webhook "elastic-es-validation-v1.k8s.elastic.co" denied the request: Elasticsearch.elasticsearch.k8s.elastic.co "redacted" is invalid: tolerations: Invalid value: "tolerations": tolerations field found in the kubectl.kubernetes.io/last-applied-configuration annotation is unknown

Leaving this open as this won't validate changes coming through any path other than kubectl apply.

pebrc commented 4 years ago

I think we can close this, we have the annotation based validation on kubectl apply that @david-kow mentioned now and since 1.2.0 we also do semantic validation of the podTemplate on Kubernetes clusters >= 1.13 that support dry-run requests.