Closed thbkrkr closed 3 years ago
I did some more tests and although kubectl explain es fails, kubectl apply -f quickstart.yml works if kubectl <= 1.16.
I just tried to update from 1.5.0 -> 1.7.0 and ran into this issue on a 1.19 cluster.
In my case I ended up with not being able to update all kinds of objects (totally unrelated) because of the validation problem.
Hi @travisghansen, can you please share your kubectl version? (output of the kubectl version command)
Yeah, v1.21 and then whatever is bundled with argocd 2.0.5.
The same issue.
kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T17:56:19Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.17-eks-087e67", GitCommit:"087e67e479962798594218dc6d99923f410c145e", GitTreeState:"clean", BuildDate:"2021-07-31T01:39:55Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
@travisghansen I cannot reproduce the issue on a 1.19 cluster with a v1.21 kubectl client.
Our Elasticsearch CRD has an array type without sub-elements:
kubectl get --raw /openapi/v2 | jq '.definitions["co.elastic.k8s.elasticsearch.v1.Elasticsearch"].properties.spec.properties.nodeSets.items.properties.volumeClaimTemplates'
{
"description": "VolumeClaimTemplates is a list of persistent volume claims to be used by each Pod in this NodeSet. Every claim in this list must have a matching volumeMount in one of the containers defined in the PodTemplate. Items defined here take precedence over any default claims added by the operator with the same name.",
"type": "array",
"x-kubernetes-preserve-unknown-fields": true
}
Which is totally fine according to
https://github.com/kubernetes/kube-openapi/blob/1a6458611d189dc17e98a0824dc92536365efedf/pkg/util/proto/document.go#L214-L218
Note the TODO(wrong) comments. However kubectl, which uses the kube-openapi library, cannot deal with it as these TODOs have not been addressed.
if len(s.GetItems().GetSchema()) != 1 {
// TODO(wrong): Items can have multiple elements. We can ignore Items then (would be incomplete), but we cannot return an error.
// TODO(wrong): "type: array" witohut any items at all is completely valid.
return nil, newSchemaError(path, "array should have exactly one sub-item")
}
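The effect of that check can be sketched in Python (a hypothetical re-implementation for illustration, not the actual kube-openapi Go code): a schema node declaring type: array with no items sub-schema is rejected with the same message kubectl prints.

```python
# Sketch (hypothetical): mirrors the kube-openapi array check quoted above.
# An OpenAPI v2 schema with "type: array" but no "items" sub-schema is
# rejected, even though such a schema is valid OpenAPI.

def parse_array_schema(path, schema):
    """Reject array schemas that do not carry exactly one 'items' sub-schema."""
    items = schema.get("items")
    if not isinstance(items, dict) or not items:
        # Same error text that kubectl surfaces to the user
        raise ValueError(f"SchemaError({path}): array should have exactly one sub-item")
    return items

vct = {
    "type": "array",
    "x-kubernetes-preserve-unknown-fields": True,  # note: no 'items' at all
}
try:
    parse_array_schema(
        "co.elastic.k8s.elasticsearch.v1.Elasticsearch.spec.nodeSets.volumeClaimTemplates",
        vct,
    )
except ValueError as e:
    print(e)
# → SchemaError(co.elastic.k8s.elasticsearch.v1.Elasticsearch.spec.nodeSets.volumeClaimTemplates): array should have exactly one sub-item
```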
Kubernetes seems to have "fixed" this issue in later versions server-side by changing the data returned by the OpenAPI endpoint that kubectl uses for validation. Specifically, they are pruning the type attribute from the response.
For comparison purposes the same command on K8s v1.19:
kubectl get --raw /openapi/v2 | jq '.definitions["co.elastic.k8s.elasticsearch.v1.Elasticsearch"].properties.spec.properties.nodeSets.items.properties.volumeClaimTemplates'
{
"description": "VolumeClaimTemplates is a list of persistent volume claims to be used by each Pod in this NodeSet. Every claim in this list must have a matching volumeMount in one of the containers defined in the PodTemplate. Items defined here take precedence over any default claims added by the operator with the same name.",
"x-kubernetes-preserve-unknown-fields": true
}
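The server-side fix can be sketched as a pruning pass over the published OpenAPI document: when a schema node sets x-kubernetes-preserve-unknown-fields, drop its type so older clients don't trip over an array without items. This is a hedged approximation of the behavior described above, not the actual Kubernetes apiserver code:

```python
# Sketch (hypothetical): approximate the K8s >= 1.18 server-side pruning
# that produces the v1.19 output shown above (no "type" key).

def prune_preserve_unknown_fields(node):
    """Recursively drop 'type' from schema nodes that set
    x-kubernetes-preserve-unknown-fields."""
    if isinstance(node, dict):
        if node.get("x-kubernetes-preserve-unknown-fields") is True:
            node.pop("type", None)
        for value in node.values():
            prune_preserve_unknown_fields(value)
    elif isinstance(node, list):
        for value in node:
            prune_preserve_unknown_fields(value)
    return node

schema = {
    "description": "VolumeClaimTemplates is a list of persistent volume claims ...",
    "type": "array",
    "x-kubernetes-preserve-unknown-fields": True,
}
print(prune_preserve_unknown_fields(schema))
# the 'type' key is gone, matching the v1.19 response above
```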
I believe these are the two relevant PRs (v1.18 and v1.19) in Kubernetes.
I don't see a way at the moment to tweak our CRDs to avoid the issue other than not using structural schema, but we are already offering a non-structural version as *-legacy.yaml manifests.
@travisghansen I cannot reproduce the issue on a 1.19 cluster with a v1.21 kubectl client.
Try to reproduce it on a 1.17 k8s cluster. This is the minimal supported version, but still supported :)
I don’t think it’s an issue with the crd itself but the validation webhook.
I’m not sure if it’s relevant but the clusterroles I think may all have had permissions to crds…not specific crds per-se, just the ability to read/whatever crds generally.
don’t think it’s an issue with the crd itself but the validation webhook.
The root cause is still in the CRD. I updated my original post to add a bit more explanation of my current understanding. Tweaking the clusterroles to be less restrictive might avoid the error at installation time, but you still run into it every time you want to deploy an Elasticsearch cluster or run any other kubectl operation involving that CRD (e.g. kubectl explain es).
The bug only occurs on k8s 1.16 && 1.17 with kubectl > 1.16. I couldn't reproduce it with other combinations of versions.
I don't see a way at the moment to tweak our CRDs to avoid the issue other than not using structural schema but we are already offering a non-structural version as *-legacy.yaml manifests.
We could remove type: array for the volumeClaimTemplates definition in the CRDs. This makes the bug disappear. The drawback is that we lose the nice validation that we have starting with k8s 1.18 thanks to the structural schema.
> kubectl apply -f quickstart.yml # where I put an object in the volumeClaimTemplates instead of an array
The Elasticsearch "quickstart" is invalid: spec.nodeSets.volumeClaimTemplates: Invalid value: "object": spec.nodeSets.volumeClaimTemplates in body must be of type array: "object"
Instead we get a more obscure error:
Error from server: error when creating "q.yml": admission webhook "elastic-es-validation-v1.k8s.elastic.co" denied the request: v1.Elasticsearch.Spec: v1.ElasticsearchSpec.NodeSets: []v1.NodeSet: v1.NodeSet.VolumeClaimTemplates: []v1.PersistentVolumeClaim: decode slice: expect [ or n, but found {, error found in #10 byte of ...|mplates":{"metadata"|..., bigger context ...|count":1,"name":"default","volumeClaimTemplates":{"metadata":{"name":"elasticsearch-data"},"spec":{"|...
The above error prevents us from adding our own validation.
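For illustration, the "nice validation" we lose without type: array can be sketched as a simple type check that a structural schema gives the apiserver for free (hypothetical Python, not ECK or Kubernetes code):

```python
# Sketch (hypothetical): the kind of check a structural schema enables,
# rejecting an object where the schema declares "type: array" before the
# admission webhook ever sees the document.

def check_array(field, value):
    """Return an apiserver-style error message for a non-list value,
    or None if the value is a list."""
    if not isinstance(value, list):
        kind = "object" if isinstance(value, dict) else type(value).__name__
        return f'{field} in body must be of type array: "{kind}"'
    return None

# An object passed where volumeClaimTemplates expects a list, as in the
# quickstart example above:
msg = check_array("spec.nodeSets.volumeClaimTemplates",
                  {"metadata": {"name": "elasticsearch-data"}})
print(msg)
# → spec.nodeSets.volumeClaimTemplates in body must be of type array: "object"
```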
For now, I see 3 options:
- document the bug and the workarounds
- generate a third version of the CRDs without type: array to target only k8s 1.16 and 1.17
- generate the CRDs without type: array to target k8s >= 1.16, with the drawback of losing a nice validation for the volumeClaimTemplates field
Good news, thanks :) I think it's a good idea to document the bug with the workaround and to remove 1.17 from the supported versions.
I hit the issue with different combinations of kube/kubectl, for the record. I should note that I manage the CRDs with argocd, so when I upgrade/downgrade the chart version I simultaneously upgrade/downgrade the CRDs.
I don’t think we need to add x-kubernetes-preserve-unknown-fields to volumeClaimTemplates, because we sometimes need to use local storage to deploy eck for testing.
--- crds-orig.yaml 2021-08-12 15:06:03.933995197 +0800
+++ crds.yaml 2021-08-12 15:00:34.561085304 +0800
@@ -2202,7 +2202,6 @@ spec:
type: object
type: object
type: array
- x-kubernetes-preserve-unknown-fields: true
required:
- name
type: object
I don’t think we need to add x-kubernetes-preserve-unknown-fields to volumeClaimTemplates
I forgot this one, it might be a good option. This avoids the bug and also the obscure error that we can get when removing type: array.
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
name: quickstart
spec:
version: 7.14.0
nodeSets:
- name: default
count: 1
config:
node.store.allow_mmap: false
volumeClaimTemplates:
metadata:
name: elasticsearch-data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
storageClassName: standard
> k version --short
Client Version: v1.17.17
Server Version: v1.16.15
# CRDs => the bug occurs
error: SchemaError(co.elastic.k8s.elasticsearch.v1.Elasticsearch.spec.nodeSets.volumeClaimTemplates): array should have exactly one sub-item
# CRDs without type:array => obscure error
Error from server: error when creating "q.yml": admission webhook "elastic-es-validation-v1.k8s.elastic.co" denied the request: v1.Elasticsearch.Spec: v1.ElasticsearchSpec.NodeSets: []v1.NodeSet: v1.NodeSet.VolumeClaimTemplates: []v1.PersistentVolumeClaim: decode slice: expect [ or n, but found {, error found in #10 byte of ...|mplates":{"metadata"|..., bigger context ...|count":1,"name":"default","volumeClaimTemplates":{"metadata":{"name":"elasticsearch-data"},"spec":{"|...
# CRDs without x-kubernetes-preserve-unknown-fields:true
error: error validating "q.yml": error validating data: ValidationError(Elasticsearch.spec.nodeSets[0].volumeClaimTemplates): invalid type for co.elastic.k8s.elasticsearch.v1.Elasticsearch.spec.nodeSets.volumeClaimTemplates: got "map", expected "array"; if you choose to ignore these errors, turn validation off with --validate=false
> k version --short
Client Version: v1.20.7
Server Version: v1.19.9-gke.1900
# CRDs
The Elasticsearch "quickstart" is invalid: spec.nodeSets.volumeClaimTemplates: Invalid value: "object": spec.nodeSets.volumeClaimTemplates in body must be of type array: "object"
# CRDs without type:array => obscure error
Error from server: error when creating "q.yml": admission webhook "elastic-es-validation-v1.k8s.elastic.co" denied the request: v1.Elasticsearch.Spec: v1.ElasticsearchSpec.NodeSets: []v1.NodeSet: v1.NodeSet.VolumeClaimTemplates: []v1.PersistentVolumeClaim: decode slice: expect [ or n, but found {, error found in #10 byte of ...|mplates":{"metadata"|..., bigger context ...|count":1,"name":"default","volumeClaimTemplates":{"metadata":{"name":"elasticsearch-data"},"spec":{"|...
# CRDs without x-kubernetes-preserve-unknown-fields:true
error: error validating "q.yml": error validating data: ValidationError(Elasticsearch.spec.nodeSets[0].volumeClaimTemplates): invalid type for co.elastic.k8s.elasticsearch.v1.Elasticsearch.spec.nodeSets.volumeClaimTemplates: got "map", expected "array"; if you choose to ignore these errors, turn validation off with --validate=false
Using spex instead of spec:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
name: quickstart
spec:
version: 7.14.0
nodeSets:
- name: default
count: 1
config:
node.store.allow_mmap: false
volumeClaimTemplates:
- metadata:
name: elasticsearch-data
spex:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
storageClassName: standard
> k version --short
Client Version: v1.17.17
Server Version: v1.16.15
# CRDs => the bug occurs
error: SchemaError(co.elastic.k8s.elasticsearch.v1.Elasticsearch.spec.nodeSets.volumeClaimTemplates): array should have exactly one sub-item
# CRDs without type:array
Error from server (Elasticsearch.elasticsearch.k8s.elastic.co "quickstart" is invalid: spex: Invalid value: "spex": spex field found in the kubectl.kubernetes.io/last-applied-configuration annotation is unknown. This is often due to incorrect indentation in the manifest.): error when creating "q.yml": admission webhook "elastic-es-validation-v1.k8s.elastic.co" denied the request: Elasticsearch.elasticsearch.k8s.elastic.co "quickstart" is invalid: spex: Invalid value: "spex": spex field found in the kubectl.kubernetes.io/last-applied-configuration annotation is unknown. This is often due to incorrect indentation in the manifest.
# CRDs without x-kubernetes-preserve-unknown-fields:true
error: error validating "q.yml": error validating data: ValidationError(Elasticsearch.spec.nodeSets[0].volumeClaimTemplates[0]): unknown field "spex" in co.elastic.k8s.elasticsearch.v1.Elasticsearch.spec.nodeSets.volumeClaimTemplates; if you choose to ignore these errors, turn validation off with --validate=false
> k version --short
Client Version: v1.20.7
Server Version: v1.19.9-gke.1900
# CRDs
Error from server (Elasticsearch.elasticsearch.k8s.elastic.co "quickstart" is invalid: spex: Invalid value: "spex": spex field found in the kubectl.kubernetes.io/last-applied-configuration annotation is unknown. This is often due to incorrect indentation in the manifest.): error when creating "q.yml": admission webhook "elastic-es-validation-v1.k8s.elastic.co" denied the request: Elasticsearch.elasticsearch.k8s.elastic.co "quickstart" is invalid: spex: Invalid value: "spex": spex field found in the kubectl.kubernetes.io/last-applied-configuration annotation is unknown. This is often due to incorrect indentation in the manifest.
# CRDs without type:array
Error from server (Elasticsearch.elasticsearch.k8s.elastic.co "quickstart" is invalid: spex: Invalid value: "spex": spex field found in the kubectl.kubernetes.io/last-applied-configuration annotation is unknown. This is often due to incorrect indentation in the manifest.): error when creating "q.yml": admission webhook "elastic-es-validation-v1.k8s.elastic.co" denied the request: Elasticsearch.elasticsearch.k8s.elastic.co "quickstart" is invalid: spex: Invalid value: "spex": spex field found in the kubectl.kubernetes.io/last-applied-configuration annotation is unknown. This is often due to incorrect indentation in the manifest
# CRDs without x-kubernetes-preserve-unknown-fields:true
error: error validating "q.yml": error validating data: ValidationError(Elasticsearch.spec.nodeSets[0].volumeClaimTemplates[0]): unknown field "spex" in co.elastic.k8s.elasticsearch.v1.Elasticsearch.spec.nodeSets.volumeClaimTemplates; if you choose to ignore these errors, turn validation off with --validate=false
I did not see this issue on Kubernetes 1.21
# kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.11", GitCommit:"ea5f00d93211b7c80247bf607cfa422ad6fb5347", GitTreeState:"clean", BuildDate:"2020-08-13T15:20:25Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-07-12T20:40:20Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
But on Kubernetes 1.19, it can be reproduced.
We are still in the process of evaluating a fix and a workaround. A good candidate would be to remove the x-kubernetes-preserve-unknown-fields field from the volumeClaimTemplates node.
From a user perspective it would mean:
- invalid type for co.elastic.k8s.elasticsearch.v1.Elasticsearch.spec.nodeSets.volumeClaimTemplates: got "map", expected "array" (see also @thbkrkr's comment here)
- an unknown field in the volumeClaimTemplates node would be rejected on the client side, by kubectl, while it would be rejected by the validation webhook today:
error: error validating "config/samples/elasticsearch/elasticsearch-unknown-field.yaml": error validating data: ValidationError(Elasticsearch.spec.nodeSets[0].volumeClaimTemplates[0]): unknown field "foo" in co.elastic.k8s.elasticsearch.v1.Elasticsearch.spec.nodeSets.volumeClaimTemplates; if you choose to ignore these errors, turn validation off with --validate=false
One caveat is that once the CRD has been installed, it does not seem possible to upgrade it anymore, either with Helm or by using kubectl replace.
A workaround is to manually patch the CRD with the following command before proceeding with the upgrade from 1.7.0:
kubectl patch crd elasticsearches.elasticsearch.k8s.elastic.co --type json -p='[{"op": "remove", "path": "/spec/versions/0/schema/openAPIV3Schema/properties/spec/properties/nodeSets/items/properties/volumeClaimTemplates/x-kubernetes-preserve-unknown-fields"}]'
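What the JSON Patch "remove" op in that kubectl command does can be sketched with a hand-rolled applier (for illustration only; kubectl uses a full RFC 6902 implementation). The CRD fragment below is a hypothetical minimal stand-in for the real schema:

```python
# Sketch: apply an RFC 6902 "remove" op at a JSON Pointer path,
# as the kubectl patch command above does against the CRD.

def json_patch_remove(doc, path):
    """Remove the value at a JSON Pointer path (handles objects and arrays)."""
    # Unescape per RFC 6901: "~1" -> "/", then "~0" -> "~"
    parts = [p.replace("~1", "/").replace("~0", "~")
             for p in path.lstrip("/").split("/")]
    node = doc
    for p in parts[:-1]:
        node = node[int(p)] if isinstance(node, list) else node[p]
    last = parts[-1]
    if isinstance(node, list):
        node.pop(int(last))
    else:
        del node[last]
    return doc

# Hypothetical minimal CRD fragment, shaped like the path in the command:
crd = {"spec": {"versions": [{"schema": {"openAPIV3Schema": {"properties": {
    "spec": {"properties": {"nodeSets": {"items": {"properties": {
        "volumeClaimTemplates": {
            "type": "array",
            "x-kubernetes-preserve-unknown-fields": True,
        }}}}}}}}}}]}}

json_patch_remove(
    crd,
    "/spec/versions/0/schema/openAPIV3Schema/properties/spec"
    "/properties/nodeSets/items/properties/volumeClaimTemplates"
    "/x-kubernetes-preserve-unknown-fields",
)
# volumeClaimTemplates now only carries "type": "array"
```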
@morningspace @travisghansen The fix for K8s 1.19 is actually available since 1.19.5. Could you provide the exact version of K8s you're using (using kubectl version), please?
Thanks!
Interesting! thanks for doing the research on that front. I can try updating to a newer version of 1.19 and let you know how it goes.
kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T21:04:39Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-11T13:09:17Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
That seems to have done the trick. Let me know if you want me to run any other tests before I go ahead and move the clusters to 1.20 and 1.21 shortly after that.
As an aside, would it be possible to remove the empty properties: {}
blocks from the CRDs? ArgoCD detects it as a diff as the cluster apparently wipes the fields out completely.
As an aside, would it be possible to remove the empty properties: {} blocks from the CRDs? ArgoCD detects it as a diff as the cluster apparently wipes the fields out completely.
Unfortunately we have to insert those blocks in the Elasticsearch CRD for backward compatibility, see https://github.com/elastic/cloud-on-k8s/pull/4679
The installation of ECK 1.7.0 on Kubernetes 1.16 and 1.17 using Helm or the YAML manifests returns this error:

To reproduce:

Workarounds:
- use the --validate=false flag

This seems to be a client-side issue during validation, as --validate=false is a remedy.