EnMasseProject / enmasse

EnMasse - Self-service messaging on Kubernetes and OpenShift
https://enmasseproject.github.io
Apache License 2.0
190 stars 87 forks source link

Move to v1 of CRD API #5247

Closed k-wall closed 3 years ago

k-wall commented 3 years ago

Description

We need move to v1 of the CRD API for bundle and OLM installs. Unfortunately we need to retain support for OpenShift 3.11, so we actually need to support both for the bundle install (probably just different install bundles).

High level description of the task.

Design

CRD Schema To move to the CRD v1 schema, the schema needs to be structural. Our v1beta1 schemas are largely structure already and need only a few x-kubernetes-int-or-string declarations in fields such as maxUnavailable to make them comply.

We have a choice to make about whether our schemas should preserve unknown fields.

EnMasse currently relies on the preseve unknown behaviour for infraconfig fields such as podTemplates and addressspace field networkconfigs. Our CRDs don't currently fully specify the contents. If we took the recommended for CRD v1, ETCD would throw the content away. We have options:

  1. make the requirement for preserveUnknownFields explicit in the CRD. ETCD will have store all content. The disadvantage is that the tooling can't help the user avoid typos and incorrectly indented blocks.
  2. annotate the fields that require unknown fields with x-kubernetes-preserve-unknown-fields: true and set the schema as a whole preserveUnknownFields: false
  3. elaborate the schemas to fully describe all supported fields.

Option 3 creates some extra work but with the payback: fewer user mistakes.

Generating the bundles

  1. The template tree will be restructured to have a crds/v1 and crd/v1beta1 subfolders.
  2. The templates bundle (templates/Makefile) will be refactored to build two separate subtrees, at templates/build/default and templates/build/prekube1_22 with the latter providing v1beta1 schemas, and two separate tgz. enmasse-.tgz will continue to contain the defaiult build. A new enmasse-prekube1_22-.tgz will contain the bundle with the v1beta1 schemas.
  3. The bundle/ansible installation instructions will remain unchanged.
  4. The Jenkins/Github integration and support script will need refactored.
    • when running on OCP 3.11 use prekube1_22
    • otherwise use default
  5. For OLM we only wish to support CRD v1, so olm-manifest/src/assembly/unix-dist.xml needs refactored to be aware of the new location.

If applicable, links to design docs.

Tasklist

Enumerate the sub-tasks required to complete the task:

vbusch commented 3 years ago

I believe option #1 for preserving unknown field behaviour could lead to an increase of support issues regarding typos etc. Option #3 is the best long term solution, where the the schemas fully describes all supported fields.
But if this is the last release, then I propose we go with option #2 and set preserve-unknown-fields on the field level where required.

k-wall commented 3 years ago

@vbusch i don't see 1 actually making things worse than today - it just allows the same class of error as a user could make with 0.33. 1 makes me feel slightly uncomfortable as I don't really feel like Im doing v1 in the sprint it intends. 3 is slightly tricky as it means we have to examine the controller code and make sure the schema allows what is supported. At the moment, I intend to head 3 and fall back on 2 otherwise.

k-wall commented 3 years ago

I realised that option 1 isn't actually possible. Is seems that loading a CRD with spec.preserveUnknownFields: true is not supported. You have to annotate your schema with set x-preserve-unknown-fields.

Error from server (Invalid): error when creating "templates/build/default/enmasse-latest/install/bundles/enmasse/010-addresses.crd.yaml": CustomResourceDefinition.apiextensions.k8s.io "addresses.enmasse.io" is invalid: spec.preserveUnknownFields: Invalid value: true: cannot set to true, set x-preserve-unknown-fields to true in spec.versions[*].schema instead

On reflection option 3 was distasteful. It would have meant hardcoding swaths of foreign schema for object such as networkpolicy securitycontext etc. In the end I opted for 2.