Kong / kong-operator

Kong Operator for Kubernetes and OpenShift
https://konghq.com
Apache License 2.0
58 stars 27 forks source link

Release v0.9.0 #66

Closed rainest closed 3 years ago

rainest commented 3 years ago

0.9.0 updates the chart to 2.4.0. This includes the KIC 2.0 release and CRD/admission/etc. API version updates for Kubernetes 1.22 compatibility.

It updates the operator-specific Kong CRD to apiextensions.k8s.io/v1. It still lacks a structural schema (it'd need one for the entirety of values.yaml, which would be quite an undertaking), so the entire spec has x-kubernetes-preserve-unknown-fields: true. I used the same strategy as we used for the KIC CRDs to upgrade it: I uploaded it to a 1.21 cluster, let Kubernetes auto-upgrade it, pulled down the upgraded version, and stripped instance-specific metadata. AFAIK we've never generated this automatically from anything beyond the initial stub version created by operator-sdk. We did use an older version of the SDK to create it, though a version created with something more recent isn't hugely different (mostly it just breaks out some metadata in the schema--the main spec part remains empty and just preserves unknown fields).

This remains a draft because the Kong CRD is still not fully 1.22 compatible. We originally used charts.helm.k8s.io for the CRD group, and the k8s.io suffix now has special requirements. Attempting to apply it as-is will result in this error:

$ kubectl create -f /tmp/crd.yaml 
The CustomResourceDefinition "kongs.charts.helm.k8s.io" is invalid: 
* metadata.annotations[api-approved.kubernetes.io]: Required value: protected groups must have approval annotation "api-approved.kubernetes.io", see https://github.com/kubernetes/enhancements/pull/1111

We presumably have no need for an approved API hostname, so it makes sense to move that elsewhere. There isn't a whole lot of guidance about how to choose a group (Operator SDK docs and Kubernetes docs just use example.com without additional guidance, another operator I reviewed uses the operatorhub.io domain, so I guess we probably use either charts.configuration.konghq.com or charts.konghq.com.

Changing the group does mean that prior version CRs would not automatically upgrade. Not sure if we want to automate this or just configure the metadata and write changelog instructions to indicate that you must manually create a new release using the updated CR and copy in your spec.

Lastly, similar to https://github.com/Kong/kong-operator/pull/64, we lack automation to push the image, and will need to build and push that manually.

rainest commented 3 years ago

MicroK8S CI also seems to be broken at the cluster setup step; need to figure out what's broken there.

rainest commented 3 years ago

The first option (new CRD) appears to be what others have done (see https://github.com/Kong/kong-operator/issues/65#issuecomment-940211280). I'm still unsure on the version since a 1.x release would sorta imply more production readiness, and we've technically never moved this out of the alpha phase (though in practice it's a wrapper around the chart, and we do consider the chart production-ready).

The changelog has a draft of a migration strategy. I need to test that this actually works.

Tests are now failing because Ingresses aren't getting status updates, which I can replicate locally. Not sure why, since there are no controller errors. Maybe there's a behavior change where we previously set ClusterIP proxy Service IPs as Ingress load balancer status IPs, but don't any longer in 2.0?

Edit:

Apparently sort of. On 2.x:

status:
  loadBalancer: {}

On 1.x:

status:
  loadBalancer:
    ingress:
    - {}

The latter passes the regex check because the ingress object is set, albeit to nothing. Not really sure which is more correct--neither really indicates anything of interest.

rainest commented 3 years ago

Reworking the test to use curl retries did not work. We'd need --retry-all-errors to make retry take action on 404s.

https://curl.se/docs/manpage.html#--retry-all-errors is not available on the version of curl in Ubuntu 20.04: https://packages.ubuntu.com/search?keywords=curl

Would probably need to either use https://github.com/marketplace/actions/retry-step, rework how 2.x does no-IP status to match the 1.x behavior, or rework the curl tests to work with wait_for.

shaneutt commented 3 years ago

quasi-related PR for something I found during testing, we should include it in the release: https://github.com/Kong/kong-operator/pull/68

rainest commented 3 years ago

Manual testing of the upgrade steps is documented below:

log.txt opertest.tar.gz

I wasn't able to test a complete OLM install of 0.9.0 with the SDK version we use, so that's simulated to confirm that the resulting Kong CRs result in what we expect. That shouldn't be an issue since the image has no problem reading the current CRD API (else it would presumably be unable to act on the CRs) and because we don't upload any OLM content other than the YAML CSV and CRD--anything OLM machinery parsing is presumably handled by the actual OLM install, i.e. OperatorHub.io in the real world, and it is up to date.

I briefly looked at migrating to 1.0, but it's involved enough that it'd take a while to verify and would make the diff look like garbage, so in the interest of time and PR clarity, I think we should defer that.