Closed rainest closed 3 years ago
MicroK8S CI also seems to be broken at the cluster setup step; need to figure out what's broken there.
The first option (new CRD) appears to be what others have done (see https://github.com/Kong/kong-operator/issues/65#issuecomment-940211280). I'm still unsure on the version since a 1.x release would sorta imply more production readiness, and we've technically never moved this out of the alpha phase (though in practice it's a wrapper around the chart, and we do consider the chart production-ready).
The changelog has a draft of a migration strategy. I need to test that this actually works.
Tests are now failing because Ingresses aren't getting status updates, which I can replicate locally. Not sure why, since there are no controller errors. Maybe there's a behavior change where we previously set ClusterIP proxy Service IPs as Ingress load balancer status IPs, but don't any longer in 2.0?
Edit:
Apparently sort of. On 2.x:
status:
loadBalancer: {}
On 1.x:
status:
loadBalancer:
ingress:
- {}
The latter passes the regex check because the ingress
object is set, albeit to nothing. Not really sure which is more correct--neither really indicates anything of interest.
Reworking the test to use curl retries did not work. We'd need --retry-all-errors
to make retry take action on 404s.
https://curl.se/docs/manpage.html#--retry-all-errors is not available on the version of curl in Ubuntu 20.04: https://packages.ubuntu.com/search?keywords=curl
Would probably need to either use https://github.com/marketplace/actions/retry-step, rework how 2.x does no-IP status to match the 1.x behavior, or rework the curl tests to work with wait_for.
quasi-related PR for something I found during testing, we should include it in the release: https://github.com/Kong/kong-operator/pull/68
Manual testing of the upgrade steps is documented below:
I wasn't able to test a complete OLM install of 0.9.0 with the SDK version we use, so that's simulated to confirm that the resulting Kong CRs result in what we expect. That shouldn't be an issue since the image has no problem reading the current CRD API (else it would presumably be unable to act on the CRs) and because we don't upload any OLM content other than the YAML CSV and CRD--anything OLM machinery parsing is presumably handled by the actual OLM install, i.e. OperatorHub.io in the real world, and it is up to date.
I briefly looked at migrating to 1.0, but it's involved enough that it'd take a while to verify and would make the diff look like garbage, so in the interest of time and PR clarity, I think we should defer that.
0.9.0 updates the chart to 2.4.0. This includes the KIC 2.0 release and CRD/admission/etc. API version updates for Kubernetes 1.22 compatibility.
It updates the operator-specific Kong CRD to apiextensions.k8s.io/v1. It still lacks a structural schema (it'd need one for the entirety of values.yaml, which would be quite an undertaking), so the entire spec has
x-kubernetes-preserve-unknown-fields: true
. I used the same strategy as we used for the KIC CRDs to upgrade it: I uploaded it to a 1.21 cluster, let Kubernetes auto-upgrade it, pulled down the upgraded version, and stripped instance-specific metadata. AFAIK we've never generated this automatically from anything beyond the initial stub version created by operator-sdk. We did use an older version of the SDK to create it, though a version created with something more recent isn't hugely different (mostly it just breaks out some metadata in the schema--the main spec part remains empty and just preserves unknown fields).This remains a draft because the Kong CRD is still not fully 1.22 compatible. We originally used
charts.helm.k8s.io
for the CRD group, and thek8s.io
suffix now has special requirements. Attempting to apply it as-is will result in this error:We presumably have no need for an approved API hostname, so it makes sense to move that elsewhere. There isn't a whole lot of guidance about how to choose a group (Operator SDK docs and Kubernetes docs just use
example.com
without additional guidance, another operator I reviewed uses the operatorhub.io domain, so I guess we probably use eithercharts.configuration.konghq.com
orcharts.konghq.com
.Changing the group does mean that prior version CRs would not automatically upgrade. Not sure if we want to automate this or just configure the metadata and write changelog instructions to indicate that you must manually create a new release using the updated CR and copy in your spec.
Lastly, similar to https://github.com/Kong/kong-operator/pull/64, we lack automation to push the image, and will need to build and push that manually.