Kong ingress-controller restart all the time when we upgrade the Kong version directly

yaoyao12138 commented 2 years ago

When we upgrade Kong directly from an old version (0.8) to the new version (2.5) in our environment, the Kong ingress-controller container keeps restarting because the UDPIngress CRD is missing. Apart from applying this CRD manually, is there any other way to fix this in the Kong code? Thank you.

➜  ~ kc get po -A|grep kong
katamari                                           gateway-kong-544f8b74bc-hq2pg                                       1/2     CrashLoopBackOff   7          30m
katamari                                           ibm-kong-operator-65c675b488-gzbm2                                  1/1     Running            0          31m
➜  ~ kc logs gateway-kong-544f8b74bc-hq2pg -c ingress-controller
...
I0222 09:27:56.085980       1 request.go:665] Waited for 11.189621419s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/k8s.cni.cncf.io/v1?timeout=32s
time="2022-02-22T09:28:01Z" level=error msg="if kind is a CRD, it should be installed before calling Start" error="no matches for kind \"UDPIngress\" in version \"configuration.konghq.com/v1beta1\"" kind="{\"
...

shaneutt commented 2 years ago

In general, across most Kubernetes projects (outside of Kong), CRD management has been a pain point, and automatic CRD management is something we've not seen much success with elsewhere. While I understand the desire, we don't have a solution like this today, and we're more or less waiting on some standard upstream Kubernetes solution (if there ever will be one) for such a problem. Do you have some other operator or tool which provides an example of the automatic CRD management you're looking for?

The relevant documentation for CRD management is https://github.com/Kong/kong-operator/tree/main/helm-charts/kong#crd-management. For the moment you'll want to manually update your CRDs for new changes, and pay attention to release notes to manually add new CRDs that come out in new versions.

morningspace commented 2 years ago

Thanks @shaneutt !

> Do you have some other operator or tool which provides an example of the automatic CRD management you're looking for?

We are now looking at OLM, which looks promising for handling CRD management automatically, but this needs further verification.

> For the moment you'll want to manually update your CRDs for new changes, and pay attention to release notes to manually add new CRDs that come out in new versions.

Are there any more specific instructions on when and how to do it manually? Maybe some sample snippets would be much better. The doc: https://github.com/Kong/kong-operator/tree/main/helm-charts/kong#crd-management seems too generic.

shaneutt commented 2 years ago

> Are there any more specific instructions on when and how to do it manually? Maybe some sample snippets would be much better. The doc: https://github.com/Kong/kong-operator/tree/main/helm-charts/kong#crd-management seems too generic.

The CRDs that are relevant at any given moment are available here (note that this is specifically for v0.9.0, you'll want to change this in the future for new versions):

https://github.com/Kong/kong-operator/blob/v0.9.0/helm-charts/kong/crds/custom-resource-definitions.yaml

So the effective instructions are:

1. Read the release notes for the version you're upgrading to, and follow any release-specific instructions for CRDs.
2. Assuming there's nothing special, run the following prior to the upgrade, switching in the specific version. Note that this uses the raw file URL, since kubectl needs the YAML itself rather than the GitHub HTML page: kubectl apply -f https://raw.githubusercontent.com/Kong/kong-operator/v0.9.0/helm-charts/kong/crds/custom-resource-definitions.yaml

You'll see in the above that UDPIngress is present in those manifests.
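
The pre-upgrade step above can be sketched as a small shell helper. This is only an illustration, not an official procedure: the function names are hypothetical, the URL is the raw.githubusercontent.com form of the manifest linked above (kubectl needs raw YAML rather than the GitHub HTML page), and the CRD resource name `udpingresses.configuration.konghq.com` is an assumption you should check against the manifest.

```shell
#!/usr/bin/env bash
# Sketch: apply the operator's CRD manifest for a given release before upgrading.

# Build the raw-content URL of the CRD manifest for a release tag.
# (Hypothetical helper; the repository path is the one linked above.)
crd_manifest_url() {
  local version="$1"
  echo "https://raw.githubusercontent.com/Kong/kong-operator/${version}/helm-charts/kong/crds/custom-resource-definitions.yaml"
}

# Apply the CRDs, then wait until the UDPIngress CRD is established.
# Assumption: the CRD is named udpingresses.configuration.konghq.com.
apply_kong_crds() {
  local version="$1"
  kubectl apply -f "$(crd_manifest_url "$version")" &&
    kubectl wait --for=condition=established --timeout=60s \
      crd/udpingresses.configuration.konghq.com
}

# Example (run against your own cluster, before upgrading):
#   apply_kong_crds v0.9.0
```

Waiting for the `established` condition is a design choice here: it avoids starting the new ingress controller before the API server is actually serving the new kind.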

The vast majority of the time, CRDs will upgrade cleanly because we avoid backwards-incompatible changes; in fact, we change them as little as possible because CRD management requires a delicate hand. Any backwards-incompatible changes will be made clear in the release notes, and if there are new versions of the CRDs, the migration logic will be handled automatically by webhooks in the ingress controller.

morningspace commented 2 years ago

That's awesome! So, @shaneutt, do you have any instructions for the case where things fail? Do we need backup/restore steps in the above instructions?

shaneutt commented 2 years ago

There are two types of relevant failure modes:

1. a failure to upgrade the CRD resource
2. a failure of ingress controller functionality after the CRD has successfully upgraded

In the first failure mode, the Kubernetes API would refuse the CRD upgrade, so you would simply be "stuck" at your current version; no restoration would be needed (but the issue should be reported).
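
One way to probe for this first failure mode ahead of time is a server-side dry run, which asks the API server to validate the manifest without persisting anything. The sketch below is an assumption about how you might wire this up (the function name is hypothetical), and a passing dry run is not a guarantee that the real apply will succeed.

```shell
#!/usr/bin/env bash
# Sketch: check whether the API server would accept a CRD manifest.

# Returns 0 if the server-side dry run succeeds, non-zero if it is refused.
# The argument can be a local file or a URL to raw YAML.
crd_upgrade_would_succeed() {
  local manifest="$1"
  kubectl apply --dry-run=server -f "$manifest"
}

# Example:
#   crd_upgrade_would_succeed custom-resource-definitions.yaml \
#     || echo "CRD upgrade would be refused; staying on the current version"
```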

In the second failure mode, some functionality that worked with the previous version of a CRD and ingress controller stops working with the new versions. Your options may include either restoring from a backup using something like Velero, which you suggested, or implementing a mitigation at your current version and using that temporarily in anticipation of an upcoming patch (in both cases, please report the issue).

For both of these cases I have no relevant historical issues to reference as we have been successful in treating these upgrade paths with extremely high criticality and testing them extensively. That said, mistakes are a reality of life so we generally recommend the following:

- Create a staging or testing cluster, and test upgrades and new functionality in that cluster prior to production deployments.
- Take backups of your staging and production environments (particularly before upgrades).
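
As a lightweight complement to a full backup tool, the Kong CRDs and the custom resources that use them can be dumped to YAML before an upgrade. This is a minimal sketch, assuming the Kong CRDs all live in the `configuration.konghq.com` API group; the function name and file layout are hypothetical, and the dumps include server-managed fields, so they are a reference copy rather than a tested restore path.

```shell
#!/usr/bin/env bash
# Sketch: save Kong CRD definitions and their custom resources to a directory.

backup_kong_crds() {
  local outdir="$1" crd name
  mkdir -p "$outdir"
  # "kubectl get crd -o name" prints lines like
  # customresourcedefinition.apiextensions.k8s.io/<plural>.<group>
  kubectl get crd -o name | grep 'configuration\.konghq\.com$' |
    while read -r crd; do
      name="${crd##*/}"  # strip the resource-type prefix
      # The CRD definition itself.
      kubectl get "$crd" -o yaml > "$outdir/crd-$name.yaml"
      # Every custom resource of that kind, across all namespaces.
      kubectl get "$name" --all-namespaces -o yaml > "$outdir/resources-$name.yaml"
    done
}

# Example:
#   backup_kong_crds "./kong-crd-backup-$(date +%Y%m%d)"
```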

These aren't really particular to Kong, however; they're more like best practices for any production workload on Kubernetes.

While we're on the topic of production and the possibility of upgrade issues, I do feel it's important to point out that we do not currently consider this operator, in its v0.x release cycle, an officially supported deployment method for Kong: in our installers documentation, the Helm chart (used directly) is the currently recommended production deployment method.

yaoyao12138 commented 2 years ago

Thanks very much, @shaneutt and @morningspace. No more questions; closing this.

Kong / kong-operator

Kong ingress-controller restart all the time when we upgrade the Kong version directly #74