Closed: hellt closed this issue 2 days ago
Hey @hellt :wave:

The `GatewayClass` is part of the Cilium Helm chart (it is not created by the Cilium Operator). The `GatewayClass` only gets installed if Gateway API support is enabled and the CRD with the expected API version is present on the cluster. The chart uses `.Capabilities.APIVersions.Has`, for which Helm queries the cluster during an installation (`helm install`).
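Conceptually, the guard inside the chart looks something like this (an illustrative sketch, not the chart's literal source; the `controllerName` value shown is the name Cilium's Gateway controller advertises):

```yaml
{{- if .Capabilities.APIVersions.Has "gateway.networking.k8s.io/v1/GatewayClass" }}
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: cilium
spec:
  controllerName: io.cilium/gateway-controller
{{- end }}
```

With `helm install`, the `if` is evaluated against the live cluster; with `helm template`, it is evaluated against whatever `--api-versions` values were passed on the command line.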
When using `helm template`, one has to manually pass the additionally supported API versions. It's also recommended to do the same with the Kubernetes version. From https://helm.sh/docs/helm/helm_template/:

```
--api-versions strings   Kubernetes api versions used for Capabilities.APIVersions
--kube-version string    Kubernetes version used for Capabilities.KubeVersion
```
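The `strings` type means `--api-versions` may be repeated, which matters when several charts gate on different CRDs. A sketch (the ServiceMonitor entry is an illustrative second value, not something this issue requires):

```shell
helm template cilium cilium/cilium \
  --api-versions='gateway.networking.k8s.io/v1/GatewayClass' \
  --api-versions='monitoring.coreos.com/v1/ServiceMonitor' \
  -f cilium-values.yml
```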
Your issue should be solved by adding `--api-versions='gateway.networking.k8s.io/v1/GatewayClass'` to your Helm templating command:
```shell
helm template \
  cilium \
  cilium/cilium \
  --version 1.16.0-pre.3 \
  --namespace kube-system \
  --api-versions='gateway.networking.k8s.io/v1/GatewayClass' \
  -f cilium-values.yml \
  > talos/cilium-manifest.yml
```
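As a quick sanity check on the rendered output (assuming the manifest path from the command above; guarded so the snippet is a no-op if the file has not been rendered yet):

```shell
# With the --api-versions flag set, the rendered manifest should contain
# at least one GatewayClass document.
if [ -f talos/cilium-manifest.yml ]; then
  grep -c 'kind: GatewayClass' talos/cilium-manifest.yml
fi
```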
oh wow, thanks @mhofstetter! Adding this flag immediately solved the issue. Since inline manifests are the recommended way to install the Cilium CNI according to the Talos docs, I stepped into this trap not knowing that Helm checks for the presence of certain CRDs.

Thank you very much for providing this insight, it resolved my issue immediately.

P.S. It would be nice to have this in the docs somewhere (I can volunteer with the patch, if that is how the docs team works).
I think this could be a problem in environments where you can't influence the order of Helm chart installation, like the CAPI Helm addon (CAAPH). In such a scenario, you just upload a bunch of HelmChartProxy CRs to the cluster (containing chart parameters like repo, name, and values). One of the CRs installs Cilium and the other installs the Gateway API.

Then you instantiate a cluster: CAPI provisions some nodes and installs Kubernetes on them. As soon as the new cluster's API is accessible, CAAPH installs all the charts in an undefined order. This way, you end up with a cluster either with or without the GatewayClass. Such scenarios need a way to force creation of the class.
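For concreteness, the Cilium side of that scenario looks roughly like the following HelmChartProxy CR (a sketch against CAAPH's `addons.cluster.x-k8s.io/v1alpha1` API; the exact field names and the label selector should be checked against the current CAAPH documentation):

```yaml
apiVersion: addons.cluster.x-k8s.io/v1alpha1
kind: HelmChartProxy
metadata:
  name: cilium
spec:
  clusterSelector:
    matchLabels:
      cni: cilium        # illustrative label used to pick target clusters
  repoURL: https://helm.cilium.io
  chartName: cilium
  releaseName: cilium
  namespace: kube-system
  valuesTemplate: |
    gatewayAPI:
      enabled: true
```

A second, analogous CR would install the Gateway API CRDs; since CAAPH applies both in an undefined order, whether the GatewayClass gets rendered depends on which one wins the race.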
yes, I do agree with @balous. My expectation was that the Cilium controller would apply the GW API manifests on its own (since it knows which GW API version it supports). This seems to be a logical way of handling that.

I have no problem with packaging the GW API CRDs separately, but I need consistent behavior from the Cilium chart.
CAAPH does a normal reconcile: if a chart installation fails, it is retried. I have a similar example with kubernetes-prometheus-stack, which installs CRDs that are used by Cilium too. It works; you just see some errors in the CAAPH logs as the Cilium installation has to be retried.
For Gateway API, the problem is that this is an upstream set of CRDs, that multiple implementations may want to install - and each of those implementations may want to install different versions. So, Cilium may want to install v1.1.0, but if you install Envoy Gateway, it may want to install v1.0.0.
Because CRDs can only have one version, if both implementations had controllers install their version, you would either end up with inconsistent behavior (if they run at startup of the controller, with the active version depending on which implementation's controller has restarted more recently), or flapping CRD versions (if they both actively reconcile the version, and put their version back when the CRD changes).
There's really no great solution for this problem at the moment, which stems from the fact that you can only have one version of a CRD installed in the cluster. We've talked about a number of options upstream in Gateway API, but haven't been able to find anything that works for everyone yet.
That's why, for now, Cilium has the requirement that the CRDs are "manually" installed - which really just means that the CRDs need to be installed before Cilium is.
Sorry I can't give you better news - if it was as simple as just having the Cilium operator apply the manifests, I'd happily do that.
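In practice, "manually installed" just means an ordering like the following (a sketch: the URL points at the upstream standard-channel CRD bundle for the v1.0.0 release used in this report, and `gatewayAPI.enabled` is the chart toggle discussed in this thread; other required Cilium settings are omitted):

```shell
# 1. Install the upstream Gateway API CRDs first.
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.0.0/standard-install.yaml

# 2. Only then install Cilium with Gateway API support enabled.
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set gatewayAPI.enabled=true
```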
Why then would the Gateway API SIG list this option in the first place? https://gateway-api.sigs.k8s.io/guides/#installing-a-gateway-controller
@youngnick, I understand this; as I wrote, I don't have a problem with installing the CRDs separately. My problem is the autodetection inside the Cilium chart, which leads to uncertain results. I'd prefer the Cilium installation to fail if the GW API CRDs are not installed (the same way it fails when the Prometheus CRDs are not installed) instead of silently skipping the GatewayClass.
Okay, that seems like a reasonable ask - I agree that having the helm chart fail if Gateway API is enabled and GatewayClass is not present probably makes more sense. It is a breaking change though, so we will need to be a little careful about it.
@hellt Note the language there - some implementations may install automatically - that just means that they are making the call that they won't be affected too much by any of the issues I mentioned.
> Okay, that seems like a reasonable ask - I agree that having the helm chart fail if Gateway API is enabled and GatewayClass is not present probably makes more sense. It is a breaking change though, so we will need to be a little careful about it.

I could create a PR adding a new option for this: `gatewayAPI.gwClass.create: <true|false|auto>`, where `auto` is the current behavior. Seems reasonable?
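If that option lands, the values file would presumably grow something like this (hypothetical: the option is only proposed in this thread and does not exist in the chart yet):

```yaml
gatewayAPI:
  enabled: true
  gwClass:
    # true  = always render the GatewayClass (install fails fast if the CRD is missing)
    # false = never render it
    # auto  = current behavior: render only if the CRD is detected via capabilities
    create: auto
```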
Yes, that seems very reasonable, thanks!
Is there an existing issue for this?
What happened?
I have installed the v1.0.0 Gateway API CRDs and after that performed an install of Cilium using the following:
the inputs file is:
Then I proceed with installing the gateway:
But then `cilium-operator` complains that no `cilium` gateway class is available. I wonder, how do I identify why the operator did not create the gateway class?
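For anyone landing here with the same symptom, two quick checks (assuming `kubectl` access to the cluster) narrow it down: whether the class exists at all, and whether the cluster serves the API version the chart's capability check looks for.

```shell
# Does the GatewayClass exist at all?
kubectl get gatewayclass

# Does the cluster serve the API group/version the chart checks for?
kubectl api-versions | grep gateway.networking.k8s.io
```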
Cilium Version
1.16.0-pre3
Kernel Version
6.6.32-talos
Kubernetes Version
1.30.1
Regression
No response
Sysdump
No response
Relevant log output
No response
Anything else?
No response
Cilium Users Document
Code of Conduct