Open NissesSenap opened 1 year ago
One of our major blockers right now is that we can't get node-local-dns to work without running cilium with kubeProxyReplacement: strict.
That means that we can't run cilium together with kube-proxy.
We have written a null_resource that deletes kube-proxy but Azure is "kind enough" to install it for us again.
There is a feature that is currently in preview, https://learn.microsoft.com/en-us/azure/aks/configure-kube-proxy, which lets us disable kube-proxy altogether. Hopefully this will become GA soon.
We are also waiting for the terraform provider to support configuring kube-proxy. https://github.com/hashicorp/terraform-provider-azurerm/pull/19567
On the other hand, we have verified that linkerd works as intended on top of cilium.
If we would like to enable a preview feature we could probably do it by using: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/resource_provider_registration
This feature is very low risk since we only disable the usage of kube-proxy in our cluster.
Moving to blocked due to: https://github.com/cilium/cilium/issues/22838
Information regarding node-local-dns:
Initially we had missed that, in order for node-local-dns to work, we need to set up a Local Redirect Policy so that cilium can route DNS traffic to it. There is a description of how to do it here: https://cloud.yandex.com/en/docs/managed-kubernetes/operations/cilium-node-local-dns
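To make the Local Redirect Policy more concrete, here is a minimal sketch that builds a CiliumLocalRedirectPolicy manifest as JSON, which kubectl accepts as well as YAML. The policy name, the kube-dns service name, the k8s-app: node-local-dns pod label and the port names dns / dns-tcp are assumptions based on a typical node-local-dns deployment; they must match what is actually deployed in the cluster.

```python
import json


def node_local_dns_redirect_policy(
    name="nodelocaldns",          # assumed policy name
    namespace="kube-system",      # assumed namespace of kube-dns and node-local-dns
    dns_service="kube-dns",       # assumed name of the cluster DNS service
    backend_labels=None,
):
    """Build a CiliumLocalRedirectPolicy manifest as a plain dict."""
    if backend_labels is None:
        # Assumed label on the node-local-dns pods.
        backend_labels = {"k8s-app": "node-local-dns"}
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumLocalRedirectPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Traffic addressed to the cluster DNS service ...
            "redirectFrontend": {
                "serviceMatcher": {
                    "serviceName": dns_service,
                    "namespace": namespace,
                }
            },
            # ... is redirected to the node-local-dns pod on the same node.
            "redirectBackend": {
                "localEndpointSelector": {"matchLabels": backend_labels},
                # Port names are assumed to match the container port names
                # of the node-local-dns pods.
                "toPorts": [
                    {"port": "53", "name": "dns", "protocol": "UDP"},
                    {"port": "53", "name": "dns-tcp", "protocol": "TCP"},
                ],
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(node_local_dns_redirect_policy(), indent=2))
```

The output can be saved to a file and applied with kubectl apply -f once the cilium agent is running with local redirect policies enabled.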
In order to enable local redirect in cilium we have to run cilium with kubeProxyReplacement=strict, which means running Cilium without kube-proxy.
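As a rough illustration of the settings involved, the sketch below assembles the Helm --set arguments for a cilium install with kubeProxyReplacement=strict and local redirect policies enabled. The k8sServiceHost and k8sServicePort values are needed because cilium has to reach the API server directly when kube-proxy is absent. The host name used in the example is hypothetical, and the helper function is only for illustration, not how our Terraform modules configure the chart.

```python
def cilium_helm_set_args(api_server_host, api_server_port=443):
    """Return Helm --set arguments for running cilium without kube-proxy."""
    values = {
        # Replace kube-proxy entirely; required for local redirect policies.
        "kubeProxyReplacement": "strict",
        # Enable support for CiliumLocalRedirectPolicy resources.
        "localRedirectPolicy": "true",
        # Without kube-proxy, cilium needs the API server address directly.
        "k8sServiceHost": api_server_host,
        "k8sServicePort": str(api_server_port),
    }
    args = []
    for key, value in values.items():
        args += ["--set", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # "api.example.internal" is a hypothetical API server host name.
    command = [
        "helm", "upgrade", "--install", "cilium", "cilium/cilium",
        "--namespace", "kube-system",
    ] + cilium_helm_set_args("api.example.internal", 443)
    print(" ".join(command))
```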
We found one problem in AWS related to running without kube-proxy. The ingress-nginx deployment uses hostNetwork: true and we have not been able to get that working. Details can be found here: https://github.com/cilium/cilium/issues/22838
We have experimented with not using hostNetwork, but then the K8S API server cannot reach the webhooks, e.g. we get errors like this:
error: ingresses.networking.k8s.io "podinfo" could not be patched: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": failed to call webhook: Post "https://ingress-nginx-public-controller-admission.ingress-nginx.svc:443/networking/v1/ingresses?timeout=10s": Address is not allowed
With hostNetwork: true the ingress-nginx pods get the IP address of the node, and without that it does not seem to work.
We have made some experiments without host network to check the behaviour:
Both ways made it possible for the API server to reach the webhook endpoints, but in both cases we ran into cert problems due to URL mismatch, as expected.
Possible ways forward:
Implement Cilium in Azure and AWS
Tasks
Work is ongoing in https://github.com/XenitAB/terraform-modules/pull/798