Closed megian closed 3 months ago
Thanks for reporting this, together with useful references. I'll analyze this and get back to you.
One effect is that cert-manager is unable to issue certificates: it checks the challenge URL itself beforehand, and this verification step fails and prevents the certificate from being issued, even though the challenge is reachable from outside by Let's Encrypt.
Okay I see the problem with the current implementation. I would probably fix this via an annotation, like Digital Ocean did, together with a similar disclaimer:
The only thing is that you would have to provide a domain name that points at the right IP address iiuc. It's worth mentioning that we have a DNS entry for each customer IPv4 address, which could be used (e.g. 1-2-3-4.cust.cloudscale.ch). If you want to automate that on your end, you likely could do it that way (if you do not have a domain ready in all cases).
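To illustrate the automation idea, here is a minimal sketch that derives such a name from a load balancer's IPv4 address. It assumes the naming pattern generalizes from the single example given above (1-2-3-4.cust.cloudscale.ch); the function name `cust_hostname` is hypothetical and not part of any cloudscale tooling.

```python
import ipaddress


def cust_hostname(ip: str, zone: str = "cust.cloudscale.ch") -> str:
    """Build the per-address DNS name, assuming dots are simply replaced by dashes."""
    # IPv4Address raises a ValueError subclass for anything that is not a valid IPv4 address.
    addr = ipaddress.IPv4Address(ip)
    return f"{str(addr).replace('.', '-')}.{zone}"


if __name__ == "__main__":
    # Uses the example address from the comment above.
    print(cust_hostname("1.2.3.4"))
```

The resulting name could then be used as the hostname value when no dedicated domain is available.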
Unless I discover something I'm not seeing I would therefore add an annotation for you to trigger the workaround-behavior. Does that sound reasonable? Do you have a way to work around this issue at the moment for your cluster? How high of a priority is this for you?
@href The approach of setting the hostname is fine. I think it can be any arbitrary name. Unlike the IP, the hostname shouldn't be used. As we currently have no clusters on v1.29, spending time right now on a final solution via .status.loadBalancer.ingress.ipMode wouldn't help.
Currently it blocks the setup of Keycloak, which we require for further dependent services. As a short-term solution we might remove the proxy protocol, which would also remove our access to the source IP. I think we can get by for some days, but not weeks.
Got it, I'll try to tackle the problem this or next week. I'll get back to you once I have something testable. I assume you can run a test-build against a cluster to try it out, once I have the feature ready.
> I assume you can run a test-build against a cluster to try it out, once I have the feature ready.

This should be doable.
After a bit of a battle with GitHub CI this feature is ready for testing. I created a preview release that you can use on your cluster: https://github.com/cloudscale-ch/cloudscale-cloud-controller-manager/releases/tag/1.1.0-rc.1
For your pre 1.30 clusters, you can now set the following annotation to a hostname that points at the load balancer:
k8s.cloudscale.ch/loadbalancer-force-hostname
This in turn causes status.loadBalancer.ingress[0].hostname to be set to the annotation value (no other ingress items will be added).
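The behavior described here can be sketched roughly as follows. This is an illustrative model only, not the controller's actual code: the function `ingress_status` and its `vip_addresses` parameter are hypothetical, and only the annotation name and the resulting status shape are taken from the comment above.

```python
FORCE_HOSTNAME = "k8s.cloudscale.ch/loadbalancer-force-hostname"


def ingress_status(annotations: dict, vip_addresses: list) -> dict:
    """Return a Service status fragment for the given annotations and LB addresses."""
    hostname = annotations.get(FORCE_HOSTNAME, "")
    if hostname:
        # With the annotation set, only the hostname is published; no IP entries are added.
        return {"loadBalancer": {"ingress": [{"hostname": hostname}]}}
    # Without it, the load balancer addresses are published as IPs (the original behavior).
    return {"loadBalancer": {"ingress": [{"ip": ip} for ip in vip_addresses]}}


if __name__ == "__main__":
    annotated = {FORCE_HOSTNAME: "ingress-public.example.ch"}
    print(ingress_status(annotated, ["192.0.2.10"]))
    print(ingress_status({}, ["192.0.2.10"]))
```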
After 1.30, this should not be needed, as the new Proxy mode is going to be the default. If you require the old VIP mode, you would have to enforce that. See https://github.com/cloudscale-ch/cloudscale-cloud-controller-manager/commit/e3b86129a218fb1c368b2fa0ffcdebe43e3d222c
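As a rough mental model of the difference between the two modes and the hostname workaround, the sketch below decides whether in-cluster traffic would skip the load balancer for a given ingress entry. It is a simplification based on my understanding of the ipMode field (an entry with an IP and no ipMode is treated as VIP); the function `bypasses_load_balancer` is hypothetical and is not how kube-proxy or Cilium is implemented.

```python
def bypasses_load_balancer(entry: dict) -> bool:
    """Simplified model: does in-cluster traffic short-circuit the load balancer?"""
    if "ip" not in entry:
        # Hostname-only entries give the cluster no address to intercept.
        return False
    # Assumption: an unset ipMode behaves like "VIP", i.e. traffic is handled in-cluster.
    return entry.get("ipMode", "VIP") == "VIP"


if __name__ == "__main__":
    for entry in (
        {"ip": "192.0.2.10"},
        {"ip": "192.0.2.10", "ipMode": "Proxy"},
        {"hostname": "ingress-public.example.ch"},
    ):
        print(entry, bypasses_load_balancer(entry))
```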
@megian Can you try this out on your end and get back to me?
@href Thanks for the fast update. Will try it beginning of next week!
@href From my point of view this works as expected.
After adding the annotation k8s.cloudscale.ch/loadbalancer-force-hostname it no longer exposes the IP in the status:
$ kubectl -n openshift-ingress get svc router-public-lb -o yaml | yq .status
loadBalancer:
  ingress:
    - hostname: ingress-public.example.ch
Cilium no longer has a cluster-internal service endpoint (which it had before):
$ k --as=cluster-admin -n cilium exec cilium-jtnt7 -- cilium service list | grep -A1 x.x.x.x
All the connections to the proxy protocol enabled OpenShift ingress are working now. Many thanks!
I'm not sure whether the new Proxy mode is available on Kubernetes 1.30 by default, because it's still beta, and since Kubernetes 1.24 new beta APIs are not enabled by default.
Thanks for your feedback. @alakae is doing a code review before we make an official release, but we think we should be able to release tomorrow or Wednesday.
> I'm not sure whether the new Proxy mode is available on Kubernetes 1.30 by default, because it's still beta, and since Kubernetes 1.24 new beta APIs are not enabled by default.
According to the release notes it is: https://kubernetes.io/blog/2024/04/17/kubernetes-v1-30-release/#make-kubernetes-aware-of-the-loadbalancer-behaviour-sig-network-https-github-com-kubernetes-community-tree-master-sig-network
Also, in our integration tests, where we ran Vanilla 1.30.4, the IPMode setting was successfully tested: https://github.com/cloudscale-ch/cloudscale-cloud-controller-manager/actions/runs/10522695378/job/29155932836
Testing bugs notwithstanding, I think with 1.30 this should just work.
@href Good to hear that it should work on Kubernetes v1.30 by default. I can't prove it yet, as the latest OpenShift is still on Kubernetes v1.29.
We have released 1.1.0: https://github.com/cloudscale-ch/cloudscale-cloud-controller-manager/releases/tag/1.1.0
Let me know if this solves the problem for you, so we can close this ticket.
@href Many thanks. Upgraded to v1.1.0 and it seems to work as expected.
Nice, thanks for confirming!
The cloudscale cloud controller sets the IP in the service object status (.status.loadBalancer.ingress.ip). This causes the Kubernetes cluster to route the traffic internally, which is normally a positive behavior, as it is faster and comes with less overhead. However, this causes a big headache if the final system expects something the load balancer adds in between, in this case the proxy protocol. Internal traffic sent to the ingress controller is simply invalid, because it is not encapsulated by the proxy protocol.
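To make the failure mode concrete, here is a minimal sketch of what a proxy-protocol-enabled backend expects at the start of each connection. It handles only the human-readable version 1 header for TCP4/TCP6 and is not the parser used by any particular ingress controller; the function name `split_proxy_v1` and the sample addresses are illustrative assumptions.

```python
def split_proxy_v1(data: bytes):
    """Return (source_ip, payload); raise ValueError if no PROXY v1 header is present."""
    if not data.startswith(b"PROXY "):
        # This is what in-cluster traffic looks like when it skips the load balancer.
        raise ValueError("missing PROXY protocol header")
    header, sep, payload = data.partition(b"\r\n")
    if not sep:
        raise ValueError("unterminated PROXY protocol header")
    # Expected fields: PROXY <proto> <src> <dst> <src-port> <dst-port>
    fields = header.decode("ascii").split(" ")
    if len(fields) != 6 or fields[1] not in ("TCP4", "TCP6"):
        raise ValueError("unsupported PROXY protocol header")
    return fields[2], payload


if __name__ == "__main__":
    via_lb = b"PROXY TCP4 192.0.2.10 203.0.113.5 56324 443\r\nGET / HTTP/1.1\r\n\r\n"
    print(split_proxy_v1(via_lb))
    try:
        split_proxy_v1(b"GET / HTTP/1.1\r\n\r\n")  # direct, in-cluster style request
    except ValueError as err:
        print("rejected:", err)
```

A connection that arrives through the load balancer carries the header and is accepted; the same request sent directly to the backend is rejected before the application ever sees it.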
This seems to be a long-standing issue with Kubernetes. As soon as the IP is known to Kubernetes, the internal path gets enabled. A solution is planned, but it will take time until it is stable and available in production environments.
There is a workaround that AWS, for example, has implemented: set only the hostname, not the IP.