Closed: ghorofamike closed this issue 4 years ago
hey @ghorofamike
You don't need any labels: any Service with type: LoadBalancer that gets an IP that hcloud-ip-floater can find in your hcloud should work.
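For illustration, a minimal Service like this qualifies (name, selector, and ports are made up for the example; nothing hcloud-specific is required on the object itself):

```yaml
# A plain type: LoadBalancer Service; no special labels are needed
# for hcloud-ip-floater to pick it up once an external IP is set.
apiVersion: v1
kind: Service
metadata:
  name: my-app          # illustrative name
  namespace: default
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```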
Note that, as described in the README, hcloud-ip-floater doesn't deal with the IP assignment (only attachment). For that you need something like metallb.
I was able to get it to run, and yes, I'm aware of metallb's necessity; I've used metallb with BGP before at a different provider. Now I've hit a different issue, if you have a moment: sometimes the floating IP does not get bound to a node until I kill this service's container. Any idea where I should look?
:thinking: this could be a bug. We may be missing some k8s events about the service or its pods. Care to share some logs of the behavior? It would be nice if you could include either the target service's or hcloud-ip-floater's startup in the output. That way we can narrow the problem down a bit.
I was looking around and saw a config parameter, sync-interval, here; it defaults to 6 min. Maybe that's the reason? I'm not sure I waited the entire 6 min; I may have waited less. Confirm for me that it should have reacted to the event, rather than only every 6 min, and then we can start looking through the logs.
The 5m sync interval is only a fallback on the k8s side: it exists to cover the case of missed events, but should only be needed during network hiccups, etc. (under normal conditions, the informer receives the events as they happen). As for the hcloud side, there's no "event" interface, so polling is the only option. But as I understand your problem, it's not related to new IPs being added to hcloud, right?
No, it's not new addresses; rather, no assignment of an IP to a node happens after metallb assigns one to a service. Here is a sample log from metallb:
{"caller":"main.go:75","event":"noChange","msg":"service converged, no change","service":"default/internalsmtp","ts":"2020-06-15T07:56:34.081442387Z"}
{"caller":"main.go:76","event":"endUpdate","msg":"end of service update","service":"default/internalsmtp","ts":"2020-06-15T07:56:34.081489776Z"}
{"caller":"main.go:49","event":"startUpdate","msg":"start of service update","service":"default/wildduck","ts":"2020-06-15T07:56:34.107356371Z"}
{"caller":"service.go:114","event":"ipAllocated","ip":"116.202.182.160","msg":"IP address assigned by controller","service":"default/wildduck","ts":"2020-06-15T07:56:34.107486173Z"}
{"caller":"main.go:96","event":"serviceUpdated","msg":"updated service object","service":"default/wildduck","ts":"2020-06-15T07:56:34.123257367Z"}
{"caller":"main.go:98","event":"endUpdate","msg":"end of service update","service":"default/wildduck","ts":"2020-06-15T07:56:34.123309676Z"}
{"caller":"main.go:49","event":"startUpdate","msg":"start of service update","service":"default/mongo","ts":"2020-06-15T07:56:34.126760015Z"}
{"caller":"service.go:33","event":"clearAssignment","msg":"not a LoadBalancer","reason":"notLoadBalancer","service":"default/mongo","ts":"2020-06-15T07:56:34.126818653Z"}
{"caller":"main.go:75","event":"noChange","msg":"service converged, no change","service":"default/mongo","ts":"2020-06-15T07:56:34.127040167Z"}
{"caller":"main.go:76","event":"endUpdate","msg":"end of service update","service":"default/mongo","ts":"2020-06-15T07:56:34.127070494Z"}
{"caller":"main.go:49","event":"startUpdate","msg":"start of service update","service":"default/wildduck","ts":"2020-06-15T07:56:34.127094279Z"}
{"caller":"main.go:75","event":"noChange","msg":"service converged, no change","service":"default/wildduck","ts":"2020-06-15T07:56:34.127182112Z"}
{"caller":"main.go:76","event":"endUpdate","msg":"end of service update","service":"default/wildduck","ts":"2020-06-15T07:56:34.12720232Z"}
metallb assigned an IP, but there was no reaction from this controller, and zero logs at the HCLOUD_IP_FLOATER_LOG_LEVEL: debug level, so I'm unable to provide any output from it.
Here is the kubectl describe of the service:
...
NodePort: pop 32645/TCP
Endpoints: 172.20.0.17:587
Session Affinity: None
External Traffic Policy: Local
HealthCheck NodePort: 30534
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal IPAllocated 6m42s metallb-controller Assigned IP "XXX.XXX.XXX.XXX"
Normal nodeAssigned 4m22s metallb-speaker announcing from node "XXXXX-86-fra1d64554xmzy"
Curiously, even after 6 min I'm not seeing any assignment of the floating IP to a node. And that's not all: I have another IP that was already assigned successfully earlier, so only this second IP is having issues, and it happens maybe 15-30 minutes after the first IP worked flawlessly.
So the situation is as follows: I have 2 IPs sitting in hcloud; I assign the first when installing this controller and metallb, and 30-50 min later I assign the other IP, and it remains stuck.
I think I managed to reproduce that. It seems to get stuck inside the reconcile loop, most likely inside https://github.com/costela/hcloud-ip-floater/blob/master/internal/fipcontroller/fipcontroller.go#L211
I have no idea why it gets stuck there, though; maybe some network issue on Hetzner's side? I'll try to debug it some time later.
Either way, it would probably be a good idea to add some timeouts there: replace context.Background().
time="2020-06-15T14:14:57Z" level=info msg="starting hcloud IP floater" version=v0.1.5-rc.2-11-g18dbc13
time="2020-06-15T14:14:57Z" level=info msg="new service" namespace=projectcontour service=envoy
time="2020-06-15T14:14:57Z" level=info msg="adding pod informer" namespace=projectcontour service=envoy
time="2020-06-15T14:14:57Z" level=info msg="skipping non-LoadBalancer service" namespace=cert-manager service=cert-manager
time="2020-06-15T14:14:57Z" level=info msg="new service" namespace=default service=kuard-service2
time="2020-06-15T14:14:57Z" level=info msg="adding pod informer" namespace=default service=kuard-service2
time="2020-06-15T14:14:57Z" level=info msg="skipping non-LoadBalancer service" namespace=kube-system service=metrics-server
time="2020-06-15T14:14:57Z" level=info msg="skipping non-LoadBalancer service" namespace=kube-system service=hcloud-csi-controller-metrics
time="2020-06-15T14:14:57Z" level=info msg="skipping non-LoadBalancer service" namespace=kube-system service=hcloud-csi-node-metrics
time="2020-06-15T14:14:57Z" level=info msg="skipping non-LoadBalancer service" namespace=projectcontour service=contour
time="2020-06-15T14:14:57Z" level=info msg="skipping non-LoadBalancer service" namespace=kuard-test service=kuard
time="2020-06-15T14:14:57Z" level=info msg="skipping non-LoadBalancer service" namespace=default service=kubernetes
time="2020-06-15T14:14:57Z" level=info msg="skipping non-LoadBalancer service" namespace=kube-system service=kube-dns
time="2020-06-15T14:14:57Z" level=info msg="skipping non-LoadBalancer service" namespace=cert-manager service=cert-manager-webhook
time="2020-06-15T14:14:57Z" level=info msg="new service" namespace=default service=kuard-service
time="2020-06-15T14:14:57Z" level=info msg="adding pod informer" namespace=default service=kuard-service
time="2020-06-15T14:14:58Z" level=info msg="starting reconciliation" component=fipcontroller
time="2020-06-15T14:14:58Z" level=info msg="floating IP already attached" component=fipcontroller fip=xxxx node=kube-worker0
time="2020-06-15T14:19:57Z" level=info msg="skipping non-LoadBalancer service" namespace=kube-system service=metrics-server
time="2020-06-15T14:19:57Z" level=info msg="skipping non-LoadBalancer service" namespace=kube-system service=hcloud-csi-controller-metrics
time="2020-06-15T14:19:57Z" level=info msg="skipping non-LoadBalancer service" namespace=kube-system service=hcloud-csi-node-metrics
time="2020-06-15T14:19:57Z" level=info msg="skipping non-LoadBalancer service" namespace=projectcontour service=contour
time="2020-06-15T14:19:57Z" level=info msg="skipping non-LoadBalancer service" namespace=kube-system service=kube-dns
time="2020-06-15T14:19:57Z" level=info msg="skipping non-LoadBalancer service" namespace=cert-manager service=cert-manager-webhook
time="2020-06-15T14:19:57Z" level=info msg="service update" namespace=projectcontour service=envoy
time="2020-06-15T14:19:57Z" level=info msg="service unchanged" namespace=projectcontour service=envoy
time="2020-06-15T14:19:57Z" level=info msg="skipping non-LoadBalancer service" namespace=cert-manager service=cert-manager
time="2020-06-15T14:19:57Z" level=info msg="service update" namespace=default service=kuard-service
time="2020-06-15T14:19:57Z" level=info msg="service unchanged" namespace=default service=kuard-service
time="2020-06-15T14:19:57Z" level=info msg="skipping non-LoadBalancer service" namespace=default service=kubernetes
time="2020-06-15T14:19:57Z" level=info msg="service update" namespace=default service=kuard-service2
time="2020-06-15T14:19:57Z" level=info msg="service unchanged" namespace=default service=kuard-service2
time="2020-06-15T14:19:57Z" level=info msg="skipping non-LoadBalancer service" namespace=kuard-test service=kuard
time="2020-06-15T14:24:57Z" level=info msg="service update" namespace=default service=kuard-service2
time="2020-06-15T14:24:57Z" level=info msg="service unchanged" namespace=default service=kuard-service2
time="2020-06-15T14:24:57Z" level=info msg="skipping non-LoadBalancer service" namespace=kuard-test service=kuard
time="2020-06-15T14:24:57Z" level=info msg="skipping non-LoadBalancer service" namespace=default service=kubernetes
time="2020-06-15T14:24:57Z" level=info msg="skipping non-LoadBalancer service" namespace=projectcontour service=contour
time="2020-06-15T14:24:57Z" level=info msg="skipping non-LoadBalancer service" namespace=kube-system service=kube-dns
time="2020-06-15T14:24:57Z" level=info msg="skipping non-LoadBalancer service" namespace=cert-manager service=cert-manager-webhook
time="2020-06-15T14:24:57Z" level=info msg="service update" namespace=projectcontour service=envoy
time="2020-06-15T14:24:57Z" level=info msg="service unchanged" namespace=projectcontour service=envoy
time="2020-06-15T14:24:57Z" level=info msg="skipping non-LoadBalancer service" namespace=cert-manager service=cert-manager
time="2020-06-15T14:24:57Z" level=info msg="skipping non-LoadBalancer service" namespace=kube-system service=metrics-server
time="2020-06-15T14:24:57Z" level=info msg="skipping non-LoadBalancer service" namespace=kube-system service=hcloud-csi-controller-metrics
time="2020-06-15T14:24:57Z" level=info msg="skipping non-LoadBalancer service" namespace=kube-system service=hcloud-csi-node-metrics
time="2020-06-15T14:24:57Z" level=info msg="service update" namespace=default service=kuard-service
time="2020-06-15T14:24:57Z" level=info msg="service unchanged" namespace=default service=kuard-service
time="2020-06-15T14:29:57Z" level=info msg="skipping non-LoadBalancer service" namespace=cert-manager service=cert-manager-webhook
time="2020-06-15T14:29:57Z" level=info msg="service update" namespace=projectcontour service=envoy
time="2020-06-15T14:29:57Z" level=info msg="service unchanged" namespace=projectcontour service=envoy
time="2020-06-15T14:29:57Z" level=info msg="skipping non-LoadBalancer service" namespace=cert-manager service=cert-manager
time="2020-06-15T14:29:57Z" level=info msg="skipping non-LoadBalancer service" namespace=kube-system service=metrics-server
time="2020-06-15T14:29:57Z" level=info msg="skipping non-LoadBalancer service" namespace=kube-system service=hcloud-csi-controller-metrics
time="2020-06-15T14:29:57Z" level=info msg="skipping non-LoadBalancer service" namespace=kube-system service=hcloud-csi-node-metrics
time="2020-06-15T14:29:57Z" level=info msg="skipping non-LoadBalancer service" namespace=projectcontour service=contour
time="2020-06-15T14:29:57Z" level=info msg="skipping non-LoadBalancer service" namespace=kube-system service=kube-dns
time="2020-06-15T14:29:57Z" level=info msg="service update" namespace=default service=kuard-service
time="2020-06-15T14:29:57Z" level=info msg="service unchanged" namespace=default service=kuard-service
time="2020-06-15T14:29:57Z" level=info msg="service update" namespace=default service=kuard-service2
time="2020-06-15T14:29:57Z" level=info msg="service unchanged" namespace=default service=kuard-service2
time="2020-06-15T14:29:57Z" level=info msg="skipping non-LoadBalancer service" namespace=kuard-test service=kuard
time="2020-06-15T14:29:57Z" level=info msg="skipping non-LoadBalancer service" namespace=default service=kubernetes
Alright, I think I got it: there's a deadlock that completely stops reconciliation when triggered. After the fix, it started working correctly for me.
I have an image with the fix on my fork's Docker Hub registry:
bslawianowski/hcloud-ip-floater:latest
@ghorofamike Could you try this image and verify whether it fixes the issue for you?
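For readers unfamiliar with this failure mode, here is a generic illustration of how a reconcile loop can deadlock on a non-reentrant mutex (the controller type and its methods are invented for the example; the actual bug in hcloud-ip-floater may differ, see the linked fix):

```go
package main

import (
	"fmt"
	"sync"
)

// controller is a toy stand-in: reconcile() holds the mutex and then
// calls a helper that also wants the same (non-reentrant) mutex.
// With plain Lock() this blocks forever; TryLock (Go 1.18+) lets the
// example detect the situation instead of hanging.
type controller struct {
	mu   sync.Mutex
	fips map[string]string // floating IP -> node
}

func (c *controller) reconcile() {
	c.mu.Lock()
	defer c.mu.Unlock()
	// ... decide on attachments while holding the lock ...
	c.attach("1.2.3.4", "node1") // re-enters the same mutex: the bug
}

func (c *controller) attach(fip, node string) {
	if !c.mu.TryLock() {
		fmt.Println("would deadlock: mutex already held")
		return
	}
	defer c.mu.Unlock()
	c.fips[fip] = node
}

func main() {
	c := &controller{fips: map[string]string{}}
	c.reconcile() // prints "would deadlock: mutex already held"
}
```

Once a goroutine wedges like this, every later reconciliation queued behind the lock stalls too, which matches the "works at first, then silently stops reacting" symptom described above.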
It seems to me that the problem is solved. Is this the change in the other PR?
Forgot to create the PR yesterday; it's #13
Care to provide a concrete example of how the service and floating IP should be labelled?