iohenkies closed this issue 4 years ago
I've found this issue that seems to be the same: https://github.com/cbeneke/hcloud-fip-controller/issues/25
I also do not have an EXTERNAL-IP listed when running kubectl get nodes -o wide. Under INTERNAL-IP, my public IP addresses are listed :(. The cloud controller manager is deployed and active.
As suggested in that thread (#25), I set the node_address_type option to internal (or external), but that doesn't solve the problem.
What should I do?
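To see which address the fip-controller would pick for a node, the sketch below mirrors what the debug log line "Using address type ExternalIP" suggests: it reads the node's status.addresses (as printed by kubectl get node NAME -o json) and returns the first address of the requested type. The mapping of the config values internal/external to the Kubernetes address types InternalIP/ExternalIP and the function name node_address are assumptions for illustration, not the controller's actual code.

```python
from typing import Optional

# Assumed mapping from the fip-controller's node_address_type values to the
# Kubernetes NodeAddress types (inferred from the "Using address type ExternalIP" log line).
ADDRESS_TYPES = {"external": "ExternalIP", "internal": "InternalIP"}


def node_address(node: dict, node_address_type: str = "external") -> Optional[str]:
    """Return the first address of the requested type from a Node object.

    `node` is the parsed JSON of `kubectl get node NAME -o json`.
    Returns None if the node reports no address of that type.
    """
    wanted = ADDRESS_TYPES[node_address_type]
    for addr in node.get("status", {}).get("addresses", []):
        if addr.get("type") == wanted:
            return addr.get("address")
    return None
```

A node for which this returns None with "external" is one where kubectl get nodes -o wide shows no EXTERNAL-IP, which most likely means the hcloud cloud controller manager has not populated that node's addresses.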
I started completely over, reconfiguring the cluster so that INTERNAL-IP does show my internal IPs (via extra args for the kubelet), and during the kubeadm setup I also set --apiserver-advertise-address to the internal IP. Long story short: exact same error and problem.
Then I started over again, this time declaring the external IP in the kubelet extra args and in --apiserver-advertise-address: same problem.
Of course I would like it all to work on the private network instead of the public IPs, but at this point I would like to see it working either way.
This takes an enormous amount of work. Although I'm several steps further, it still doesn't work.
A few important points (others can hopefully benefit from my painful process):
- /etc/systemd/system/kubelet.service.d/20-hetzner-cloud.conf. I had to add it to /var/lib/kubelet/kubeadm-flags.env
- hcloud network list from the command line
- https://raw.githubusercontent.com/hetznercloud/hcloud-cloud-controller-manager/master/deploy/v1.5.1-networks.yaml
- kubectl get nodes -o wide should result in the correct INTERNAL-IP and EXTERNAL-IP. This is taken care of by the hcloud controller manager
- name: config as a name and not name: metallb-config. It just doesn't work with the latter
- kubectl get svc --all-namespaces should show your floating IP in the EXTERNAL-IP column. A state of pending is no good

This is the only thing in the debug logs:
[henkies@kube01] ~ $ k -n fip-controller logs fip-controller-74ldv
I0304 18:04:39.026931 1 leaderelection.go:235] attempting to acquire leader lease fip-controller/fip...
I0304 18:05:14.974486 1 leaderelection.go:245] successfully acquired lease fip-controller/fip
time="2020-03-04T18:05:14Z" level=info msg="Started leading" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).onStartedLeading" file="/app/internal/app/fipcontroller/leaderelection.go:53"
time="2020-03-04T18:05:14Z" level=debug msg="Checking floating IPs" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).UpdateFloatingIPs" file="/app/internal/app/fipcontroller/controller.go:77"
time="2020-03-04T18:05:14Z" level=debug msg="Found 3 nodes" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).nodeAddress" file="/app/internal/app/fipcontroller/kubernetes.go:32"
time="2020-03-04T18:05:14Z" level=debug msg="Found 3 addresses" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).nodeAddress" file="/app/internal/app/fipcontroller/kubernetes.go:41"
time="2020-03-04T18:05:14Z" level=debug msg="Using address type ExternalIP" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).nodeAddress" file="/app/internal/app/fipcontroller/kubernetes.go:47"
time="2020-03-04T18:05:14Z" level=debug msg="Found node address: 116.203.101.104" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).UpdateFloatingIPs" file="/app/internal/app/fipcontroller/controller.go:83"
And the log of the other pod, which got moved after I brought the nodes down:
Error from server: Get https://172.16.0.3:10250/containerLogs/fip-controller/fip-controller-xs9ts/fip-controller: dial tcp 172.16.0.3:10250: i/o timeout
Hey, sorry for the long response time, and first of all: thanks for the detailed description!
A few notes beforehand: I wrote the article when Kubernetes 1.15.3 was the latest version, so it is to be expected that API references etc. change. My private cluster is still running on 1.16 (I just haven't found the time to update it yet), so there might also be an incompatibility with 1.17 in the way it is programmed (I am also not certain whether the Hetzner cloud controller supports 1.17 yet; the changelog only mentions v1.16). I will have to check this when I find the time (which unfortunately is a VERY limited resource for me atm :/ ). Flannel should work fine on a 1.17 cluster; please check the Kubernetes installation guide and validate that you are using the correct version.
Regarding the MetalLB config: are you installing it via helm or manually? The guide focused on helm (which uses a custom name); the default would be config instead of metallb-config (compare https://github.com/helm/charts/blob/master/stable/metallb/templates/config.yaml).
Do the logs just stop after Found node address? The address is used to map the hcloud object to the Kubernetes object. The very next call is a Hetzner client call against the Hetzner API. Is your token working? (Can you use that token to fetch your servers?)
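One way to answer "Is your token working?" outside the cluster is to call the Hetzner Cloud API's server list endpoint with the token as a Bearer token. The sketch below uses only the Python standard library; https://api.hetzner.cloud/v1/servers is the public Hetzner Cloud API endpoint, while the helper names are hypothetical. It ignores pagination, which should not matter for a three-node cluster.

```python
import json
import urllib.request
from typing import List

# Public Hetzner Cloud API endpoint for listing servers.
SERVERS_URL = "https://api.hetzner.cloud/v1/servers"


def build_servers_request(token: str) -> urllib.request.Request:
    """Build an authenticated GET request for the server list."""
    return urllib.request.Request(
        SERVERS_URL,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def list_server_names(token: str) -> List[str]:
    """Fetch the first page of servers and return their names.

    Raises urllib.error.HTTPError on failure; a 401 status means the
    token was rejected. Pagination is ignored (fine for a few nodes).
    """
    with urllib.request.urlopen(build_servers_request(token), timeout=10) as resp:
        payload = json.load(resp)
    return [server["name"] for server in payload.get("servers", [])]
```

Calling list_server_names with the token stored in the fip-controller-secrets Secret should return your server names; an HTTPError with code 401 points to a bad token.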
The error message you posted at the end comes, afaict, from the Kubernetes API server. It tries to connect to the kubelet (port 10250) and times out. Can your master node reach all the nodes in the cluster correctly?
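To check this directly, a small TCP probe run on the master node can test whether each node's kubelet port (10250) accepts connections. This is a generic sketch, not part of any Kubernetes tooling; the function name can_reach is hypothetical.

```python
import socket


def can_reach(host: str, port: int = 10250, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    10250 is the kubelet port the API server timed out on. A False result
    from the master node points at firewall or routing problems rather than
    at the fip-controller itself.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and unreachable hosts
        return False
```

For the node from the error message above, that would be can_reach("172.16.0.3"), run from the master.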
Hi Christian, thank you for the response.
As said, I will reinstall once again with different Kubernetes versions.
The problems are exactly the same for me on Kubernetes 1.16.7. I don't understand why, since you have it running on 1.16, if I understand correctly.
And on a side note, Flannel still doesn't work on this fresh install with the correct ports open. It starts and all, but pods cannot communicate with each other. I do not have these problems with Weave or Calico.
Yes, my cluster is running on v1.16.7 atm and has no problems (using flannel, hcloud-cloud-controller, metallb and multiple fip-controllers). Could you - just for a test - open your firewall completely? The flannel errors sound a lot like something in your network is set up in a way that it cannot communicate correctly with the other pods, and a functional network in your cluster is definitely not optional :) Also, are you using the correct subnet ranges (compare to the installation guide I linked above)?
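On the subnet question: the value passed to kubeadm init --pod-network-cidr should match the network flannel is configured with, and it should not overlap a network the nodes themselves use (for example a Hetzner private network). The sketch below assumes flannel's default network 10.244.0.0/16 from kube-flannel.yml; the function name and the node network in the usage example are hypothetical.

```python
import ipaddress
from typing import List

# Default "Network" in flannel's kube-flannel.yml net-conf.json (assumed unchanged).
FLANNEL_DEFAULT_NETWORK = "10.244.0.0/16"


def check_pod_cidr(
    pod_cidr: str,
    node_networks: List[str],
    flannel_network: str = FLANNEL_DEFAULT_NETWORK,
) -> List[str]:
    """Return a list of problems with the pod network CIDR (empty if none found)."""
    problems = []
    pod_net = ipaddress.ip_network(pod_cidr)
    # kubeadm's pod CIDR and flannel's configured network must be the same range.
    if pod_net != ipaddress.ip_network(flannel_network):
        problems.append(f"pod CIDR {pod_net} differs from flannel network {flannel_network}")
    # Pod addresses must not collide with addresses the nodes already use.
    for net in node_networks:
        if pod_net.overlaps(ipaddress.ip_network(net)):
            problems.append(f"pod CIDR {pod_net} overlaps node network {net}")
    return problems
```

For example, check_pod_cidr("10.244.0.0/16", ["172.16.0.0/24"]) returns an empty list, while a pod CIDR taken from the node network range would be reported twice.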
Hi Christian. After disabling the firewall completely, flannel does work. Very strange given all the open ports (all Kubernetes defaults + the flannel UDP ports), but that is a concern for later, because I still can't get the fip-controller to work...
So I have the version at 1.16.7, flannel up, all pods up, same logs as before. I'm trying a Deployment atm instead of a DaemonSet, but the result is the same. Two pods say:
I0309 17:43:53.046790 1 leaderelection.go:235] attempting to acquire leader lease fip-controller/fip...
And one pod says:
I0309 17:43:52.909930 1 leaderelection.go:235] attempting to acquire leader lease fip-controller/fip...
I0309 17:43:52.931169 1 leaderelection.go:245] successfully acquired lease fip-controller/fip
time="2020-03-09T17:43:52Z" level=info msg="Started leading" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).onStartedLeading" file="/app/internal/app/fipcontroller/leaderelection.go:53"
time="2020-03-09T17:43:52Z" level=debug msg="Checking floating IPs" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).UpdateFloatingIPs" file="/app/internal/app/fipcontroller/controller.go:77"
time="2020-03-09T17:43:52Z" level=debug msg="Found 3 nodes" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).nodeAddress" file="/app/internal/app/fipcontroller/kubernetes.go:32"
time="2020-03-09T17:43:52Z" level=debug msg="Found 3 addresses" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).nodeAddress" file="/app/internal/app/fipcontroller/kubernetes.go:41"
time="2020-03-09T17:43:52Z" level=debug msg="Using address type ExternalIP" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).nodeAddress" file="/app/internal/app/fipcontroller/kubernetes.go:47"
time="2020-03-09T17:43:52Z" level=debug msg="Found node address: 116.203.101.72" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).UpdateFloatingIPs" file="/app/internal/app/fipcontroller/controller.go:83"
time="2020-03-09T17:43:53Z" level=debug msg="Fetched %!s(int=3) servers" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).server" file="/app/internal/app/fipcontroller/hcloud.go:39"
time="2020-03-09T17:43:53Z" level=debug msg="Found matching public IP on server 'kube02.domain.io'" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).server" file="/app/internal/app/fipcontroller/hcloud.go:44"
time="2020-03-09T17:43:53Z" level=debug msg="Found server: kube02.domain.io (4786060)" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).UpdateFloatingIPs" file="/app/internal/app/fipcontroller/controller.go:89"
time="2020-03-09T17:43:53Z" level=info msg="Initialization complete. Starting reconciliation" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).Run" file="/app/internal/app/fipcontroller/controller.go:61"
This repeats every 30 seconds.
The floating IP is configured on the 2 worker nodes, but the Hetzner console shows it is not assigned to any node. It can't be pinged. Only after I assign it manually can it be pinged, but it does not fail over when I bring down the node (for testing purposes).
Also, when I manually assign the floating IP to node03 instead of node02, the logs say the same (i.e. Found server: kube02.domain.io (4786060)), although I manually assigned it to node03.
I really hope you can make something of it.
I'm not 100% positive, but I believe that earlier the Nginx ingress also did not receive the EXTERNAL-IP, and now it does. But failover still does not happen. kubectl get nodes shows the node as NotReady and pods get evicted, but the floating IP remains unreachable. Although the floating IP in this test case is attached to kube02, the fip-controller only talks about kube03.
time="2020-03-10T07:36:03Z" level=debug msg="Checking floating IPs" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).UpdateFloatingIPs" file="/app/internal/app/fipcontroller/controller.go:77"
time="2020-03-10T07:36:03Z" level=debug msg="Found 3 nodes" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).nodeAddress" file="/app/internal/app/fipcontroller/kubernetes.go:32"
time="2020-03-10T07:36:03Z" level=debug msg="Found 3 addresses" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).nodeAddress" file="/app/internal/app/fipcontroller/kubernetes.go:41"
time="2020-03-10T07:36:03Z" level=debug msg="Using address type ExternalIP" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).nodeAddress" file="/app/internal/app/fipcontroller/kubernetes.go:47"
time="2020-03-10T07:36:03Z" level=debug msg="Found node address: 116.203.101.104" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).UpdateFloatingIPs" file="/app/internal/app/fipcontroller/controller.go:83"
time="2020-03-10T07:36:03Z" level=debug msg="Fetched %!s(int=3) servers" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).server" file="/app/internal/app/fipcontroller/hcloud.go:39"
time="2020-03-10T07:36:03Z" level=debug msg="Found matching public IP on server 'kube03.domain.io'" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).server" file="/app/internal/app/fipcontroller/hcloud.go:44"
time="2020-03-10T07:36:03Z" level=debug msg="Found server: kube03.domain.io (4786062)" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).UpdateFloatingIPs" file="/app/internal/app/fipcontroller/controller.go:89"
time="2020-03-10T07:36:33Z" level=debug msg="Checking floating IPs" func="github.com/cbeneke/hcloud-fip-controller/internal/app/fipcontroller.(*Controller).UpdateFloatingIPs" file="/app/internal/app/fipcontroller/controller.go:77"
To the controller it does not matter which node the IP is currently attached to. When it is the leader, it will detach the IP from wherever it is and attach it to the node it is currently running on. I guess the pod that won the leader election is running on kube3? :)
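The behaviour described above can be sketched as a pure decision function. This is a simplified illustration, not the fip-controller's source: it assumes the leader already knows its node's address (the "Found node address" log line) and that server and floating IP objects have the shape of the Hetzner Cloud API JSON (public_net.ipv4.ip on servers, a nullable server ID on floating IPs). The function names are hypothetical.

```python
from typing import List, Optional


def find_server(servers: List[dict], node_address: str) -> Optional[dict]:
    """Return the server whose public IPv4 equals the leader node's address."""
    for server in servers:
        if server["public_net"]["ipv4"]["ip"] == node_address:
            return server
    return None


def reconcile(floating_ip: dict, servers: List[dict], node_address: str) -> Optional[int]:
    """Return the server ID the floating IP should be assigned to, or None if it is already there.

    The target is always the leader's own server; the current assignment
    (floating_ip["server"]) only decides whether a change is needed.
    """
    target = find_server(servers, node_address)
    if target is None:
        raise LookupError(f"no server with public IP {node_address}")
    if floating_ip.get("server") == target["id"]:
        return None
    return target["id"]
```

Because the current assignment never influences the target, manually moving the IP to another node does not change which server the logs report.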
Which version of the controller are you running? Could you also please paste your (redacted) deployed config? It somehow seems the HcloudFloatingIPs field is not initialized correctly.
The ingress not showing an external IP hints that your MetalLB might have been misconfigured or not working properly, since the ingress pulls its external IP from the LoadBalancer service object, which gets updated by MetalLB.
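To check this across all namespaces, the sketch below reads the JSON from kubectl get svc --all-namespaces -o json and lists LoadBalancer Services whose status.loadBalancer.ingress is still empty, which kubectl displays as EXTERNAL-IP <pending>. It only inspects the Service objects; it cannot tell why MetalLB did not assign an address. The function names are hypothetical.

```python
from typing import List


def lb_external_ips(service: dict) -> List[str]:
    """Return the IPs/hostnames in a Service's status.loadBalancer.ingress."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    return [entry.get("ip") or entry.get("hostname") for entry in ingress]


def pending_services(service_list: dict) -> List[str]:
    """Return namespace/name of LoadBalancer Services that have no external IP yet."""
    pending = []
    for svc in service_list.get("items", []):
        if svc.get("spec", {}).get("type") == "LoadBalancer" and not lb_external_ips(svc):
            meta = svc["metadata"]
            pending.append(f"{meta['namespace']}/{meta['name']}")
    return pending
```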
Hi Christian,
Deployment:
apiVersion: v1
items:
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    annotations:
      deployment.kubernetes.io/revision: "1"
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"name":"fip-controller","namespace":"fip-controller"},"spec":{"replicas":3,"selector":{"matchLabels":{"app":"fip-controller"}},"strategy":{"rollingUpdate":{"maxSurge":1,"maxUnavailable":1},"type":"RollingUpdate"},"template":{"metadata":{"labels":{"app":"fip-controller"}},"spec":{"containers":[{"env":[{"name":"NODE_NAME","valueFrom":{"fieldRef":{"fieldPath":"spec.nodeName"}}},{"name":"POD_NAME","valueFrom":{"fieldRef":{"fieldPath":"metadata.name"}}},{"name":"NAMESPACE","valueFrom":{"fieldRef":{"fieldPath":"metadata.namespace"}}}],"envFrom":[{"secretRef":{"name":"fip-controller-secrets"}}],"image":"cbeneke/hcloud-fip-controller:v0.3.1","imagePullPolicy":"IfNotPresent","name":"fip-controller","volumeMounts":[{"mountPath":"/app/config","name":"config"}]}],"serviceAccountName":"fip-controller","volumes":[{"configMap":{"name":"fip-controller-config"},"name":"config"}]}}}}
    creationTimestamp: "2020-03-09T17:43:51Z"
    generation: 1
    name: fip-controller
    namespace: fip-controller
    resourceVersion: "84009"
    selfLink: /apis/apps/v1/namespaces/fip-controller/deployments/fip-controller
    uid: d0532a37-3fcb-4cd3-8b7e-7f4e682e0e59
  spec:
    progressDeadlineSeconds: 600
    replicas: 3
    revisionHistoryLimit: 10
    selector:
      matchLabels:
        app: fip-controller
    strategy:
      rollingUpdate:
        maxSurge: 1
        maxUnavailable: 1
      type: RollingUpdate
    template:
      metadata:
        creationTimestamp: null
        labels:
          app: fip-controller
      spec:
        containers:
        - env:
          - name: NODE_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: spec.nodeName
          - name: POD_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.name
          - name: NAMESPACE
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
          envFrom:
          - secretRef:
              name: fip-controller-secrets
          image: cbeneke/hcloud-fip-controller:v0.3.1
          imagePullPolicy: IfNotPresent
          name: fip-controller
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /app/config
            name: config
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        serviceAccount: fip-controller
        serviceAccountName: fip-controller
        terminationGracePeriodSeconds: 30
        volumes:
        - configMap:
            defaultMode: 420
            name: fip-controller-config
          name: config
  status:
    availableReplicas: 3
    conditions:
    - lastTransitionTime: "2020-03-09T17:43:54Z"
      lastUpdateTime: "2020-03-09T17:43:54Z"
      message: Deployment has minimum availability.
      reason: MinimumReplicasAvailable
      status: "True"
      type: Available
    - lastTransitionTime: "2020-03-09T17:43:51Z"
      lastUpdateTime: "2020-03-09T17:43:54Z"
      message: ReplicaSet "fip-controller-5c95ff6b4f" has successfully progressed.
      reason: NewReplicaSetAvailable
      status: "True"
      type: Progressing
    observedGeneration: 1
    readyReplicas: 3
    replicas: 3
    updatedReplicas: 3
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
ConfigMap fip-controller-config:
apiVersion: v1
data:
  config.json: |
    {
      "hcloudFloatingIPs": [ "MYIP" ],
      "nodeAddressType": "external",
      "log_level": "Debug"
    }
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"config.json":"{\n \"hcloudFloatingIPs\": [ \"MYIP\" ],\n \"nodeAddressType\": \"external\",\n \"log_level\": \"Debug\"\n}\n"},"kind":"ConfigMap","metadata":{"annotations":{},"name":"fip-controller-config","namespace":"fip-controller"}}
  creationTimestamp: "2020-03-09T17:43:46Z"
  name: fip-controller-config
  namespace: fip-controller
  resourceVersion: "4093"
  selfLink: /api/v1/namespaces/fip-controller/configmaps/fip-controller-config
  uid: 4339d0c6-472b-46be-98d0-8d2ee6582033
Secret fip-controller-secrets. I just discovered here (I have redacted both) that the first HCLOUD_API_TOKEN is different from the second HCLOUD_API_TOKEN. Is this normal?
apiVersion: v1
data:
  HCLOUD_API_TOKEN: FIRST TOKEN
kind: Secret
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Secret","metadata":{"annotations":{},"name":"fip-controller-secrets","namespace":"fip-controller"},"stringData":{"HCLOUD_API_TOKEN":"SECOND TOKEN"}}
  creationTimestamp: "2020-03-09T17:43:46Z"
  name: fip-controller-secrets
  namespace: fip-controller
  resourceVersion: "4094"
  selfLink: /api/v1/namespaces/fip-controller/secrets/fip-controller-secrets
  uid: d21e636f-2cb4-481c-a9a6-7fdaa00443bd
type: Opaque
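On the question above: a difference between the two values is expected. Kubernetes returns Secret values under data base64-encoded, whereas stringData is a write-only convenience field that takes the plain value, and kubectl apply records the manifest as applied (including stringData) in the last-applied-configuration annotation. So the value under data is most likely just the base64 encoding of the one in the annotation, which also means the annotation holds the token in plain text. The sketch below checks that relationship; the function name and token value are made up.

```python
import base64


def data_matches_string_data(data_value: str, string_data_value: str) -> bool:
    """True if a Secret `data` entry is the base64 encoding of a `stringData` entry."""
    return base64.b64decode(data_value).decode("utf-8") == string_data_value
```

To decode the live value directly, kubectl -n fip-controller get secret fip-controller-secrets -o jsonpath='{.data.HCLOUD_API_TOKEN}' | base64 -d prints the plain token.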
Secret fip-controller-token-jbdzm:
apiVersion: v1
data:
  ca.crt: REDACTED
  token: REDACTED
kind: Secret
metadata:
  annotations:
    kubernetes.io/service-account.name: fip-controller
    kubernetes.io/service-account.uid: c8d18922-94bb-416c-8094-d240a6e4ac1f
  creationTimestamp: "2020-03-09T17:43:37Z"
  name: fip-controller-token-jbdzm
  namespace: fip-controller
  resourceVersion: "4079"
  selfLink: /api/v1/namespaces/fip-controller/secrets/fip-controller-token-jbdzm
  uid: beee3e1f-90d4-4342-947e-30381bcd1655
type: kubernetes.io/service-account-token
Let me know if you need anything else. Many thanks for your help.
Ah okay, your config format is still from v0.1 (compare the v0.2.0 changelog); please adapt the config accordingly:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fip-controller-config
  namespace: fip-controller
data:
  config.json: |
    {
      "hcloud_floating_ips": [ "MYIP" ],
      "node_address_type": "external",
      "log_level": "Debug"
    }
Also, node_address_type external is the default, so you don't need to add it :) Can you try running the controller with the correct version of the config?
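Because a config with the old keys was accepted without any error in the logs above, a quick check of config.json for v0.1-style keys can save time. The sketch below only knows the two renames visible in this thread (hcloudFloatingIPs and nodeAddressType); any other differences between the versions are not covered, so compare against the changelog. The function names are hypothetical.

```python
import json
from typing import List

# Keys renamed between the v0.1 and v0.2 config formats, as shown in this thread.
LEGACY_KEYS = {
    "hcloudFloatingIPs": "hcloud_floating_ips",
    "nodeAddressType": "node_address_type",
}


def legacy_key_warnings(config_json: str) -> List[str]:
    """Return one warning for every v0.1-style key found in config.json."""
    config = json.loads(config_json)
    return [
        f"{old} is the v0.1 name; use {new}"
        for old, new in LEGACY_KEYS.items()
        if old in config
    ]


def migrate(config_json: str) -> str:
    """Rename v0.1 keys to their v0.2 names, leaving all other keys untouched."""
    config = json.loads(config_json)
    migrated = {LEGACY_KEYS.get(key, key): value for key, value in config.items()}
    return json.dumps(migrated, indent=2)
```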
OMG, it works instantly. Thank you for your help. It's wrong in the guide but OK in the GitHub README. I'll do some more testing with Kubernetes 1.17 and different CNIs. I can report back if you would like.
That's good to hear! Yeah, the guide was written for v0.1.0 of the controller; that's why I added the hint that it is still under development. I will try to get an update into the guide over the weekend.
If you spend the time on it anyway, I would like to hear the results, but don't spend extra time if you wouldn't do so anyway :)
Ok well I'm done testing now and can continue with the real purpose of the cluster.
FWIW I can confirm that this Hetzner configuration is compatible with:
I've got one more question before signing off: what is the added benefit of the Hetzner Cloud Container Storage Interface? How can I use this?
Thanks for taking the time and writing down your results!
Regarding your question: the Hetzner Cloud CSI driver is an implementation of the Kubernetes Container Storage Interface that enables you to use Hetzner volumes as native volumes in Kubernetes (the controller takes care of commissioning, attaching the volume to the correct node on pod start, decommissioning, etc.). Have a look at the CSI driver docs for information on how to install it!
I'm closing the issue now :)
Hi all,
Just wanted to share some details about the new Hetzner Ubuntu 20.04 image. It uses netplan instead of ifupdown as the tool to configure networking.
The official Hetzner docs state that a /etc/netplan/60-floating-ip.yaml file should be created with the floating IP.
That is correct, but for Docker (and hence also Kubernetes) to work, the list of addresses must also include the server IP address assigned by DHCP as the first entry. An example:
cat << EOF > /etc/netplan/60-floating-ip.yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      addresses:
        - <server_public_ip_as_assigned_by_dhcp>/32
        - <floating_ip>/32
EOF
Run netplan apply to apply the configuration.
This will make sure that the server IP address remains the primary one, i.e. ifconfig eth0 shows the server IP address and not the floating IP.
The floating IP becomes the primary IP if the DHCP-assigned server IP address is not included in the list of addresses. This in turn makes the floating IP the source address for outgoing traffic from within Docker, resulting in no internet access within containers. This can be seen by running tcpdump -ni eth0 icmp together with docker run --rm alpine ping -- 1.1.1.1.
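If this file is templated across several nodes, generating it makes the ordering requirement explicit. A minimal sketch, assuming IPv4 addresses written as /32 exactly as in the example above and the interface name eth0; the function name and sample addresses are hypothetical.

```python
from typing import List


def render_floating_ip_netplan(server_ip: str, floating_ips: List[str], interface: str = "eth0") -> str:
    """Render /etc/netplan/60-floating-ip.yaml with the DHCP server IP listed first.

    Listing the server IP first keeps it the primary address, so outgoing
    traffic (including from Docker containers) does not use a floating IP
    as its source.
    """
    addresses = [server_ip] + [ip for ip in floating_ips if ip != server_ip]
    lines = [
        "network:",
        "  version: 2",
        "  renderer: networkd",
        "  ethernets:",
        f"    {interface}:",
        "      addresses:",
    ]
    lines += [f"        - {ip}/32" for ip in addresses]
    return "\n".join(lines) + "\n"
```

The output can be written to /etc/netplan/60-floating-ip.yaml and activated with netplan apply as described above.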
Hi,
I'm following along with this: https://community.hetzner.com/tutorials/install-kubernetes-cluster
Basically I have a fully functional cluster (all nodes and deployments healthy, all pods up and running) but can't get the floating IP and fipcontroller to work. After 2 full days I think it's time to file an issue :)
When I try to install the fipcontroller via the above link or via the slightly different instructions below (DaemonSet or Deployment doesn't matter, same issue): https://github.com/cbeneke/hcloud-fip-controller/blob/master/README.md
The fipcontroller pods keep restarting with this in the logs:
The culprit seemingly being:
But I do not understand why. DNS is OK on the host and in the cluster, and CoreDNS is running; I'm out of options.
On a side note, once again following the guide at: https://community.hetzner.com/tutorials/install-kubernetes-cluster
Should the floating IP not be active at this point? The vipcontroller is only for moving the IP, no? I'm wondering because I've created a test service of type LoadBalancer and it does not get the Hetzner floating IP.