Chiming in with the same experience, having just set up 5 nodes (using minimal Ubuntu Focal as the base, with only snap installed for microk8s) that each have only one interface, with IPv6 connectivity only:
On the first node, fetching the add-node command bleeds ip usage output:
ubuntu@k8s-1:~$ microk8s add-node
Usage: ip address {add|change|replace} IFADDR dev IFNAME [ LIFETIME ]
[ CONFFLAG-LIST ]
ip address del IFADDR dev IFNAME [mngtmpaddr]
ip address {show|save|flush} [ dev IFNAME ] [ scope SCOPE-ID ]
[ to PREFIX ] [ FLAG-LIST ] [ label LABEL ] [up]
ip address {showdump|restore}
IFADDR := PREFIX | ADDR peer PREFIX
[ broadcast ADDR ] [ anycast ADDR ]
[ label IFNAME ] [ scope SCOPE-ID ]
SCOPE-ID := [ host | link | global | NUMBER ]
FLAG-LIST := [ FLAG-LIST ] FLAG
FLAG := [ permanent | dynamic | secondary | primary |
[-]tentative | [-]deprecated | [-]dadfailed | temporary |
CONFFLAG-LIST ]
CONFFLAG-LIST := [ CONFFLAG-LIST ] CONFFLAG
CONFFLAG := [ home | nodad | mngtmpaddr | noprefixroute | autojoin ]
LIFETIME := [ valid_lft LFT ] [ preferred_lft LFT ]
LFT := forever | SECONDS
From the node you wish to join to this cluster, run the following:
microk8s join none:25000/token
If the node you are adding is not reachable through the default interface you can use one of the following:
ubuntu@k8s-1:~$
And as specified in the post above, some parts seem hardcoded to listen on 0.0.0.0:
ubuntu@k8s-1:~$ sudo netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:37587 0.0.0.0:* LISTEN 759/containerd
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 260/systemd-resolve
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 507/sshd: /usr/sbin
tcp 0 0 127.0.0.1:19001 0.0.0.0:* LISTEN 851/kube-apiserver
tcp 0 0 127.0.0.1:1338 0.0.0.0:* LISTEN 759/containerd
tcp 0 0 127.0.0.1:10248 0.0.0.0:* LISTEN 799/kubelet
tcp 0 0 0.0.0.0:25000 0.0.0.0:* LISTEN 924/python3
tcp 0 0 127.0.0.1:10251 0.0.0.0:* LISTEN 744/kube-scheduler
tcp 0 0 127.0.0.1:10252 0.0.0.0:* LISTEN 746/kube-controller
tcp6 0 0 :::10259 :::* LISTEN 744/kube-scheduler
tcp6 0 0 :::22 :::* LISTEN 507/sshd: /usr/sbin
tcp6 0 0 :::16443 :::* LISTEN 851/kube-apiserver
tcp6 0 0 :::10250 :::* LISTEN 799/kubelet
tcp6 0 0 :::10255 :::* LISTEN 799/kubelet
tcp6 0 0 :::10257 :::* LISTEN 746/kube-controller
udp 0 0 127.0.0.53:53 0.0.0.0:* 260/systemd-resolve
One culprit:
root 924 0.1 1.1 47936 24324 ? S 12:04 0:01 python3 /snap/microk8s/1791/usr/bin/gunicorn3 cluster.agent:app --bind 0.0.0.0:25000 --keyfile /var/snap/m[...]
Nov 19 12:04:23 k8s-1 microk8s.daemon-proxy[1477]: I1119 12:04:23.632453 1477 node.go:136] Successfully retrieved node IP: snip
Nov 19 12:04:23 k8s-1 microk8s.daemon-proxy[1477]: I1119 12:04:23.632485 1477 server_others.go:108] kube-proxy node IP is an IPv6 address (snip), assume IPv6 operation
Nov 19 12:04:23 k8s-1 microk8s.daemon-proxy[1477]: W1119 12:04:23.638227 1477 proxier.go:649] Failed to load kernel module ip_vs with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
Nov 19 12:04:23 k8s-1 microk8s.daemon-proxy[1477]: W1119 12:04:23.639918 1477 proxier.go:649] Failed to load kernel module ip_vs_rr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
Nov 19 12:04:23 k8s-1 microk8s.daemon-proxy[1477]: W1119 12:04:23.641496 1477 proxier.go:649] Failed to load kernel module ip_vs_wrr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
Nov 19 12:04:23 k8s-1 microk8s.daemon-proxy[1477]: W1119 12:04:23.643146 1477 proxier.go:649] Failed to load kernel module ip_vs_sh with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
Nov 19 12:04:23 k8s-1 microk8s.daemon-proxy[1477]: W1119 12:04:23.644950 1477 server_others.go:579] Unknown proxy mode "", assuming iptables proxy
Nov 19 12:04:23 k8s-1 microk8s.daemon-proxy[1477]: I1119 12:04:23.645010 1477 server_others.go:186] Using iptables Proxier.
Nov 19 12:04:23 k8s-1 microk8s.daemon-proxy[1477]: F1119 12:04:23.645026 1477 server.go:495] unable to create proxier: CIDR 10.1.0.0/16 has incorrect IP version: expect isIPv6=true
I found a few additional places that might be relevant / need fixes for IPv6:
https://github.com/ubuntu/microk8s/blob/6a398b5eb1ae22e7b1d446925e694b402f7db561/scripts/cluster/agent.py#L41
https://github.com/ubuntu/microk8s/blob/2bdc1b843e6d8219baf3a252e0f3200f045ca9c0/microk8s-resources/default-args/cluster-agent#L1
https://github.com/ubuntu/microk8s/blob/e5f7ffe3cbfdf2283599e0d207050fb5549e4d47/scripts/cluster/distributed_op.py#L55
For me I think the key is mostly in the agent, to get Flask to listen on IPv6. Unfortunately I don't know of anything so far that would help the add-node command; mine works fine since I also have IPv4.
The cluster agent is launched with the bind flag from microk8s-resources/default-args/cluster-agent, which gets passed to gunicorn. I'm not sure whether gunicorn supports (or needs to support) IPv6 here, or whether it will work if the bind address is just left off: https://github.com/benoitc/gunicorn/issues/1628
If an IPv6 address ends up needing to be set through --bind in microk8s-resources/default-args/cluster-agent, then this is another place to be careful (I think as written it will work, but it could be fragile): https://github.com/ubuntu/microk8s/blob/7c5607a6b6ebcfa51451af1cdff4eb6ec2c50ab5/scripts/cluster/common/utils.py#L149
Just to add some additional findings: it seems that in order to fix port 25000, you can set --bind [::]:25000
and fix utils.py:
diff --git a/scripts/cluster/common/utils.py b/scripts/cluster/common/utils.py
index 2ceca00..3279c8b 100644
--- a/scripts/cluster/common/utils.py
+++ b/scripts/cluster/common/utils.py
@@ -148,7 +148,7 @@ def get_cluster_agent_port():
port_parse = port_parse[-1].split('=')
port_parse = port_parse[-1].split(':')
if len(port_parse) > 1:
- cluster_agent_port = port_parse[1].rstrip()
+ cluster_agent_port = port_parse[-1].rstrip()
return cluster_agent_port
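To spell out why the last element is needed, here is a quick standalone sketch (not the actual utils.py parsing, which also splits on the --bind token first):
# Sketch: pulling the port out of a cluster-agent bind argument.
# With an IPv4 bind the port happens to sit at index 1 after splitting on ':',
# but with a bracketed IPv6 bind it is only reliably the last element.
for arg in ("--bind=0.0.0.0:25000", "--bind=[::]:25000"):
    port_parse = arg.split('=')[-1].split(':')
    print(arg, '->', port_parse[-1])  # both print 25000; port_parse[1] is '' for '[::]:25000'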
I tried to fix the other issues, but I can't get very far because none of my snap builds will install on my Pi and I'm not sure how to cross-compile from x86.
There's one area that I found needs some work: the certificate generation.
Currently only IPv4 IPs are added to the certificate's valid IP addresses. https://github.com/ubuntu/microk8s/blob/303108ad8b8f025ab868fca4f5d30fb955c17ec5/microk8s-resources/actions/common/utils.sh#L338
@balasu I think that part is actually fine: the first thing get_ips does is use hostname -I to get a list of IP addresses, which includes IPv6, and then it adds the cni0 IP address. On my node, render_csr_conf properly results in my desired IPv6 address in the csr.conf.rendered file:
...
[ alt_names ]
DNS.1 = kubernetes
DNS.2 = kubernetes.default
DNS.3 = kubernetes.default.svc
DNS.4 = kubernetes.default.svc.cluster
DNS.5 = kubernetes.default.svc.cluster.local
IP.1 = 127.0.0.1
IP.2 = 10.x.x.x
IP.3 = 192.168.1.135
IP.4 = 10.x.x.x
IP.5 = 10.x.x.x
IP.6 = 201:x:x:x:x:x:x:x
IP.7 = fd42:1b3d:5904:3a13::1
...
Currently I'm facing a 500 error when trying to join.
@oivindoh Does add-node add a token to credentials/cluster-tokens.txt for you? I think your issue is somewhere around here: https://github.com/ubuntu/microk8s/blob/303108ad8b8f025ab868fca4f5d30fb955c17ec5/microk8s-resources/wrappers/microk8s-add-node.wrapper
I'm trying to debug but have to wait for the snap to compile. I'm not sure how to run from source...
Thank you for doing this investigation @stephen304 @oivindoh
Here is a hack to quickly test out a change in the scripts without rebuilding the snap.
First download the .snap package:
snap download microk8s
rm ./*.assert
Unpack the snap file:
unsquashfs ./microk8s_*.snap
Update the file in ./squashfs-root/scripts/cluster/agent.py
Recreate the snap package. Make sure you first delete any preexisting .snap file:
rm -rf ./s.snap
mksquashfs ./squashfs-root/ s.snap
Install the new snap:
sudo snap install ./s.snap --classic --dangerous
@ktsakalozos Thanks, that helps a lot! I found another key change needed to allow joining:
--- a/scripts/cluster/agent.py
+++ b/scripts/cluster/agent.py
@@ -582,7 +584,7 @@ def join_node_dqlite():
voters = get_dqlite_voters() # type: List[str]
# Check if we need to set dqlite with external IP
if len(voters) == 1 and voters[0].startswith("127.0.0.1"):
- update_dqlite_ip(request.host.split(":")[0])
+ update_dqlite_ip(":".join(request.host.split(":")[:-1]))
voters = get_dqlite_voters()
callback_token = get_callback_token()
remove_token_from_file(token, cluster_tokens_file)
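To show what that change does, here is a minimal standalone sketch (not the MicroK8s code; Flask's request.host is the host:port value from the Host header):
# Sketch: extracting the host part of Flask's request.host value.
for host_header in ("192.168.1.135:25000", "[2001:db8::1]:25000"):
    naive = host_header.split(":")[0]              # '192.168.1.135' / '[2001'
    fixed = ":".join(host_header.split(":")[:-1])  # '192.168.1.135' / '[2001:db8::1]'
    print(host_header, "->", naive, "vs", fixed)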
Adding this allowed the join command to complete without error, but something after that doesn't seem to work properly, because kubectl get nodes still shows only 1 node.
Success! At least it looks like I've managed to get the nodes to cluster over ipv6. I had to edit:
--- a/scripts/cluster/join.py
+++ b/scripts/cluster/join.py
@@ -820,7 +836,7 @@ def update_dqlite(cluster_cert, cluster_key, voters, host):
if 'Address' in data:
port = data['Address'].split(':')[1]
- init_data = {'Cluster': voters, 'Address': "{}:{}".format(host, port)}
+ init_data = {'Cluster': voters, 'Address': "[{}]:{}".format(host, port)}
with open("{}/init.yaml".format(cluster_dir), 'w') as f:
yaml.dump(init_data, f)
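The brackets matter because dqlite's Address is a host:port string, so with an unbracketed IPv6 address the port becomes indistinguishable from the last address group. A rough sketch (placeholder address and port, assuming an IPv6 host as in my setup):
# Sketch: the Address value written to init.yaml, with and without brackets.
host, port = "2001:db8::2", "19001"   # placeholder IPv6 address and port
print("{}:{}".format(host, port))     # 2001:db8::2:19001   - port is ambiguous
print("[{}]:{}".format(host, port))   # [2001:db8::2]:19001 - unambiguous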
~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
stephen-1 Ready <none> 44m v1.19.3-34+b9e8e732a07cb6
stephen-2 Ready <none> 40m v1.19.3-34+a56971609ff35a
Checking the health of the cluster shows that only calico-node on the new node doesn't start. The logs show this:
2020-11-23 22:50:07.407 [WARNING][8] startup.go 675: Unable to auto-detect IPv4 address by connecting to 200:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:ee79: dial udp4: address 200:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:ee79: no suitable address found
2020-11-23 22:50:07.408 [WARNING][8] startup.go 438: Couldn't autodetect an IPv4 address. If auto-detecting, choose a different autodetection method. Otherwise provide an explicit address.
2020-11-23 22:50:07.408 [INFO][8] startup.go 244: Clearing out-of-date IPv4 address from this node IP=""
2020-11-23 22:50:07.477 [WARNING][8] startup.go 1187: Terminating
Calico node failed to start
Editing the pod yml shows:
- name: FELIX_IPV6SUPPORT
value: "false"
It doesn't seem to let me save any changes to that, but hopefully once that is changed it will work completely.
Edit: Also I believe this documentation may be useful to solve this: https://docs.projectcalico.org/networking/ipv6#enabling-ipv6-support-in-calico
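Going by that documentation, the calico-node container environment would need roughly the following (my reading of the linked Calico docs, not a change I have verified in microk8s; the pool CIDR is just a placeholder):
# Sketch of the calico-node env changes suggested by the Calico IPv6 docs
- name: IP6
  value: "autodetect"
- name: FELIX_IPV6SUPPORT
  value: "true"
- name: CALICO_IPV6POOL_CIDR    # placeholder pool; pick one for your cluster
  value: "fd00:10:1::/64"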
I'm not sure where to go from here. I'm guessing Calico has something to do with ingress not working on secondary nodes: I spun up nginx and enabled/added ingress, and I can only load nginx when hitting the IP of the master node, which has the pod, but not the second node. It's my understanding that the internal "service" for the nginx deployment should be reachable from both nodes, so that I can load balance requests between all nodes and have requests shuttled to the correct node depending on where the pods are.
@ktsakalozos Is there enough information here to hand off the issue? I'm a docker noob so I don't have any business poking around calico.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
The issue is not stale.
MicroK8s still does not handle IPv6 properly.
# microk8s join 2001:db8:ac7:8901:f05d:7b70:e348:fef8:25000/11111111112222222222333333333344/eeffffffffff
Contacting cluster at 2001
Connecting to cluster failed with nonnumeric port: 'db8'.
It seems the microk8s join script doesn't handle IPv6 addresses at all. My nodes use an IPv6 overlay network to communicate and cannot reach each other over IPv4. Maybe adding a -6 flag is a good solution, or potentially detecting the IPv6 format. I tried to fix it with a patch.
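For illustration only (this is not the actual patch; split_connection is a made-up helper for this sketch), the idea is to split the connection string from the right so a colon-heavy IPv6 address stays intact:
# Hypothetical sketch: parsing "host:port/token[/...]" so IPv6 hosts survive.
def split_connection(conn):
    addr, _, rest = conn.partition("/")    # "host:port" and the token part
    host, _, port = addr.rpartition(":")   # split on the LAST colon only
    return host.strip("[]"), int(port), rest

print(split_connection("2001:db8:ac7:8901:f05d:7b70:e348:fef8:25000/tok"))
print(split_connection("192.168.1.10:25000/tok"))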
But I just realized that whatever is on port 25000 seems to be listening only on the LAN IPv4 address of my first node. Going off this documentation and port scanning both the v4 and v6 addresses, it seems that out of the box, port 25000 is the only one that is v4-only for some reason. Maybe it will work once that's fixed.