RobinJespersen opened 2 years ago
sudo iptables -t nat -nL |grep "10\.152\.183\."
returns on all nodes
KUBE-SVC-NPX46M4PTMTKRN6Y tcp -- 0.0.0.0/0 10.152.183.1 /* default/kubernetes:https cluster IP */ tcp dpt:443
KUBE-SVC-B62C23KNXVA7TMZN tcp -- 0.0.0.0/0 10.152.183.247 /* default/test-service:http cluster IP */ tcp dpt:80
KUBE-MARK-MASQ tcp -- !10.1.0.0/16 10.152.183.247 /* default/test-service:http cluster IP */ tcp dpt:80
KUBE-MARK-MASQ tcp -- !10.1.0.0/16 10.152.183.1 /* default/kubernetes:https cluster IP */ tcp dpt:443
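To trace where a service chain forwards, the chain can be listed directly; a sketch, with the chain name taken from the output above:
sudo iptables -t nat -nL KUBE-SVC-B62C23KNXVA7TMZN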
microk8s status
returns
microk8s is running
high-availability: yes
datastore master nodes: XXX.XXX.XXX.XXX:19001 XXX.XXX.XXX.XXX:19001 XXX.XXX.XXX.XXX:19001
datastore standby nodes: none
addons:
enabled:
ha-cluster # Configure high availability on the current node
disabled:
ambassador # Ambassador API Gateway and Ingress
cilium # SDN, fast with full network policy
dashboard # The Kubernetes dashboard
dashboard-ingress # Ingress definition for Kubernetes dashboard
dns # CoreDNS
fluentd # Elasticsearch-Fluentd-Kibana logging and monitoring
gpu # Automatic enablement of Nvidia CUDA
helm # Helm 2 - the package manager for Kubernetes
helm3 # Helm 3 - Kubernetes package manager
host-access # Allow Pods connecting to Host services smoothly
inaccel # Simplifying FPGA management in Kubernetes
ingress # Ingress controller for external access
istio # Core Istio service mesh services
jaeger # Kubernetes Jaeger operator with its simple config
kata # Kata Containers is a secure runtime with lightweight VMS
keda # Kubernetes-based Event Driven Autoscaling
knative # The Knative framework on Kubernetes.
kubeflow # Kubeflow for easy ML deployments
linkerd # Linkerd is a service mesh for Kubernetes and other frameworks
metallb # Loadbalancer for your Kubernetes cluster
metrics-server # K8s Metrics Server for API access to service metrics
multus # Multus CNI enables attaching multiple network interfaces to pods
openebs # OpenEBS is the open-source storage solution for Kubernetes
openfaas # OpenFaaS serverless framework
portainer # Portainer UI for your Kubernetes cluster
prometheus # Prometheus operator for monitoring and logging
rbac # Role-Based Access Control for authorisation
registry # Private image registry exposed on localhost:32000
storage # Storage class; allocates storage from host directory
traefik # traefik Ingress controller for external access
There was a recent fix related to netfilter and Calico.
It is recommended to use a more specific channel, for example --channel=1.24/stable
Thx for the hint.
I removed everything with
sudo snap remove microk8s --purge
and installed it with
sudo snap install microk8s --classic --channel=1.24/stable
and tried everything again, but still the same problem.
Last week I also tried 1.18/stable
and had the same problem.
Did you add the nodes' hostnames to /etc/hosts
on each node? I remember that I had to do that.
That should not be necessary, as the hostnames are publicly reachable DNS names.
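For completeness, a minimal sketch of what such entries would look like, with placeholder hostnames and addresses:
# /etc/hosts on every node (example values only)
203.0.113.11 node1.example.com node1
203.0.113.12 node2.example.com node2
203.0.113.13 node3.example.com node3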
I already tried adding the nodes like this
microk8s join host.name:25000/92b2db237428470dc4fcfc4ebbd9dc81/2c0cb3284b05
but no success either.
Do you by any chance have 2 network interfaces?
ifconfig
returns on the first node
cali8f14469af57: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::ecee:eeff:feee:eeee prefixlen 64 scopeid 0x20<link>
ether ee:ee:ee:ee:ee:ee txqueuelen 0 (Ethernet)
RX packets 51413 bytes 4344562 (4.3 MB)
RX errors 0 dropped 2 overruns 0 frame 0
TX packets 46028 bytes 34095652 (34.0 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
caliedc83d82522: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1440
inet6 fe80::ecee:eeff:feee:eeee prefixlen 64 scopeid 0x20<link>
ether ee:ee:ee:ee:ee:ee txqueuelen 0 (Ethernet)
RX packets 11973 bytes 990374 (990.3 KB)
RX errors 0 dropped 2 overruns 0 frame 0
TX packets 11068 bytes 5574873 (5.5 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
# public address
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 7013747 bytes 2005794268 (2.0 GB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 7013747 bytes 2005794268 (2.0 GB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
vxlan.calico: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.1.50.192 netmask 255.255.255.255 broadcast 0.0.0.0
inet6 fe80::640b:86ff:fea7:d83b prefixlen 64 scopeid 0x20<link>
ether 66:0b:86:a7:d8:3b txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 34 bytes 2040 (2.0 KB)
TX errors 0 dropped 7 overruns 0 carrier 0 collisions 0
on the second
cali5d301cb26b2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::ecee:eeff:feee:eeee prefixlen 64 scopeid 0x20<link>
ether ee:ee:ee:ee:ee:ee txqueuelen 0 (Ethernet)
RX packets 42 bytes 3562 (3.5 KB)
RX errors 0 dropped 2 overruns 0 frame 0
TX packets 16 bytes 1440 (1.4 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
cali80701c0b6cd: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1440
inet6 fe80::ecee:eeff:feee:eeee prefixlen 64 scopeid 0x20<link>
ether ee:ee:ee:ee:ee:ee txqueuelen 0 (Ethernet)
RX packets 53 bytes 4860 (4.8 KB)
RX errors 0 dropped 2 overruns 0 frame 0
TX packets 25 bytes 2132 (2.1 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
# public address
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 2050408 bytes 545622554 (545.6 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 2050408 bytes 545622554 (545.6 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
vxlan.calico: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.1.230.0 netmask 255.255.255.255 broadcast 0.0.0.0
inet6 fe80::6479:6ff:feb0:51a1 prefixlen 64 scopeid 0x20<link>
ether 66:79:06:b0:51:a1 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 32 bytes 1920 (1.9 KB)
TX errors 0 dropped 7 overruns 0 carrier 0 collisions 0
and on the third
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
# public address
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 1442966 bytes 345062323 (345.0 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1442966 bytes 345062323 (345.0 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
vxlan.calico: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.1.179.0 netmask 255.255.255.255 broadcast 0.0.0.0
inet6 fe80::640a:7ff:fe61:8a54 prefixlen 64 scopeid 0x20<link>
ether 66:0a:07:61:8a:54 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 64 bytes 3840 (3.8 KB)
TX errors 0 dropped 7 overruns 0 carrier 0 collisions 0
I purged the installation on node three and reinstalled microk8s and joined the cluster again.
Now ifconfig
shows
calie8f9a2c7112: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::ecee:eeff:feee:eeee prefixlen 64 scopeid 0x20<link>
ether ee:ee:ee:ee:ee:ee txqueuelen 0 (Ethernet)
RX packets 210 bytes 23882 (23.8 KB)
RX errors 0 dropped 2 overruns 0 frame 0
TX packets 213 bytes 104455 (104.4 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
# public address
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 1479365 bytes 371918667 (371.9 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1479365 bytes 371918667 (371.9 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
vxlan.calico: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.1.179.0 netmask 255.255.255.255 broadcast 0.0.0.0
inet6 fe80::640a:7ff:fe61:8a54 prefixlen 64 scopeid 0x20<link>
ether 66:0a:07:61:8a:54 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 64 bytes 3840 (3.8 KB)
TX errors 0 dropped 7 overruns 0 carrier 0 collisions 0
But the problem still exists.
Just ran ifconfig again, and now the calie8f9a2c7112 interface is gone.
Excluding the network interfaces lo, cali* and vxlan, you only have ens?
Does your hostname contain capital letters? In short, does it follow the rules here? https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-subdomain-names
Are all Calico pods stable? I don't know if this can help: https://github.com/canonical/microk8s/issues/1554#issuecomment-691426908
Excluding network interface lo, cali* and vxlan, you only have ens?
yes
In short does it follow the rules here? https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-subdomain-names
yes
All calico pods are stable? I don't know if this can help https://github.com/canonical/microk8s/issues/1554#issuecomment-691426908
As far as I can tell, yes. At least they don't have any restarts.
In /var/snap/microk8s/current/args/cni-network/cni.yaml
I have the entry
- name: IP_AUTODETECTION_METHOD
value: "can-reach=XXX.XXX.XXX.XXX"
with XXX.XXX.XXX.XXX being the IP of node two on the first node and the IP of node one for the other two.
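For reference, Calico also supports interface-based IP autodetection, which avoids depending on another node being reachable. A sketch of the alternative entry in cni.yaml, assuming the primary NIC is named ens3 on every node:
- name: IP_AUTODETECTION_METHOD
  value: "interface=ens3"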
I had a setup where all of my nodes (1 controller and 2 workers) were on the same private network. However, kubectl get nodes -o wide
showed the public IP addresses in the Internal-IP
column after the join operation, so I had to monkey-patch it, and that solved my issue.
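For reference, one way such a patch can be applied in MicroK8s is to pin the kubelet's node IP and restart; a sketch, with a placeholder address:
# run on each affected node; 192.168.0.10 is an example private address
echo '--node-ip=192.168.0.10' | sudo tee -a /var/snap/microk8s/current/args/kubelet
microk8s stop && microk8s start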
@usersina thanks for the hint, but it does not help :-(
My nodes are only on a public network, so I entered the public IPs in both files.
Before and after, kubectl get nodes -o wide
shows the public IPs in the Internal-IP column and <none> in the External-IP column.
Meanwhile I also replaced one node with a Debian 11 installation, but the behavior is still exactly the same.
Do you by any chance have 2 network interfaces?
What to do when you have two network interfaces? This still does not work for me, so I still have to patch the cluster after joining. The patching, however, almost always fails with timeouts if DNS is enabled. Also note that patching before joining is not possible.
Just reproduced this issue with Ubuntu 20.04 on arm64 on a clean install. It seems to affect just ClusterIP services; I was able to get LoadBalancers working. Retrying again tomorrow.
Continuing to investigate: did a packet capture on eth0 (my primary interface) to make sure that packets were getting sent. This was the result:
11:44:55.145179 IP 10.1.10.193.50184 > 10.1.187.66.domain: 36185+ A? ports.ubuntu.com.default.svc.cluster.local. (60)
11:44:55.145287 IP 10.1.10.193.50184 > 10.1.187.66.domain: 56907+ AAAA? ports.ubuntu.com.default.svc.cluster.local. (60)
The packets were never seen at the destination node.
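One way to confirm whether the encapsulated traffic ever leaves the sending node is to watch for VXLAN packets on the underlay interface; a sketch, assuming eth0 is the primary NIC and Calico's default VXLAN port 4789:
# run on both the sending and the receiving node
sudo tcpdump -ni eth0 udp port 4789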
The route to get to the other node never gets added. Manually adding the route through ip route
enables temporary communication. @balchua, any chance you could look into this further?
This is what the routing table looks like by default:
ubuntu@k81:~$ ip route
default via 10.0.0.1 dev eth0 proto static
10.0.0.0/27 dev eth0 proto kernel scope link src 10.0.0.6
blackhole 10.1.10.192/26 proto 80
10.1.10.193 dev califb3eb82ef50 scope link
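For reference, a sketch of the kind of route that is missing here and can be added by hand as a temporary workaround, using the remote node's pod CIDR and vxlan.calico address (values taken from the ifconfig output earlier in this thread):
# example: reach the second node's pod subnet via its vxlan.calico address
sudo ip route add 10.1.230.0/26 via 10.1.230.0 dev vxlan.calico onlink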
Looks like I have the same issue, but my routing table looks populated:
home-kube01:~$ ip route
default via 192.168.200.1 dev eth0 proto static
blackhole 10.1.158.128/26 proto 80
10.1.158.159 dev cali463d9a511a6 scope link
10.1.158.161 dev cali2019a39bf40 scope link
10.1.158.162 dev cali675ab5b64e3 scope link
10.1.158.163 dev calic88c1e0b9f9 scope link
10.1.158.164 dev cali8a7384016d7 scope link
10.1.158.183 dev cali42a5ceceaa4 scope link
10.1.158.184 dev calia223862cd7d scope link
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.200.0/24 dev eth0 proto kernel scope link src 192.168.200.231
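Note that this table also has no entry for any remote node's pod CIDR; on a working multi-node VXLAN cluster one would additionally expect, per remote node, a line of roughly this shape (values hypothetical):
10.1.230.0/26 via 10.1.230.0 dev vxlan.calico onlink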
We are also seeing this same problem
My service / pod is only reachable from the node it is executed on.
my setup
I have three fresh and identical Ubuntu 20.04.4 LTS servers, each with its own public IP address.
I installed microk8s on all nodes by running:
sudo snap install microk8s --classic
On the master node I executed
microk8s add-node
and joined the two other servers by executing
microk8s join XXX.XXX.X.XXX:25000/92b2db237428470dc4fcfc4ebbd9dc81/2c0cb3284b05
After that, by running
kubectl get no
I can see the three nodes, all with status Ready. And
kubectl get all --all-namespaces
shows the expected resources.
wget --no-check-certificate https://10.152.183.1/
executed on all nodes always returns a response.
So far everything works as expected.
problem 1
I get the IP of calico-kube-controllers by calling
kubectl describe -n=kube-system pod/calico-kube-controllers-dc44f6cdf-flj54
And executing
wget https://10.1.50.194/
on the "master" node returnsand on the two other nodes
To my understanding, the IP of the pod should be reachable from all nodes. Is that correct?
problem 2
I installed a test deployment and service (default/test-service, as seen in the iptables output above; the manifest is not reproduced here).
Calling
wget http://10.152.183.247/
on the three nodes fails twice and succeeds once; only the node running the pod gets a response.
To my understanding, the service should be reachable from all nodes. Calling wget on the IP of the pod itself shows exactly the same behavior.
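A quick way to rule out a service-side problem is to confirm the pod is registered as an endpoint of the service; a sketch, with the service name taken from the iptables output above:
kubectl get endpoints test-service
kubectl describe service test-service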
workaround
Adding
hostNetwork: true
to the deployment makes the service accessible from all nodes, but that seems to be the wrong way of doing it.
Does anyone have an idea how I can debug this? I am out of ideas.
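For reference, this is where that workaround sits in a Deployment manifest; a minimal sketch with placeholder names (hostNetwork puts the pod in the node's network namespace, which is why it sidesteps the overlay problem):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deployment
spec:
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      hostNetwork: true   # workaround: bind directly to the node's interfaces
      containers:
      - name: web
        image: nginx
        ports:
        - containerPort: 80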