Hi @ednaganon
You can get an HA cluster backed by dqlite by deploying MicroK8s with:
sudo snap install microk8s --classic --channel=edge/test-dqlite
We are now working on ways to transition already existing deployments to an HA dqlite setup.
EDIT: The `edge/test-dqlite` channel is not available any more. Use `latest/edge/ha-preview` instead.
Besides transitioning existing deployments, how can we create new deployments of MicroK8s with HA? I understand this would be the method to install, but how can it be configured? Specifically, would I be able to have multiple master nodes?
@RootHouston if you join three or more nodes installed from `--channel=edge/test-dqlite`, they will form an HA cluster.
@ktsakalozos I tried to give it a shot, but the branch does not seem to be available?
$ sudo snap install microk8s --classic --channel=edge/test-dqlite
error: requested a non-existing branch on latest/edge for snap "microk8s": test-dqlite
@philkry, we are working on the next iteration of the ha work that would allow the transition of a non-ha cluster to ha. I will ping you again in this thread hopefully within the week to show you how this work progresses. Thank you for your interest, we appreciate your feedback.
@ktsakalozos Is it already possible to create a fresh N-node cluster with HA enabled? Which channel should be used?
To get an early glimpse of HA, deploy MicroK8s with:
sudo snap install microk8s --classic --channel=latest/edge/ha-preview
Enabling `ha-cluster` is required on all nodes joining/forming an HA cluster:
microk8s enable ha-cluster
The joining process has not changed. You need to run `microk8s add-node` to produce a connection string and use that string to join a node with `microk8s join`.
Your cluster needs to have at least three nodes. To check the status of HA you can:
microk8s status ha-cluster
To remove a node, first run `microk8s leave` on the departing node and then `microk8s remove-node` on any of the nodes of the cluster.
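Putting those steps together, a minimal three-node walkthrough might look like the sketch below (host names and the join string are placeholders, not real values):

```
# On every node: install the HA preview and enable HA mode.
sudo snap install microk8s --classic --channel=latest/edge/ha-preview
sudo microk8s enable ha-cluster

# On the first node: produce a connection string for each joining node.
sudo microk8s add-node
# ...prints something like: microk8s join <first-node-ip>:25000/<token>

# On each of the other nodes: join with the string printed above.
sudo microk8s join <first-node-ip>:25000/<token>

# On any node, once three or more nodes have joined: check HA status.
sudo microk8s status ha-cluster
```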
@philkry @nojakub @balchua @iskitsas @knkski @tvansteenburgh really interested in your feedback on this preview.
@ktsakalozos thanks for the info! I quickly spun up 3 nodes; cluster assembly worked as expected. The control plane stays available if I kill one node, and that node successfully re-joins the cluster when it becomes available again :) I noticed one thing where I'm not sure it is expected behaviour or not: it takes quite some time (not measured, but I would guesstimate 1 minute+) until the two remaining nodes report the unavailable node as 'NotReady'. When I just issue a reboot, the node reboots fairly quickly and never gets reported NotReady at all, even though I see dqlite reporting connection issues:
May 17 18:54:04 node2 microk8s.daemon-apiserver[20248]: time="2020-05-17T18:54:04+02:00" level=warning msg="dqlite: server unavailable err=failed to establish network connection: connect to HTTP endpoint: connect to server: dial tcp 10.0.0.3:19001: i/o timeout address=10.0.0.3:19001 attempt=0"
May 17 18:54:04 node2 microk8s.daemon-apiserver[20248]: Dqlite proxy TLS -> Unix: local error: tls: bad record MAC
May 17 18:54:04 node2 microk8s.daemon-apiserver[20248]: Dqlite proxy Unix -> TLS: read unix @->@0005f: use of closed network connection
May 17 18:54:04 node2 microk8s.daemon-apiserver[20248]: time="2020-05-17T18:54:04+02:00" level=warning msg="dqlite: server unavailable err=failed to send handshake: write tcp 10.0.0.2:34030->10.0.0.2:19001: i/o timeout address=10.0.0.2:19001 attempt=0"
May 17 18:54:05 node2 microk8s.daemon-apiserver[20248]: time="2020-05-17T18:54:05+02:00" level=warning msg="dqlite: server unavailable err=failed to send handshake: write tcp 10.0.0.2:52412->10.0.0.4:19001: i/o timeout address=10.0.0.4:19001 attempt=0"
May 17 18:54:05 node2 microk8s.daemon-apiserver[20248]: Dqlite proxy TLS -> Unix: local error: tls: bad record MAC
May 17 18:54:05 node2 microk8s.daemon-apiserver[20248]: Dqlite proxy Unix -> TLS: read unix @->@0005f: use of closed network connection
May 17 18:54:07 node2 microk8s.daemon-containerd[20254]: time="2020-05-17T18:54:07.323068776+02:00" level=info msg="ExecSync for "14accf216d4deb59f4770ae2d9a7aac3fe9e6c816a9045f42317a71c6920816d" with command [/bin/calico-node -felix-live] and timeout 1 (s)"
May 17 18:54:07 node2 microk8s.daemon-containerd[20254]: time="2020-05-17T18:54:07.549408484+02:00" level=info msg="Finish piping "stdout" of container exec "6723c84025ddc589e8ea478e5ef73112680faa1115febe69b088e35964afb0ec""
May 17 18:54:07 node2 microk8s.daemon-containerd[20254]: time="2020-05-17T18:54:07.549473863+02:00" level=info msg="Finish piping "stderr" of container exec "6723c84025ddc589e8ea478e5ef73112680faa1115febe69b088e35964afb0ec""
May 17 18:54:07 node2 microk8s.daemon-containerd[20254]: time="2020-05-17T18:54:07.553301009+02:00" level=info msg="Exec process "6723c84025ddc589e8ea478e5ef73112680faa1115febe69b088e35964afb0ec" exits with exit code 0 and error <nil>"
May 17 18:54:07 node2 microk8s.daemon-containerd[20254]: time="2020-05-17T18:54:07.613600770+02:00" level=info msg="TaskExit event &TaskExit{ContainerID:14accf216d4deb59f4770ae2d9a7aac3fe9e6c816a9045f42317a71c6920816d,ID:6723c84025ddc589e8ea478e5ef73112680faa1115febe69b088e35964afb0ec,Pid:22016,ExitStatus:0,ExitedAt:2020
May 17 18:54:07 node2 microk8s.daemon-containerd[20254]: time="2020-05-17T18:54:07.615017697+02:00" level=info msg="ExecSync for "14accf216d4deb59f4770ae2d9a7aac3fe9e6c816a9045f42317a71c6920816d" returns with exit code 0"
May 17 18:54:08 node2 microk8s.daemon-apiserver[20248]: Dqlite proxy TLS -> Unix: local error: tls: bad record MAC
May 17 18:54:08 node2 microk8s.daemon-apiserver[20248]: Dqlite proxy Unix -> TLS: read unix @->@0005f: use of closed network connection
May 17 18:54:10 node2 microk8s.daemon-apiserver[20248]: time="2020-05-17T18:54:10+02:00" level=warning msg="dqlite: server unavailable err=failed to establish network connection: connect to HTTP endpoint: connect to server: dial tcp 10.0.0.3:19001: i/o timeout address=10.0.0.3:19001 attempt=0"
May 17 18:54:10 node2 microk8s.daemon-apiserver[20248]: Dqlite proxy TLS -> Unix: local error: tls: bad record MAC
May 17 18:54:10 node2 microk8s.daemon-apiserver[20248]: Dqlite proxy Unix -> TLS: read unix @->@0005f: use of closed network connection
May 17 18:54:10 node2 microk8s.daemon-apiserver[20248]: time="2020-05-17T18:54:10+02:00" level=warning msg="dqlite: server unavailable err=failed to send handshake: write tcp 10.0.0.2:34050->10.0.0.2:19001: i/o timeout address=10.0.0.2:19001 attempt=0"
May 17 18:54:10 node2 microk8s.daemon-apiserver[20248]: time="2020-05-17T18:54:10+02:00" level=warning msg="dqlite: server unavailable err=failed to send handshake: write tcp 10.0.0.2:52432->10.0.0.4:19001: i/o timeout address=10.0.0.4:19001 attempt=0"
May 17 18:54:10 node2 microk8s.daemon-apiserver[20248]: Failed to get servers: no available dqlite leader server found
Every 1.0s: microk8s kubectl get nodes node1: Sun May 17 18:56:29 2020
NAME STATUS ROLES AGE VERSION
node1 Ready <none> 94m v1.18.2-2+3a7b7d1884aa18
node3 Ready <none> 88m v1.18.2-2+3a7b7d1884aa18
node2 Ready <none> 90m v1.18.2-2+3a7b7d1884aa18
Only if I shut the node down completely and wait does it eventually show up as NotReady. I will test it more thoroughly with some workloads later.
@philkry thank you for your feedback. Can you provide more information on how you kill the node? In my tests the departing node is marked as NotReady almost immediately but the pods do not get re-scheduled until much later. This may have to do with the default five minute eviction delay [1].
Also, @freeekanayaka correct me if I am wrong: if you kill (not cleanly remove) the node acting as the leader in dqlite, the remaining two nodes may not be able to elect a leader.
[1] https://kubernetes.io/docs/concepts/architecture/nodes/#node-status
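As a rough way to confirm that eviction delay, you can look at the default `NoExecute` tolerations Kubernetes injects into pods; the 300s values there match the roughly five minute gap before pods are evicted from a dead node (a sketch, with a placeholder pod name):

```
# Show the tolerations of a pod; by default not-ready/unreachable are
# tolerated for 300 seconds before the pod is evicted from a dead node.
microk8s kubectl get pod <pod-name> -n kube-system -o jsonpath='{.spec.tolerations}'
```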
> Also, @freeekanayaka correct me if I am wrong, if you kill (not cleanly remove) the node acting as the leader in dqlite the remaining two nodes may not be able to elect a leader.
I can't speak for the NotReady state as that's something that k8s controls, for which I don't know the exact logic.
Regarding the dqlite database, if you have at least 3 nodes (i.e. you are HA) and you gracefully shut down the current leader (with `systemctl stop snap.microk8s.daemon-apiserver` or any equivalent way to send a TERM signal to the apiserver process), then leadership is immediately transferred to one of the two other nodes and database downtime is minimal (it just requires the client code to reconnect). On the other hand, if the current leader dies abruptly (with a KILL signal or a machine power loss), the 2 remaining nodes will take 5 or 10 seconds to realize that the leader has died and run a new election to decide who the new leader will be. During that time client code will receive errors, but it should work again after that.
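To exercise both cases on the current dqlite leader, something like the following sketch should work (the second command is only a hypothetical way to simulate a crash):

```
# Graceful: TERM lets the apiserver hand dqlite leadership over before exiting.
sudo systemctl stop snap.microk8s.daemon-apiserver

# Abrupt: KILL gives no chance to transfer leadership, so the remaining voters
# hold an election a few seconds after they notice the leader is gone.
sudo systemctl kill --signal=SIGKILL snap.microk8s.daemon-apiserver
```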
@ktsakalozos what I did was just rebooting the node via the command line. I did not cleanly remove it from the cluster. Do I always have to drain the node properly to keep dqlite working? How will HA work if one node (out of 3) dies due to h/w failure? Would this scenario not be covered at all?
EDIT: @freeekanayaka replied at the same time I did, so I guess my question is answered.
@ktsakalozos: I've got a cluster deployed with 10 MicroK8s nodes from the stable channel. Will I have to do anything special to try out HA beyond `sudo snap switch microk8s --channel=edge/ha-preview && sudo snap refresh && microk8s enable ha-cluster`?
@knkski in general switching across channels is not recommended because the services configuration may have diverged. In this case `latest/stable` and `ha-preview` should be pretty close.
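For reference, a refresh-based path could look like the sketch below; treat it as experimental and back up anything important first, since channel switches are not generally supported:

```
# Move an existing install to the HA preview channel, then enable HA mode.
sudo snap refresh microk8s --channel=latest/edge/ha-preview
sudo microk8s enable ha-cluster
```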
@ktsakalozos: I removed the nodes from the cluster after getting a message about having to remove and rejoin them after enabling HA. I then tried to join a node to the cluster and got this error message:
$ microk8s join 1.2.3.4:25000/...
Failed to join cluster. Error code 500.
Checking on the master node, I saw this traceback:
Waiting for access to cluster.
[2020-05-19 13:38:39,074] ERROR in app: Exception on /cluster/api/v2.0/join [POST]
Traceback (most recent call last):
File "/snap/microk8s/1415/lib/python3.5/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/snap/microk8s/1415/lib/python3.5/site-packages/flask/app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/snap/microk8s/1415/lib/python3.5/site-packages/flask/app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/snap/microk8s/1415/lib/python3.5/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/snap/microk8s/1415/lib/python3.5/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/snap/microk8s/1415/lib/python3.5/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/snap/microk8s/1415/scripts/cluster/agent.py", line 557, in join_node_dqlite
voters = get_dqlite_voters() # type: List[str]
File "/snap/microk8s/1415/scripts/cluster/agent.py", line 452, in get_dqlite_voters
with open("{}/info.yaml".format(cluster_dir)) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/var/snap/microk8s/current/var/kubernetes/backend/info.yaml'
Not really sure how I triggered this, but purging the snap and reinstalling it worked.
It would also be nice to have slightly better messaging when joining a node. I see this:
Waiting for node to join the cluster. .. .. .. .. .. .. .. .. .. ..
Am I supposed to join a third node before it stops waiting? If I don't add a third node by the time it finishes waiting, has it failed to add the current node?
Thank you for giving it a try @knkski
To try and reproduce the error I will start from latest/stable refresh to edge/ha-cluster and re-join the nodes.
Joining one node at a time is preferable but since we are in the testing phase we could test the simultaneous join scenario.
Joining in HA takes more time than the non-HA case, so the boring "Waiting for node to join the cluster" is displayed for a longer time. I will see if we can show more details.
@ktsakalozos thanks for the how-to. I can spin up an ha-cluster with 3 nodes; they connect to each other and report Ready state. But when I inspect MicroK8s I get this:
root@k8s3:[/]# microk8s.inspect
Inspecting Certificates
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
FAIL: Service snap.microk8s.daemon-flanneld is not running
For more details look at: sudo journalctl -u snap.microk8s.daemon-flanneld
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-apiserver is running
Service snap.microk8s.daemon-apiserver-kicker is running
Service snap.microk8s.daemon-proxy is running
Service snap.microk8s.daemon-kubelet is running
Service snap.microk8s.daemon-scheduler is running
Service snap.microk8s.daemon-controller-manager is running
FAIL: Service snap.microk8s.daemon-etcd is not running
For more details look at: sudo journalctl -u snap.microk8s.daemon-etcd
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy current linux distribution to the final report tarball
Copy openSSL information to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster
Should flannel and etcd report FAIL in this case?
Yes, etcd and flannel are not used in the HA setup; Calico and dqlite replace them. I am making a note to fix inspect so it does not report this as a failure. Thank you @nojakub
@ktsakalozos I'm seeing some problems with certificates. It appears that the certificates from the "node 0" of the cluster are pushed to the other nodes. Whenever I try to access the cluster from any other node, I get: Unable to connect to the server: x509: certificate is valid for 127.0.0.1, 10.152.183.1, 172.31.65.17, 10.1.56.0, not 172.31.70.251
Is this normal? Or should a new certificate be generated that encompasses all the endpoints and then be spread to the other nodes?
EDIT: The following is not needed anymore. Server certs are now created with the IPs MicroK8s can detect on the node. However, you may still find the following useful if, for example, a VIP leads to the node(s) and there is no interface with that IP on the node.
@ed1000 currently there is no way to redistribute certificates across nodes and I am not sure if this is the right approach.
Here is a workaround, assuming you know which nodes will join the cluster. After you have run `microk8s enable ha-cluster` on 'node 0' and before joining the nodes, edit the `/var/snap/microk8s/current/certs/csr.conf.template` file. In the `[ alt_names ]` section you can add the DNS entries and IPs of the nodes to be joined. For example, I added the following in a test cluster:
DNS.6 = ec2-66-66-66-66.compute-1.amazonaws.com
DNS.7 = ec2-66-44-44-46.compute-1.amazonaws.com
DNS.8 = ec2-55-55-55-155.compute-1.amazonaws.com
IP.A = 22.22.222.22
IP.B = 33.33.3.33
IP.C = 44.4.244.144
Note that `#MOREIPS` will be replaced with any local IPs MicroK8s could detect. Within 5 seconds MicroK8s will detect the change in the `csr.conf.template` file and re-issue the certificates. At this point you can join the rest of the nodes to form the cluster.
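As a concrete sketch of that workaround (the extra names above are examples; this assumes the re-issued server certificate ends up at `certs/server.crt`):

```
# Edit [ alt_names ] before any node joins.
sudo vi /var/snap/microk8s/current/certs/csr.conf.template

# MicroK8s should pick up the change within a few seconds and re-issue the
# certificates; confirm the new SANs are present.
sudo openssl x509 -in /var/snap/microk8s/current/certs/server.crt -noout -text \
  | grep -A1 'Subject Alternative Name'
```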
Changing and distributing the certificates after the initial cluster formation is something we need to improve but I do not have a good solution right now.
EDIT: The admin user will have the same token on all nodes.
As there is no central identity management each node has its own admin user.
If you want an admin user with the same credentials across all nodes you will have to create one by adding to `/var/snap/microk8s/current/credentials/known_tokens.csv` something like:
enlcxdasdsadadsfdsafdasfdsaRQcz0K,haadmin,haadmin,"system:masters"
We will see if we can automate this, but for now this user creation is a manual step.
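A sketch of that manual step (the token and user name are placeholders, not real credentials):

```
# Append the same admin entry on every node, then restart MicroK8s so the
# apiserver reloads the token file.
echo '<shared-token>,haadmin,haadmin,"system:masters"' | \
  sudo tee -a /var/snap/microk8s/current/credentials/known_tokens.csv
sudo microk8s stop && sudo microk8s start
```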
Hi,
I'm having some problems with calico (not exactly an issue with this repo).
Calico chooses the wrong interface; this seems to be a common issue and the solution seems to be to set the env variable `IP_AUTODETECTION_METHOD` to something different from the default `first-found`.
I couldn't find a way to do it yet; the deployment `calico-kube-controllers` does not have a template of a `calico-node` and I'm a bit puzzled... Is there a way to work around it?
Thanks a lot!
EDIT: It turns out that this problem is due to either a proxy or a firewall in the network config of my institute... Please ignore my comment.
@panManfredini is this environment variable `IP_AUTODETECTION_METHOD` modifiable after `calico-node` starts running? Or does it have to be set before installing `calico-node`?
Sorry, not very familiar with calico. 😁
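Not an official MicroK8s answer, but one common way to set this on a running cluster is to patch the env variable on the `calico-node` DaemonSet (a sketch; `eth0` is a hypothetical interface name):

```
# Setting the variable on the DaemonSet rolls the calico-node pods out again
# with the new auto-detection method.
microk8s kubectl set env daemonset/calico-node -n kube-system \
  IP_AUTODETECTION_METHOD=interface=eth0
```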
Hi,
I'd like to test HA clustering on a 3-nodes RPi4 cluster, but it seems that the latest/edge/ha-preview branch is no longer opened (or maybe is not available for arm64). Is there another branch I could test?
Thanks a lot!
> latest/edge/ha-preview branch is no longer opened (or maybe is not available for arm64).
@nikkoura, I just updated the launchpad builders to build snaps for arm64. Please give it a few hours for the branch to be populated. Thank you for giving this a try. Your feedback is much appreciated.
Thanks for the build. I have been able to use it to test HA on my setup. So far, so good, except for a small issue with coredns (#1292).
Thank you for reporting the coredns issue @nikkoura, the latest build should have this addressed.
@ktsakalozos is it possible to add a worker-only node to a 3-node HA cluster? I was expecting it to work by just adding a node without a prior `microk8s enable ha-cluster`, but I get an error upon joining the node. Is this not supported?
My idea was to have a 3-node HA control plane and add workers as needed.
Today I spent some time trying to enable HA on a 3-node microk8s cluster but failed with various problems.
The first problem is that on the HA channel, enabling the DNS add-on fails on a fresh snap installation (before trying to enable HA). Doing `kubectl explain coredns_pod` shows a dependency failure on calico.
The second problem is HA itself. On the master I could enable HA. Then I got the message like others to remove and re-add the other nodes:
# microk8s join 10.0.0.3:25000/someguid...
Failed to join cluster. Error code 501. Failed to join the cluster. This is an HA dqlite cluster.
Please, retry after enabling HA on this joining node with 'microk8s enable ha-cluster'.
However, I was unable to enable HA and join the other nodes back into the cluster. They would get stuck in a pending state with this failure (so I had to press Ctrl-C and cancel it; same on the other node):
# microk8s enable ha-cluster
No resources found in default namespace.
Enabling HA
Upgrading the network CNI
Waiting for the cluster to restart
Waiting for the CNI to deploy
daemon set "calico-node" successfully rolled out
Waiting for deployment "calico-kube-controllers" rollout to finish: 0 of 1 updated replicas are available...
^CFailed to enable ha-cluster
And here is the inspection of the calico pod which was stuck at creation:
# microk8s.kubectl describe pod/calico-kube-controllers-555fc8cc5c-v7zx2 -n kube-system
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: <none>
Labels: k8s-app=calico-kube-controllers
pod-template-hash=555fc8cc5c
Annotations: scheduler.alpha.kubernetes.io/critical-pod:
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/calico-kube-controllers-555fc8cc5c
Containers:
calico-kube-controllers:
Image: calico/kube-controllers:v3.13.2
Port: <none>
Host Port: <none>
Readiness: exec [/usr/bin/check-status -r] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
ENABLED_CONTROLLERS: node
DATASTORE_TYPE: kubernetes
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from calico-kube-controllers-token-84kxf (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
calico-kube-controllers-token-84kxf:
Type: Secret (a volume populated by a Secret)
SecretName: calico-kube-controllers-token-84kxf
Optional: false
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling <unknown> default-scheduler no nodes available to schedule pods
Warning FailedScheduling <unknown> default-scheduler no nodes available to schedule pods
Warning FailedScheduling <unknown> default-scheduler no nodes available to schedule pods
@ktsakalozos it would be great if you could document the correct process to create an HA cluster. Thanks.
> is it possible to add a worker-only node to a 3 node HA cluster?
@philkry, this is not possible. We are looking into ways to take into account any user preferences.
Why do you want a worker-only node? I can think of a few reasons; I just want to hear what your exact requirement is.
@mehdisadeghi thank you for trying out HA and providing feedback.
I would like some more info in order to understand what exactly happened and why.
> enabling DNS add-on fails on a fresh snap installation (before trying to enable HA)
Can you show me what happens? Could you attach the tarball created by `microk8s.inspect`? Here is a test run I just did:
ubuntu@ip-172-31-24-191:~$ sudo snap remove microk8s
microk8s removed
ubuntu@ip-172-31-24-191:~$ sudo snap install microk8s --classic --channel=latest/edge/ha-preview
microk8s (edge/ha-preview) v1.18.4 from Canonical✓ installed
ubuntu@ip-172-31-24-191:~$ sudo microk8s.enable dns
Enabling DNS
Applying manifest
serviceaccount/coredns created
configmap/coredns created
deployment.apps/coredns created
service/kube-dns created
clusterrole.rbac.authorization.k8s.io/coredns created
clusterrolebinding.rbac.authorization.k8s.io/coredns created
Restarting kubelet
DNS is enabled
ubuntu@ip-172-31-24-191:~$ sudo microk8s.kubectl get all -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system pod/coredns-588fd544bf-vfxzd 1/1 Running 0 27s 10.1.95.2 ip-172-31-24-191 <none> <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 2m33s <none>
kube-system service/kube-dns ClusterIP 10.152.183.10 <none> 53/UDP,53/TCP,9153/TCP 27s k8s-app=kube-dns
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
kube-system deployment.apps/coredns 1/1 1 1 27s coredns coredns/coredns:1.6.6 k8s-app=kube-dns
NAMESPACE NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
kube-system replicaset.apps/coredns-588fd544bf 1 1 1 27s coredns coredns/coredns:1.6.6 k8s-app=kube-dns,pod-template-hash=588fd544bf
On the second issue you reported:
> They would get stuck in pending state with this failure ( so I had to press Ctrl-C and cancel it. Same on the other node)
This indicates that the pod scheduling of calico is failing. There are a few reasons why that may be. Again, could you attach the tarball created by `microk8s.inspect`?
On documenting the HA setup process the only description for now is the comment [1]. We are still exploring how to improve the UI and we are open to any suggestions.
[1] https://github.com/ubuntu/microk8s/issues/1127#issuecomment-629651489
@ktsakalozos thanks for explaining.
Unfortunately, I did not run `microk8s.inspect` back then, I had already purged and reinstalled microk8s, and I cannot reproduce the problem. I guess I was trying to do a `microk8s.reset` which got stuck and I had to kill it, so my cluster was probably in an unknown state. I repeated the join process and it worked. Please consider my issue solved.
Another update from my side. DNS failed indeed. I noticed that unlike a non-HA cluster, the HA cluster uses hostnames as node names, and then the DNS resolver fails because it tries to resolve them via 1.1.1.1 (for example when a pod tries to call a healthz endpoint on another node). Previously it used to work because the application pod would use the node IP to call the endpoints.
This time I did a `microk8s.inspect` on the master. For some reason, the master died afterwards. I have attached the tarball.
inspection-report-20200623_225158.tar.gz
Hi, say I have a 3-node HA cluster and 2 of them went down for some time, one after the other. What happens to the cluster? What I noticed:
- Kubernetes reports `NotReady` on the node that went down. This time only one node goes down.
- Pods previously scheduled on that node are not rescheduled onto other nodes. I didn't change any configuration. Waited more than 5 minutes.
- When the second node goes down, I can't connect to the cluster anymore.
Someone experience this? Thanks.
> Hi, say i have a 3 node ha cluster, 2 of them went down for some time, one after the other. What happens to the cluster? What i noticed
> * kubernetes reports `NotReady` on the node that went down. This time only one node goes down.
> * Pods previously scheduled on that node are not rescheduled on to other nodes. I didn't change the any configuration. Waited more than 5 minutes.
> * When the second node goes down, i can't connect to the cluster anymore.
I'm not sure about the first 2, but the third is normal. You need a quorum of voting nodes to be up in order for the cluster to be operational. So if you have 3 nodes, all of them will be voting (since the minimum is 3) and you'll need at least 2 of them to be up.
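For reference, the quorum for n voters is floor(n/2) + 1, so a 3-voter cluster keeps working with one voter down but not with two, while a 5-voter cluster tolerates two failed voters.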
Thanks @freeekanayaka. That makes sense.
@balchua, I just reproduced the scenario you mention: in three nodes (u1, u2, u3) I removed u3 and saw the dns pod re-scheduled to another node after the 5 min period.
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system pod/calico-kube-controllers-555fc8cc5c-77zjp 1/1 Running 0 77m 10.1.215.193 u1 <none> <none>
kube-system pod/calico-node-9w59f 1/1 Running 0 75m 10.228.72.103 u2 <none> <none>
kube-system pod/calico-node-c4g7l 1/1 Running 0 77m 10.228.72.34 u1 <none> <none>
kube-system pod/calico-node-96fxt 1/1 Running 0 73m 10.228.72.45 u3 <none> <none>
kube-system pod/coredns-588fd544bf-8t94g 1/1 Terminating 0 71m 10.1.43.1 u3 <none> <none>
kube-system pod/coredns-588fd544bf-9cpcd 1/1 Running 0 63m 10.1.108.65 u2 <none> <none>
What pod remained unscheduled in your case?
The calico kube controller wasn't rescheduled to a different node. It stayed as Running.
What else can you share about this incident @balchua? Do you have a `microk8s.inspect` tarball? In the run below you can see the calico controller re-scheduled.
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system pod/calico-node-df2bf 1/1 Running 1 178m 10.228.72.103 u2 <none> <none>
kube-system pod/calico-node-2nct7 1/1 Running 1 175m 10.228.72.45 u3 <none> <none>
kube-system pod/coredns-588fd544bf-vxfcv 1/1 Running 1 174m 10.1.108.66 u2 <none> <none>
kube-system pod/calico-node-9flr7 1/1 Running 1 3h 10.228.72.34 u1 <none> <none>
kube-system pod/calico-kube-controllers-555fc8cc5c-vv729 1/1 Terminating 1 3h 10.1.215.194 u1 <none> <none>
kube-system pod/calico-kube-controllers-555fc8cc5c-9tfk8 1/1 Running 0 5m1s 10.1.43.1 u3 <none> <none>
It's unfortunate that I didn't get the inspect tarball. I will test it more this weekend.
I'm not sure if I should start a new bug or just put this here since ha is in preview currently. I'll start here and if you prefer a new bug I can do that.
A fresh install on bionic KVMs with this branch is not working, possibly because this environment uses a proxy.
The setup is 4 KVMs:
sudo snap install microk8s --channel=latest/edge/ha-preview --classic
# Enable proxy
sudo vim /var/snap/microk8s/current/args/containerd-env
sudo microk8s stop && sudo microk8s start
# Enable addons (on initial node only)
sudo microk8s enable dns
# Add nodes
sudo microk8s add-node
The nodes add, but the calico pods never show ready. Example pod: https://pastebin.ubuntu.com/p/GSWx58Qv3p/
kube-proxy seems to be failing to start. journalctl: https://pastebin.ubuntu.com/p/7HqSDjTBdW/
Here is an inspection report as well: inspection-report-20200720_145823.tar.gz
I tried just installing latest/stable (1.18) and noticed that kube-proxy errors the same way. Purging and installing 1.17 has kube-proxy starting normally. I'm not clear if kube-proxy, 1.18, or ha-preview is the root cause here.
Scratch that, I've got a typo in my bundle and have some of these snaps on LXD rather than KVM. I'll do a clean re-deploy and test from the top.
Using all KVMs this time went smoother, and kube-proxy is running on all of the nodes now. I am still having failures on both the host provisioner and the calico controller. Both fail with the same message.
Warning FailedCreatePodSandBox 54s kubelet, juju-f5fb96-13-kvm-0 (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "eb083a2378d08a5945df088d6d6a1cf5c0d32a8a11bf57556e1d8b08fb61db76": error getting ClusterInformation: Get https://[10.152.183.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: Service Unavailable
Any idea what's going on with that?
I believe I have it working now; for future travelers, and possibly for inclusion in the docs: it appears the API calls are picking up the containerd proxy setting from `/var/snap/microk8s/current/args/containerd-env`.
Fortunately, Go seems to accept CIDR notation for NO_PROXY so I added a NO_PROXY line along with the HTTPS_PROXY and after a restart calico is finally happy.
I'm using `NO_PROXY=10.0.8.0/23,10.1.0.0/16,10.152.0.0/16`, though your environment may vary.
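As a sketch, the resulting containerd-env could look like this (the proxy URL is a placeholder; the CIDRs are the machine, cluster, and service networks mentioned above), followed by `sudo microk8s stop && sudo microk8s start` to apply it:

```
# /var/snap/microk8s/current/args/containerd-env
# Placeholder proxy address:
HTTPS_PROXY=http://proxy.example.internal:3128
# Machine, cluster, and service CIDRs (adjust to your environment):
NO_PROXY=10.0.8.0/23,10.1.0.0/16,10.152.0.0/16
```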
Hi @chris-sanders
Thank you for getting to the bottom of this. The 10.1.0.0/16 is the cluster-cidr [1]. The 10.152.0.0/16 includes the services-cidr [2]. I am not sure where 10.0.8.0/23 comes from.
Our doc on setting up the cluster behind a proxy [3] is not great and expects a certain degree of expertise. Any suggestions and improvements are much appreciated.
I wonder if we could improve the `microk8s inspect` script [4] to detect the presence of a PROXY configuration on the host and offer proper guidance.
[1] https://github.com/ubuntu/microk8s/blob/feature/ha-enable/microk8s-resources/default-args/kube-proxy#L2 [2] https://github.com/ubuntu/microk8s/blob/feature/ha-enable/microk8s-resources/default-args/kube-apiserver#L2 [3] https://microk8s.io/docs/install-proxy [4] https://github.com/ubuntu/microk8s/blob/feature/ha-enable/scripts/inspect.sh#L106
@ktsakalozos I don't appear to be able to update or recommend document changes, or I don't know how to via discourse.
The 10.0.8.0/23 space is unique to my environment; it's the address range for the machines. I'm not sure if that is necessary, but I knew it never needed a proxy so I included it.
My recommendation for the docs is to mention NO_PROXY in the install-proxy docs, and include an example NO_PROXY with 10.1.0.0/16 and 10.152.0.0/16 commented out like the HTTPS_PROXY setting. Had that been in the comments for the containerd-env I would have probably just done the right thing from the beginning.
Hi @ktsakalozos, I noticed that in the HA preview running `microk8s status` shows the long format.
Better give a heads up if the plan is to make that the default. I remember we rolled that back because some users' scripts rely on the short form.
See issue https://github.com/ubuntu/microk8s/issues/1041
Another observation here. Scenario 1: abrupt shutdown of a voting node in a 4-node cluster. MicroK8s reports this:
microk8s is running
high-availability: yes
datastore master nodes: 10.3.0.2:19001 10.3.0.5:19001 10.3.0.7:19001
datastore standby nodes: 10.3.0.4:19001
I shut down the node `10.3.0.7`. Then MicroK8s reports this:
microk8s is running
high-availability: yes
datastore master nodes: 10.3.0.2:19001 10.3.0.5:19001 10.3.0.4:19001
datastore standby nodes: none
All good. The standby node became a voter node.
Scenario 2: a voter node leaves the cluster on purpose. Still a 4-node cluster. Initial state of the cluster:
microk8s is running
high-availability: yes
datastore master nodes: 10.3.0.2:19001 10.3.0.5:19001 10.3.0.7:19001
datastore standby nodes: 10.3.0.4:19001
I purposely made node `10.3.0.7` leave the cluster, i.e. `microk8s leave` followed by `microk8s remove-node . . .`
Then MicroK8s reported this.
microk8s is running
high-availability: no
datastore master nodes: 10.3.0.2:19001 10.3.0.5:19001
datastore standby nodes: 10.3.0.4:19001
I notice that node `10.3.0.4` didn't become part of the voting nodes.
Has anyone observed such behavior?
I can still access the cluster though; operations such as `kubectl get nodes` show the right nodes available in the cluster.
@balchua I cannot reproduce scenario 2. Is this scenario reproducible on your side? Some time is needed (a few seconds) before a standby node is promoted to a voter (and appears in the datastore master nodes list), but I trust you gave it enough time to see what happens. Do you have the inspection tarball?
@freeekanayaka might have some ideas on what may be happening and/or what information we need to gather to debug this behavior.
@ktsakalozos I don't have the inspect tarball anymore, sorry. I eventually joined another node and forced it to go offline ungracefully. This made the standby node become a voter. Yes, I did wait for some time, probably around 5 minutes.
As far as I can tell from the code, if at any point in time you have 2 online voters and 1 online stand-by, then that stand-by should get promoted to voter within 30 seconds.
There shouldn't be race conditions involved here, so I'm not sure why @ktsakalozos was not able to reproduce it. Perhaps there is some piece missing? @balchua if you could try to reproduce it on your end and see if the bug triggers, that'd be nice.
Hello, I cannot find any documentation on how to configure the high availability feature that Canonical has announced for MicroK8s. Thank you.