canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0

High Availability #1127

Closed: ednaganon closed this issue 2 years ago

ednaganon commented 4 years ago

Hello, I cannot find any documentation on how to configure the high availability feature that Canonical has announced for MicroK8s. Thank you.

ktsakalozos commented 4 years ago

Hi @ednaganon

You can get an HA with dqlite cluster by deploying MicroK8s with:

sudo snap install microk8s --classic --channel=edge/test-dqlite

We are now working on ways to transition the already existing deployments to a HA-dqlite setup.

EDIT: The edge/test-dqlite channel is not available any more. Use the latest/edge/ha-preview instead.

bmreading commented 4 years ago

Besides transitioning existing deployments, how can we create new deployments of MicroK8s with HA? I understand this would be the method to install, but how is it configured? Specifically, would I be able to have multiple master nodes?

ktsakalozos commented 4 years ago

@RootHouston if you join three or more nodes installed from --channel=edge/test-dqlite, they will form an HA cluster.

philkry commented 4 years ago

@ktsakalozos i tried to give it a shot but the branch does not seem to be available?

$ sudo snap install microk8s --classic --channel=edge/test-dqlite
error: requested a non-existing branch on latest/edge for snap "microk8s": test-dqlite
ktsakalozos commented 4 years ago

@philkry, we are working on the next iteration of the HA work that will allow the transition of a non-HA cluster to HA. I will ping you again in this thread, hopefully within the week, to show how this work progresses. Thank you for your interest; we appreciate your feedback.

nojakub commented 4 years ago

@ktsakalozos Is it already possible to create a fresh N-node cluster with HA enabled? Which channel should be used?

ktsakalozos commented 4 years ago

To get an early glimpse of HA, deploy MicroK8s with:

sudo snap install microk8s --classic --channel=latest/edge/ha-preview

Enabling ha-cluster is required on all nodes joining/forming an HA cluster.

microk8s enable ha-cluster

The joining process has not changed. You need to run microk8s add-node to produce a connection string and use that string to join a node with microk8s join.

Your cluster needs to have at least three nodes. To check the status of HA you can run:

microk8s status ha-cluster

To remove a node, first run microk8s leave on the departing node and then microk8s remove-node on any of the remaining nodes of the cluster.
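Putting the above together, a minimal three-node walkthrough could look like the following sketch (the node count, IP and token in the join command are placeholders; use the actual connection string printed by add-node):

# On every node
sudo snap install microk8s --classic --channel=latest/edge/ha-preview
sudo microk8s enable ha-cluster

# On the first node, produce a connection string for each node you want to join
sudo microk8s add-node

# On each joining node, paste the string printed above
sudo microk8s join 10.0.0.1:25000/<token-from-add-node>

# On any node, once three or more nodes have joined
sudo microk8s status ha-cluster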

@philkry @nojakub @balchua @iskitsas @knkski @tvansteenburgh really interested in your feedback on this preview.

philkry commented 4 years ago

@ktsakalozos thanks for the info! I quickly spun up 3 nodes and cluster assembly worked as expected. The control plane stays available if I kill one node, and that node successfully re-joins the cluster when it becomes available again :) I noticed one thing where I'm not sure whether it is expected behaviour: it takes quite some time (not measured, but I would guesstimate 1 minute or more) until the two remaining nodes report the unavailable node as 'NotReady'. When I just issue a reboot, the node reboots fairly quickly and never gets reported NotReady at all, even though I see dqlite reporting connection issues:

May 17 18:54:04 node2 microk8s.daemon-apiserver[20248]: time="2020-05-17T18:54:04+02:00" level=warning msg="dqlite: server unavailable err=failed to establish network connection: connect to HTTP endpoint: connect to server: dial tcp 10.0.0.3:19001: i/o timeout address=10.0.0.3:19001 attempt=0"
May 17 18:54:04 node2 microk8s.daemon-apiserver[20248]: Dqlite proxy TLS -> Unix: local error: tls: bad record MAC
May 17 18:54:04 node2 microk8s.daemon-apiserver[20248]: Dqlite proxy Unix -> TLS: read unix @->@0005f: use of closed network connection
May 17 18:54:04 node2 microk8s.daemon-apiserver[20248]: time="2020-05-17T18:54:04+02:00" level=warning msg="dqlite: server unavailable err=failed to send handshake: write tcp 10.0.0.2:34030->10.0.0.2:19001: i/o timeout address=10.0.0.2:19001 attempt=0"
May 17 18:54:05 node2 microk8s.daemon-apiserver[20248]: time="2020-05-17T18:54:05+02:00" level=warning msg="dqlite: server unavailable err=failed to send handshake: write tcp 10.0.0.2:52412->10.0.0.4:19001: i/o timeout address=10.0.0.4:19001 attempt=0"
May 17 18:54:05 node2 microk8s.daemon-apiserver[20248]: Dqlite proxy TLS -> Unix: local error: tls: bad record MAC
May 17 18:54:05 node2 microk8s.daemon-apiserver[20248]: Dqlite proxy Unix -> TLS: read unix @->@0005f: use of closed network connection
May 17 18:54:07 node2 microk8s.daemon-containerd[20254]: time="2020-05-17T18:54:07.323068776+02:00" level=info msg="ExecSync for "14accf216d4deb59f4770ae2d9a7aac3fe9e6c816a9045f42317a71c6920816d" with command [/bin/calico-node -felix-live] and timeout 1 (s)"
May 17 18:54:07 node2 microk8s.daemon-containerd[20254]: time="2020-05-17T18:54:07.549408484+02:00" level=info msg="Finish piping "stdout" of container exec "6723c84025ddc589e8ea478e5ef73112680faa1115febe69b088e35964afb0ec""
May 17 18:54:07 node2 microk8s.daemon-containerd[20254]: time="2020-05-17T18:54:07.549473863+02:00" level=info msg="Finish piping "stderr" of container exec "6723c84025ddc589e8ea478e5ef73112680faa1115febe69b088e35964afb0ec""
May 17 18:54:07 node2 microk8s.daemon-containerd[20254]: time="2020-05-17T18:54:07.553301009+02:00" level=info msg="Exec process "6723c84025ddc589e8ea478e5ef73112680faa1115febe69b088e35964afb0ec" exits with exit code 0 and error <nil>"
May 17 18:54:07 node2 microk8s.daemon-containerd[20254]: time="2020-05-17T18:54:07.613600770+02:00" level=info msg="TaskExit event &TaskExit{ContainerID:14accf216d4deb59f4770ae2d9a7aac3fe9e6c816a9045f42317a71c6920816d,ID:6723c84025ddc589e8ea478e5ef73112680faa1115febe69b088e35964afb0ec,Pid:22016,ExitStatus:0,ExitedAt:2020
May 17 18:54:07 node2 microk8s.daemon-containerd[20254]: time="2020-05-17T18:54:07.615017697+02:00" level=info msg="ExecSync for "14accf216d4deb59f4770ae2d9a7aac3fe9e6c816a9045f42317a71c6920816d" returns with exit code 0"
May 17 18:54:08 node2 microk8s.daemon-apiserver[20248]: Dqlite proxy TLS -> Unix: local error: tls: bad record MAC
May 17 18:54:08 node2 microk8s.daemon-apiserver[20248]: Dqlite proxy Unix -> TLS: read unix @->@0005f: use of closed network connection
May 17 18:54:10 node2 microk8s.daemon-apiserver[20248]: time="2020-05-17T18:54:10+02:00" level=warning msg="dqlite: server unavailable err=failed to establish network connection: connect to HTTP endpoint: connect to server: dial tcp 10.0.0.3:19001: i/o timeout address=10.0.0.3:19001 attempt=0"
May 17 18:54:10 node2 microk8s.daemon-apiserver[20248]: Dqlite proxy TLS -> Unix: local error: tls: bad record MAC
May 17 18:54:10 node2 microk8s.daemon-apiserver[20248]: Dqlite proxy Unix -> TLS: read unix @->@0005f: use of closed network connection
May 17 18:54:10 node2 microk8s.daemon-apiserver[20248]: time="2020-05-17T18:54:10+02:00" level=warning msg="dqlite: server unavailable err=failed to send handshake: write tcp 10.0.0.2:34050->10.0.0.2:19001: i/o timeout address=10.0.0.2:19001 attempt=0"
May 17 18:54:10 node2 microk8s.daemon-apiserver[20248]: time="2020-05-17T18:54:10+02:00" level=warning msg="dqlite: server unavailable err=failed to send handshake: write tcp 10.0.0.2:52432->10.0.0.4:19001: i/o timeout address=10.0.0.4:19001 attempt=0"
May 17 18:54:10 node2 microk8s.daemon-apiserver[20248]: Failed to get servers: no available dqlite leader server found
Every 1.0s: microk8s kubectl get nodes                                                                                                                node1: Sun May 17 18:56:29 2020

NAME    STATUS   ROLES    AGE   VERSION
node1   Ready    <none>   94m   v1.18.2-2+3a7b7d1884aa18
node3   Ready    <none>   88m   v1.18.2-2+3a7b7d1884aa18
node2   Ready    <none>   90m   v1.18.2-2+3a7b7d1884aa18

Only if I shut the node down completely and wait does it eventually show up as NotReady. I will test it more thoroughly with some workloads later.

ktsakalozos commented 4 years ago

@philkry thank you for your feedback. Can you provide more information on how you kill the node? In my tests the departing node is marked as NotReady almost immediately, but the pods do not get re-scheduled until much later. This may have to do with the default five-minute eviction delay [1].

Also, @freeekanayaka correct me if I am wrong: if you kill (rather than cleanly remove) the node acting as the leader in dqlite, the remaining two nodes may not be able to elect a leader.

[1] https://kubernetes.io/docs/concepts/architecture/nodes/#node-status
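(For anyone who wants to shorten that eviction delay while testing, a rough sketch, assuming the controller-manager on this build still accepts the --pod-eviction-timeout flag and reads its arguments from the usual MicroK8s args directory:)

# Shorten the default 5m eviction delay to 30s for testing; not a production recommendation
echo '--pod-eviction-timeout=30s' | sudo tee -a /var/snap/microk8s/current/args/kube-controller-manager
sudo systemctl restart snap.microk8s.daemon-controller-manager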

freeekanayaka commented 4 years ago

Also, @freeekanayaka correct me if I am wrong: if you kill (rather than cleanly remove) the node acting as the leader in dqlite, the remaining two nodes may not be able to elect a leader.

I can't speak to the NotReady state, as that is something k8s controls and I don't know the exact logic.

Regarding the dqlite database: if you have at least 3 nodes (i.e. you are HA) and you gracefully shut down the current leader (with systemctl stop snap.microk8s.daemon-apiserver or any equivalent way to send a TERM signal to the apiserver process), then leadership is immediately transferred to one of the two other nodes and database downtime is minimal (it just requires the client code to reconnect). On the other hand, if the current leader dies abruptly (a KILL signal or a machine power loss), the 2 remaining nodes will take 5 to 10 seconds to realize that the leader has died and run a new election to decide who the new leader will be. During that time client code will receive errors, but it should work again after that.
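For anyone wanting to reproduce the two cases above on the current leader, a sketch (the SIGKILL variant simulates the abrupt failure; expect a few seconds of client errors while a new leader is elected):

# Graceful: TERM lets the leader transfer leadership before exiting
sudo systemctl stop snap.microk8s.daemon-apiserver

# Abrupt: KILL gives the leader no chance to hand over; the survivors re-elect after roughly 5-10s
sudo systemctl kill --signal=SIGKILL snap.microk8s.daemon-apiserver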

philkry commented 4 years ago

@ktsakalozos what I did was just reboot the node via the command line; I did not cleanly remove it from the cluster. Do I always have to drain the node properly to keep dqlite working? How will HA work if one node (out of 3) dies due to a hardware failure? Would this scenario not be covered at all?

edit: @freeekanayaka replied at the same time I did, so I guess my question is answered.

knkski commented 4 years ago

@ktsakalozos: I've got a cluster deployed with 10 MicroK8s nodes from the stable channel. Will I have to do anything special to try out HA beyond sudo snap switch microk8s --channel=edge/ha-preview && sudo snap refresh && microk8s enable ha-cluster?

ktsakalozos commented 4 years ago

@knkski in general, switching across channels is not recommended because the services configuration may have diverged. In this case, latest/stable and ha-preview should be pretty close.

knkski commented 4 years ago

@ktsakalozos: I removed the nodes from the cluster after getting a message about having to remove and rejoin them after enabling HA. I then tried to join a node to the cluster and got this error message:

$ microk8s join 1.2.3.4:25000/...
Failed to join cluster. Error code 500.

Checking on the master node, I saw this traceback:

Waiting for access to cluster. [2020-05-19 13:38:39,074] ERROR in app: Exception on /cluster/api/v2.0/join [POST]
Traceback (most recent call last):
  File "/snap/microk8s/1415/lib/python3.5/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/snap/microk8s/1415/lib/python3.5/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/snap/microk8s/1415/lib/python3.5/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/snap/microk8s/1415/lib/python3.5/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/snap/microk8s/1415/lib/python3.5/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/snap/microk8s/1415/lib/python3.5/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/snap/microk8s/1415/scripts/cluster/agent.py", line 557, in join_node_dqlite
    voters = get_dqlite_voters()  # type: List[str]
  File "/snap/microk8s/1415/scripts/cluster/agent.py", line 452, in get_dqlite_voters
    with open("{}/info.yaml".format(cluster_dir)) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/var/snap/microk8s/current/var/kubernetes/backend/info.yaml'

Not really sure how I triggered this, but purging the snap and reinstalling it worked.

It would also be nice to have slightly better messaging when joining a node. I see this:

Waiting for node to join the cluster. .. .. .. .. .. .. .. .. .. ..

Am I supposed to join a third node before it stops waiting? If I don't add a third node by the time it finishes waiting, has it failed to add the current node?

ktsakalozos commented 4 years ago

Thank you for giving it a try @knkski

To try and reproduce the error I will start from latest/stable, refresh to edge/ha-cluster, and re-join the nodes.

Joining one node at a time is preferable, but since we are in the testing phase we could test the simultaneous-join scenario.

Joining in HA takes more time than the non-HA case, so the boring "Waiting for node to join the cluster" message is displayed for longer. I will see if we can show more details.

nojakub commented 4 years ago

@ktsakalozos thanks for the how-to. I can spin up an ha-cluster with 3 nodes; they connect to each other and report Ready state. But when I inspect MicroK8s I get this:

root@k8s3:[/]# microk8s.inspect 
Inspecting Certificates
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
 FAIL:  Service snap.microk8s.daemon-flanneld is not running
For more details look at: sudo journalctl -u snap.microk8s.daemon-flanneld
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-apiserver is running
  Service snap.microk8s.daemon-apiserver-kicker is running
  Service snap.microk8s.daemon-proxy is running
  Service snap.microk8s.daemon-kubelet is running
  Service snap.microk8s.daemon-scheduler is running
  Service snap.microk8s.daemon-controller-manager is running
 FAIL:  Service snap.microk8s.daemon-etcd is not running
For more details look at: sudo journalctl -u snap.microk8s.daemon-etcd
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
  Copy processes list to the final report tarball
  Copy snap list to the final report tarball
  Copy VM name (or none) to the final report tarball
  Copy disk usage information to the final report tarball
  Copy memory usage information to the final report tarball
  Copy server uptime to the final report tarball
  Copy current linux distribution to the final report tarball
  Copy openSSL information to the final report tarball
  Copy network configuration to the final report tarball
Inspecting kubernetes cluster
  Inspect kubernetes cluster

Should flannel and etcd report FAIL in this case?

ktsakalozos commented 4 years ago

Yes, etcd and flannel are not used in the HA setup; Calico and dqlite replace them. I am making a note to fix inspect so it does not report this as a failure. Thank you @nojakub

ed1000 commented 4 years ago

@ktsakalozos I'm seeing some problems with certificates. It appears that the certificates from the "node 0" of the cluster are pushed to the other nodes. Whenever I try to access the cluster from any other node, I get: Unable to connect to the server: x509: certificate is valid for 127.0.0.1, 10.152.183.1, 172.31.65.17, 10.1.56.0, not 172.31.70.251

Is this normal? Or should a new certificate be generated that encompasses all the endpoints and then be spread to the other nodes?

ktsakalozos commented 4 years ago

EDIT: The following is not needed any more. Server certs are now created with the IPs MicroK8s can detect on the node. However, you may still find the following useful if, for example, a VIP leads to the node(s) and there is no interface with that IP on the node.

@ed1000 currently there is no way to redistribute certificates across nodes and I am not sure if this is the right approach.

Here is a workaround, assuming you know which nodes will join the cluster. After you have run microk8s enable ha-cluster on 'node 0', and before joining the nodes, edit the /var/snap/microk8s/current/certs/csr.conf.template file. In the [ alt_names ] section you can add the DNS entries and IPs of the nodes to be joined. For example, I added the following in a test cluster:

DNS.6 = ec2-66-66-66-66.compute-1.amazonaws.com
DNS.7 = ec2-66-44-44-46.compute-1.amazonaws.com
DNS.8 = ec2-55-55-55-155.compute-1.amazonaws.com
IP.A = 22.22.222.22
IP.B = 33.33.3.33
IP.C = 44.4.244.144

Note that the #MOREIPS will be replaced with any local IPs MicroK8s could detect. Within 5 seconds MicroK8s will detect the change in the csr.conf.template file and re-issue the certificates. At this point you can join the rest of the nodes to form the cluster.

Changing and distributing the certificates after the initial cluster formation is something we need to improve but I do not have a good solution right now.
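(A quick way to confirm the re-issued certificate picked up the extra entries, assuming the server certificate sits in the same certs directory as the template:)

# Wait a few seconds for the certificates to be regenerated, then check the SAN list
sudo openssl x509 -in /var/snap/microk8s/current/certs/server.crt -noout -text | grep -A1 'Subject Alternative Name'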

ktsakalozos commented 4 years ago

EDIT: The admin user will have the same token on all nodes.

As there is no central identity management, each node has its own admin user.

If you want an admin user with the same credentials across all nodes, you will have to create one by adding something like the following to /var/snap/microk8s/current/credentials/known_tokens.csv:

enlcxdasdsadadsfdsafdasfdsaRQcz0K,haadmin,haadmin,"system:masters"

We will see if we can automate this, but for now this user creation is a manual step.
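(A sketch of that manual step; the token value below is just the example from above and must be identical on every node:)

# Run on every node, then restart the apiserver so it re-reads the tokens file
TOKEN=enlcxdasdsadadsfdsafdasfdsaRQcz0K   # example only; generate your own, e.g. with: openssl rand -base64 24
echo "$TOKEN,haadmin,haadmin,\"system:masters\"" | sudo tee -a /var/snap/microk8s/current/credentials/known_tokens.csv
sudo systemctl restart snap.microk8s.daemon-apiserver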

panManfredini commented 4 years ago

Hi,

I'm having some problems with Calico (not exactly an issue with this repo).

Calico chooses the wrong interface; this seems to be a common issue, and the solution seems to be to set the env variable IP_AUTODETECTION_METHOD to something other than the default first-found.

I couldn't find a way to do that yet; the calico-kube-controllers deployment does not have a template for a calico-node and I'm a bit puzzled... Is there a way to work around it?

Thanks a lot!

EDIT: It turns out that this problem is due to either a proxy or a firewall in my institute's network configuration... Please ignore my comment.

balchua commented 4 years ago

@panManfredini is the IP_AUTODETECTION_METHOD environment variable modifiable after calico-node starts running, or does it have to be set before installing calico-node? Sorry, not very familiar with Calico. 😁
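(Not confirmed in this thread, but for reference a sketch of the usual approach: calico-node reads IP_AUTODETECTION_METHOD at startup, so setting it on the DaemonSet and letting the pods roll should apply it. The interface name below is a placeholder.)

# Point autodetection at a specific interface; the DaemonSet restarts its pods with the new variable
microk8s kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=interface=eth0
microk8s kubectl rollout status daemonset/calico-node -n kube-system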

nikkoura commented 4 years ago

Hi,

I'd like to test HA clustering on a 3-node RPi4 cluster, but it seems that the latest/edge/ha-preview branch is no longer open (or maybe it is not available for arm64). Is there another branch I could test?

Thanks a lot!

ktsakalozos commented 4 years ago

latest/edge/ha-preview branch is no longer open (or maybe it is not available for arm64).

@nikkoura, I just updated the launchpad builders to build snaps for arm64. Please give it a few hours for the branch to be populated. Thank you for giving this a try. Your feedback is much appreciated.

nikkoura commented 4 years ago

Thanks for the build. I have been able to use it to test HA on my setup. So far, so good, except for a small issue with coredns (#1292).

ktsakalozos commented 4 years ago

Thank you for reporting the coredns issue @nikkoura, the latest build should have this addressed.

philkry commented 4 years ago

@ktsakalozos is it possible to add a worker-only node to a 3-node HA cluster? I was expecting it to work by just adding a node without running microk8s enable ha-cluster first, but I get an error upon joining the node. Is this not supported? My idea was to have a 3-node HA control plane and add workers as needed.

mehdisadeghi commented 4 years ago

Today I spent some time trying to enable HA on a 3-node MicroK8s cluster but failed with various problems.

The first problem is that on the HA channel, enabling the DNS add-on fails on a fresh snap installation (before trying to enable HA). Doing kubectl explain coredns_pod shows a dependency failure on Calico.

The second problem is HA itself. On the master I could enable HA. Then, like others, I got the message to remove and re-add the other nodes:

# microk8s join 10.0.0.3:25000/someguid...
Failed to join cluster. Error code 501. Failed to join the cluster. This is an HA dqlite cluster. 
Please, retry after enabling HA on this joining node with 'microk8s enable ha-cluster'.

However, I was unable to enable HA and join the other nodes back into the cluster. They would get stuck in a pending state with this failure, so I had to press Ctrl-C and cancel it (same on the other node):

# microk8s enable ha-cluster
No resources found in default namespace.
Enabling HA
Upgrading the network CNI
Waiting for the cluster to restart
Waiting for the CNI to deploy
daemon set "calico-node" successfully rolled out
Waiting for deployment "calico-kube-controllers" rollout to finish: 0 of 1 updated replicas are     available...
^CFailed to enable ha-cluster

And here is the inspection of the calico pod, which was stuck at creation:

# microk8s.kubectl describe pod/calico-kube-controllers-555fc8cc5c-v7zx2 -n kube-system
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 <none>
Labels:               k8s-app=calico-kube-controllers
                      pod-template-hash=555fc8cc5c
Annotations:          scheduler.alpha.kubernetes.io/critical-pod: 
Status:               Pending
IP:                   
IPs:                  <none>
Controlled By:        ReplicaSet/calico-kube-controllers-555fc8cc5c
Containers:
  calico-kube-controllers:
    Image:      calico/kube-controllers:v3.13.2
    Port:       <none>
    Host Port:  <none>
    Readiness:  exec [/usr/bin/check-status -r] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ENABLED_CONTROLLERS:  node
      DATASTORE_TYPE:       kubernetes
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from calico-kube-controllers-token-84kxf (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  calico-kube-controllers-token-84kxf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  calico-kube-controllers-token-84kxf
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s

Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  no nodes available to schedule pods
  Warning  FailedScheduling  <unknown>  default-scheduler  no nodes available to schedule pods
  Warning  FailedScheduling  <unknown>  default-scheduler  no nodes available to schedule pods

@ktsakalozos it would be great if you could document the correct process to create an HA cluster. Thanks.

ktsakalozos commented 4 years ago

is it possible to add a worker-only node to a 3 node HA cluster?

@philkry, this is not possible. We are looking into ways to take into account any user preferences.

Why do you want a worker-only node? I can think of a few reasons; I just want to hear what your exact requirement is.

ktsakalozos commented 4 years ago

@mehdisadeghi thank you for trying out HA and providing feedback.

I would like some more info in order to understand what exactly happened and why.

enabling DNS add-on fails on a fresh snap installation (before trying to enable HA)

Can you show me what happens? Could you attach the tarball created by microk8s.inspect? Here is a test run I just did:

ubuntu@ip-172-31-24-191:~$ sudo snap remove microk8s
microk8s removed
ubuntu@ip-172-31-24-191:~$ sudo snap install microk8s --classic --channel=latest/edge/ha-preview
microk8s (edge/ha-preview) v1.18.4 from Canonical✓ installed
ubuntu@ip-172-31-24-191:~$ sudo microk8s.enable dns
Enabling DNS
Applying manifest
serviceaccount/coredns created
configmap/coredns created
deployment.apps/coredns created
service/kube-dns created
clusterrole.rbac.authorization.k8s.io/coredns created
clusterrolebinding.rbac.authorization.k8s.io/coredns created
Restarting kubelet
DNS is enabled
ubuntu@ip-172-31-24-191:~$ sudo microk8s.kubectl get all -A  -o wide
NAMESPACE     NAME                           READY   STATUS    RESTARTS   AGE   IP          NODE               NOMINATED NODE   READINESS GATES
kube-system   pod/coredns-588fd544bf-vfxzd   1/1     Running   0          27s   10.1.95.2   ip-172-31-24-191   <none>           <none>

NAMESPACE     NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE     SELECTOR
default       service/kubernetes   ClusterIP   10.152.183.1    <none>        443/TCP                  2m33s   <none>
kube-system   service/kube-dns     ClusterIP   10.152.183.10   <none>        53/UDP,53/TCP,9153/TCP   27s     k8s-app=kube-dns

NAMESPACE     NAME                      READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                  SELECTOR
kube-system   deployment.apps/coredns   1/1     1            1           27s   coredns      coredns/coredns:1.6.6   k8s-app=kube-dns

NAMESPACE     NAME                                 DESIRED   CURRENT   READY   AGE   CONTAINERS   IMAGES                  SELECTOR
kube-system   replicaset.apps/coredns-588fd544bf   1         1         1       27s   coredns      coredns/coredns:1.6.6   k8s-app=kube-dns,pod-template-hash=588fd544bf

On the second issue you reported:

They would get stuck in pending state with this failure ( so I had to press Ctrl-C and cancel it. Same on the other node)

This indicates that the pod scheduling of calico is failing. There are a few reasons why that may be. Again, could you attach the tarball created by microk8s.inspect?

On documenting the HA setup process the only description for now is the comment [1]. We are still exploring how to improve the UI and we are open to any suggestions.

[1] https://github.com/ubuntu/microk8s/issues/1127#issuecomment-629651489

mehdisadeghi commented 4 years ago

@ktsakalozos thanks for explaining.

Unfortunately, I did not run microk8s.inspect back then, and I had already purged and reinstalled MicroK8s, so I cannot reproduce the problem. I guess I was trying to do microk8s.reset, which got stuck and I had to kill it, so my cluster was probably in an unknown state. I repeated the join process and it worked. Please consider my issue solved.

mehdisadeghi commented 4 years ago

Another update from my side: DNS did indeed fail. I noticed that, unlike a non-HA cluster, the HA cluster uses hostnames as node names, and the DNS resolver then fails because it tries to resolve them via 1.1.1.1 (for example when a pod tries to call a healthz endpoint on another node). Previously this used to work because the application pod would use the node IP to call the endpoints.

This time I did run microk8s.inspect on the master. For some reason the master died afterwards. I have attached the tarball: inspection-report-20200623_225158.tar.gz

balchua commented 4 years ago

Hi, say I have a 3-node HA cluster and 2 of the nodes went down for some time, one after the other. What happens to the cluster? What I noticed:

* kubernetes reports `NotReady` on the node that went down. This time only one node goes down.

* Pods previously scheduled on that node are not rescheduled onto other nodes. I didn't change any configuration. Waited more than 5 minutes.

* When the second node goes down, I can't connect to the cluster anymore.

Has someone experienced this? Thanks.

freeekanayaka commented 4 years ago

Hi, say I have a 3-node HA cluster and 2 of the nodes went down for some time, one after the other. What happens to the cluster? What I noticed:

* kubernetes reports `NotReady` on the node that went down. This time only one node goes down.

* Pods previously scheduled on that node are not rescheduled onto other nodes. I didn't change any configuration. Waited more than 5 minutes.

* When the second node goes down, I can't connect to the cluster anymore.

I'm not sure about the first 2, but the third is normal. You need a quorum of voting nodes to be up in order for the cluster to be operational. So if you have 3 nodes, all of them will be voting (since the minimum is 3) and you'll need at least 2 of them to be up.

balchua commented 4 years ago

Thanks @freeekanayaka. That makes sense.

ktsakalozos commented 4 years ago

@balchua, I just reproduced the scenario you mention: in a three-node cluster (u1, u2, u3) I removed u3 and saw the dns pod re-scheduled to another node after the 5-minute period.

NAMESPACE     NAME                                           READY   STATUS        RESTARTS   AGE   IP              NODE   NOMINATED NODE   READINESS GATES
kube-system   pod/calico-kube-controllers-555fc8cc5c-77zjp   1/1     Running       0          77m   10.1.215.193    u1     <none>           <none>
kube-system   pod/calico-node-9w59f                          1/1     Running       0          75m   10.228.72.103   u2     <none>           <none>
kube-system   pod/calico-node-c4g7l                          1/1     Running       0          77m   10.228.72.34    u1     <none>           <none>
kube-system   pod/calico-node-96fxt                          1/1     Running       0          73m   10.228.72.45    u3     <none>           <none>
kube-system   pod/coredns-588fd544bf-8t94g                   1/1     Terminating   0          71m   10.1.43.1       u3     <none>           <none>
kube-system   pod/coredns-588fd544bf-9cpcd                   1/1     Running       0          63m   10.1.108.65     u2     <none>           <none>

What pod remained unscheduled in your case?

balchua commented 4 years ago

The calico-kube-controllers pod wasn't rescheduled to a different node. It stays as Running.

ktsakalozos commented 4 years ago

What else can you share about this incident @balchua? Do you have a microk8s.inspect tarball? In the run below you can see the calico controller re-scheduled.

NAMESPACE     NAME                                           READY   STATUS        RESTARTS   AGE    IP              NODE   NOMINATED NODE   READINESS GATES
kube-system   pod/calico-node-df2bf                          1/1     Running       1          178m   10.228.72.103   u2     <none>           <none>
kube-system   pod/calico-node-2nct7                          1/1     Running       1          175m   10.228.72.45    u3     <none>           <none>
kube-system   pod/coredns-588fd544bf-vxfcv                   1/1     Running       1          174m   10.1.108.66     u2     <none>           <none>
kube-system   pod/calico-node-9flr7                          1/1     Running       1          3h     10.228.72.34    u1     <none>           <none>
kube-system   pod/calico-kube-controllers-555fc8cc5c-vv729   1/1     Terminating   1          3h     10.1.215.194    u1     <none>           <none>
kube-system   pod/calico-kube-controllers-555fc8cc5c-9tfk8   1/1     Running       0          5m1s   10.1.43.1       u3     <none>           <none>

balchua commented 4 years ago

It's unfortunate that I didn't get the inspect tarball. I will test it more this weekend.

chris-sanders commented 4 years ago

I'm not sure if I should start a new bug or just put this here since ha is in preview currently. I'll start here and if you prefer a new bug I can do that.

A fresh install on bionic KVMs with this branch is not working, possibly because this environment uses a proxy.

The setup is 4 KVMs:

sudo snap install microk8s --channel=latest/edge/ha-preview --classic

# Enable proxy
sudo vim /var/snap/microk8s/current/args/containerd-env
sudo microk8s stop && sudo microk8s start

# Enable addons (on initial node only)
sudo microk8s enable dns

# Add nodes
sudo microk8s add-node

The nodes add, but the calico pods never show ready. Example pod: https://pastebin.ubuntu.com/p/GSWx58Qv3p/

kube-proxy seems to be failing to start. journalctl: https://pastebin.ubuntu.com/p/7HqSDjTBdW/

Here is an inspection report as well: inspection-report-20200720_145823.tar.gz

I tried just installing latest/stable (1.18) and noticed that kube-proxy errors in the same way. Purging and installing 1.17 has kube-proxy starting normally. I'm not clear whether kube-proxy, 1.18, or ha-preview is the root cause here.

chris-sanders commented 4 years ago

Scratch that: I have a typo in my bundle, and some of these snaps are on LXD rather than KVM. I'll do a clean re-deploy and test from the top.

chris-sanders commented 4 years ago

Using all KVMs this time went smoother, and kube-proxy is now running on all of the nodes. I am still seeing failures on both the hostpath provisioner and the calico controller. Both fail with this same message:

Warning FailedCreatePodSandBox 54s kubelet, juju-f5fb96-13-kvm-0 (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "eb083a2378d08a5945df088d6d6a1cf5c0d32a8a11bf57556e1d8b08fb61db76": error getting ClusterInformation: Get https://[10.152.183.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: Service Unavailable

Any idea what's going on with that?

chris-sanders commented 4 years ago

I believe I have it working now; for future travelers, and possibly for inclusion in the docs: it appears the API calls are picking up the containerd proxy setting from /var/snap/microk8s/current/args/containerd-env.

Fortunately, Go seems to accept CIDR notation for NO_PROXY, so I added a NO_PROXY line alongside HTTPS_PROXY and, after a restart, Calico is finally happy.

I'm using NO_PROXY=10.0.8.0/23,10.1.0.0/16,10.152.0.0/16, though your environment may vary.
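(Condensed into a sketch; the CIDR list is the one from this environment, so adjust the node subnet for yours:)

# Append the exclusions to the containerd environment file and restart MicroK8s
echo 'NO_PROXY=10.0.8.0/23,10.1.0.0/16,10.152.0.0/16' | sudo tee -a /var/snap/microk8s/current/args/containerd-env
sudo microk8s stop && sudo microk8s start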

ktsakalozos commented 4 years ago

Hi @chris-sanders

Thank you for getting to the bottom of this. The 10.1.0.0/16 is the cluster-cidr [1]. The 10.152.0.0/16 includes the services-cidr [2]. I am not sure where 10.0.8.0/23 comes from.

Our doc on setting up the cluster behind a proxy [3] is not great and expects a certain degree of expertise. Any suggestions and improvements are much appreciated.

I wonder if we could improve the microk8s inspect script [4] to detect the presence of a PROXY configuration on the host and offer proper guidance.

[1] https://github.com/ubuntu/microk8s/blob/feature/ha-enable/microk8s-resources/default-args/kube-proxy#L2
[2] https://github.com/ubuntu/microk8s/blob/feature/ha-enable/microk8s-resources/default-args/kube-apiserver#L2
[3] https://microk8s.io/docs/install-proxy
[4] https://github.com/ubuntu/microk8s/blob/feature/ha-enable/scripts/inspect.sh#L106

chris-sanders commented 4 years ago

@ktsakalozos I don't appear to be able to update or suggest document changes, or I don't know how to do so via Discourse.

The 10.0.8.0/23 space is unique to my environment; it is the IP range of the machines themselves. I'm not sure if it is necessary, but I knew it would never need a proxy, so I included it.

My recommendation for the docs is to mention NO_PROXY in the install-proxy docs, and to include an example NO_PROXY with 10.1.0.0/16 and 10.152.0.0/16 commented out like the HTTPS_PROXY setting. Had that been in the comments in containerd-env, I would probably have done the right thing from the beginning.

balchua commented 4 years ago

Hi @ktsakalozos, I noticed that in the HA preview running microk8s status shows the long format. Better give a heads-up if the plan is to make that the default. I remember we rolled that back because some users' scripts rely on the short form. See issue https://github.com/ubuntu/microk8s/issues/1041

balchua commented 4 years ago

Another observation here. Scenario 1: abrupt shutdown of a voting node in a 4-node cluster. MicroK8s reports this:

microk8s is running
high-availability: yes
  datastore master nodes: 10.3.0.2:19001 10.3.0.5:19001 10.3.0.7:19001
  datastore standby nodes: 10.3.0.4:19001

I shut down the node 10.3.0.7. Then MicroK8s reports this:

microk8s is running
high-availability: yes
  datastore master nodes: 10.3.0.2:19001 10.3.0.5:19001 10.3.0.4:19001
  datastore standby nodes: none

All good. The standby node became voter node.

Scenario 2: voter node leaves the cluster on purpose.

Still 4 node cluster. Initial state of the cluster.

microk8s is running
high-availability: yes
  datastore master nodes: 10.3.0.2:19001 10.3.0.5:19001 10.3.0.7:19001
  datastore standby nodes: 10.3.0.4:19001

I purposely made node 10.3.0.7 leave the cluster, i.e. microk8s leave followed by microk8s remove-node . . .

Then MicroK8s reported this.

microk8s is running
high-availability: no
  datastore master nodes: 10.3.0.2:19001 10.3.0.5:19001
  datastore standby nodes: 10.3.0.4:19001

I notice that node 10.3.0.4 didn't become part of the voting nodes. Has anyone observed such behavior? I can still access the cluster, though, and operations such as kubectl get nodes show the right nodes available in the cluster.

ktsakalozos commented 4 years ago

@balchua I cannot reproduce scenario 2. Is this scenario reproducible on your side? Some time is needed (a few seconds) before a standby node is promoted to a voter (and appears in the datastore master nodes list), but I trust you gave it enough time to see what happens. Do you have the inspection tarball?

@freeekanayaka might have some ideas on what may be happening and/or what information we need to gather to debug this behavior.
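(If it happens again, a sketch of what would help alongside the tarball; the cluster.yaml path is assumed from the backend directory referenced in the traceback earlier in this thread and may differ between builds:)

# Dqlite membership and roles as seen by one node
sudo cat /var/snap/microk8s/current/var/kubernetes/backend/info.yaml
sudo cat /var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml
microk8s status | grep -A 3 datastore
sudo microk8s.inspect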

balchua commented 4 years ago

@ktsakalozos I don't have the inspect tarball anymore, sorry. I eventually joined another node and forced it to go offline ungracefully. This made the standby node become a voter. Yes, I did wait for some time, probably around 5 minutes.

freeekanayaka commented 4 years ago

As far as I can tell from the code, if at any point in time you have 2 online voters and 1 online stand-by, then that stand-by should get promoted to voter within 30 seconds.

There shouldn't be race conditions involved here, so I'm not sure why @ktsakalozos was not able to reproduce it. Perhaps there is some piece missing? @balchua if you could try to reproduce it on your end and see if the bug triggers, that'd be nice.