Ooh issue #100. Nice and even.
this is possible.
Since then, I have had no free time or enough unused Raspberry Pis to get it all working and set up. However, there are many posts online from people who have gotten this working. You may want to check out these projects, in no particular order:
Hmm, to install Docker on Raspbian, you currently have to use the 'convenience script'; see: https://docs.docker.com/install/linux/docker-ce/debian/#install-using-the-convenience-script
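For reference, that boils down to something like this (a minimal sketch; adding the pi user to the docker group is optional but convenient):

```
# Download and run Docker's convenience install script.
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Optional: let the 'pi' user run docker without sudo (takes effect after re-login).
sudo usermod -aG docker pi
```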
Got Kubernetes installed, but kubelet startup is failing with:
May 23 10:44:43 kube1.pidramble.com kubelet[920]: unexpected fault address 0x15689500
May 23 10:44:43 kube1.pidramble.com kubelet[920]: fatal error: fault
May 23 10:44:43 kube1.pidramble.com kubelet[920]: [signal SIGSEGV: segmentation violation code=0x2 addr=0x15689500 pc=0x15689500]
May 23 10:44:43 kube1.pidramble.com kubelet[920]: goroutine 1 [running, locked to thread]:
May 23 10:44:43 kube1.pidramble.com kubelet[920]: runtime.throw(0x2a84a9e, 0x5)
May 23 10:44:43 kube1.pidramble.com kubelet[920]: /usr/local/go/src/runtime/panic.go:605 +0x70 fp=0x15e2be98 sp=0x15e2be8c pc=0x3efa4
May 23 10:44:43 kube1.pidramble.com kubelet[920]: runtime.sigpanic()
May 23 10:44:43 kube1.pidramble.com kubelet[920]: /usr/local/go/src/runtime/signal_unix.go:374 +0x1cc fp=0x15e2bebc sp=0x15e2be98 pc=0x5517c
May 23 10:44:43 kube1.pidramble.com kubelet[920]: k8s.io/kubernetes/vendor/github.com/appc/spec/schema/types.SemVer.Empty(...)
May 23 10:44:43 kube1.pidramble.com kubelet[920]: /workspace/anago-v1.10.3-beta.0.74+2bba0127d85d5a/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/appc/spec/schema/types/semver.go:68
May 23 10:44:43 kube1.pidramble.com kubelet[920]: k8s.io/kubernetes/vendor/github.com/appc/spec/schema/types.NewSemVer(0x15816ec0, 0x20945b4, 0x2a8fbcf, 0xb, 0x15a76870)
May 23 10:44:43 kube1.pidramble.com kubelet[920]: /workspace/anago-v1.10.3-beta.0.74+2bba0127d85d5a/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/appc/spec/schema/types/semver.go:41 +0x90 fp=0x
May 23 10:44:43 kube1.pidramble.com kubelet[920]: goroutine 5 [chan receive]:
May 23 10:44:43 kube1.pidramble.com kubelet[920]: k8s.io/kubernetes/vendor/github.com/golang/glog.(*loggingT).flushDaemon(0x4551f48)
May 23 10:44:43 kube1.pidramble.com kubelet[920]: /workspace/anago-v1.10.3-beta.0.74+2bba0127d85d5a/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/golang/glog/glog.go:879 +0x70
May 23 10:44:43 kube1.pidramble.com kubelet[920]: created by k8s.io/kubernetes/vendor/github.com/golang/glog.init.0
May 23 10:44:43 kube1.pidramble.com kubelet[920]: /workspace/anago-v1.10.3-beta.0.74+2bba0127d85d5a/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/golang/glog/glog.go:410 +0x1a0
May 23 10:44:43 kube1.pidramble.com kubelet[920]: goroutine 69 [syscall]:
May 23 10:44:43 kube1.pidramble.com kubelet[920]: os/signal.signal_recv(0x2bd146c)
May 23 10:44:43 kube1.pidramble.com kubelet[920]: /usr/local/go/src/runtime/sigqueue.go:131 +0x134
May 23 10:44:43 kube1.pidramble.com kubelet[920]: os/signal.loop()
May 23 10:44:43 kube1.pidramble.com kubelet[920]: /usr/local/go/src/os/signal/signal_unix.go:22 +0x14
May 23 10:44:43 kube1.pidramble.com kubelet[920]: created by os/signal.init.0
May 23 10:44:43 kube1.pidramble.com kubelet[920]: /usr/local/go/src/os/signal/signal_unix.go:28 +0x30
May 23 10:44:43 kube1.pidramble.com systemd[1]: kubelet.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
May 23 10:44:43 kube1.pidramble.com systemd[1]: kubelet.service: Unit entered failed state.
May 23 10:44:43 kube1.pidramble.com systemd[1]: kubelet.service: Failed with result 'exit-code'.
Manually running the following command results in the same thing:
/usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin --cluster-dns=10.96.0.10 --cluster-domain=cluster.local --authorization-mode=Webhook --client-ca-file=/etc/kubernetes/pki/ca.crt --cadvisor-port=0 --rotate-certificates=true --cert-dir=/var/lib/kubelet/pki
Link to line of code where it looks like there's an empty version being passed, causing this error: https://github.com/appc/spec/blob/master/schema/types/semver.go#L68
When manually running kubeadm init ...:
[preflight] Running pre-flight checks.
[WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 18.05.0-ce. Max validated version: 17.03
[WARNING KubeletVersion]: couldn't get kubelet version: exit status 2
[WARNING FileExisting-crictl]: crictl not found in system path
Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
Got through that by running sudo kubeadm init --token-ttl=0 --apiserver-advertise-address=PI_IP_HERE --ignore-preflight-errors=all
(Following this setup guide, which seems to be about what my roles do anyways... https://gist.github.com/alexellis/fdbc90de7691a1b9edb545c17da2d975).
And later in the init process:
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests".
[init] This might take a minute or longer if the control plane images have to be pulled.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
...
Trying out an older version as per https://gist.github.com/alexellis/fdbc90de7691a1b9edb545c17da2d975#gistcomment-2595025
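A sketch of what pinning to an older release looks like with the Kubernetes apt packages (the 1.9.7-00 version string here is just illustrative; whatever version the gist comment recommends would go in its place):

```
# Remove the current packages, install a pinned older release, and hold it so upgrades don't undo this.
sudo apt-get purge -y kubelet kubeadm kubectl
sudo apt-get install -y kubelet=1.9.7-00 kubeadm=1.9.7-00 kubectl=1.9.7-00
sudo apt-mark hold kubelet kubeadm kubectl
```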
Might open an upstream issue, as it looks like nobody else has yet...
Here are all the steps I've taken: https://gist.github.com/alexellis/fdbc90de7691a1b9edb545c17da2d975#gistcomment-2598596
Still not able to get a fully functional K8s cluster though; I can't get the Flannel networking up and running, and there are plenty of errors from kubelet visible via journalctl -f.
Disabled the firewall for now, and the only errors I'm seeing from kubelet are now:
May 23 22:10:08 kube1.pidramble.com kubelet[6843]: W0523 22:10:08.713818 6843 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
May 23 22:10:08 kube1.pidramble.com kubelet[6843]: E0523 22:10:08.714480 6843 kubelet.go:2125] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
And the main node is currently reporting 'NotReady' for its status:
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kube1.pidramble.com NotReady <none> 19m v1.10.2
So a quick fix for that is doing the following (from this comment: https://github.com/kubernetes/kubernetes/issues/43815#issuecomment-290235245):
Edit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf, comment out the existing KUBELET_NETWORK_ARGS line, and add a line right below it:
Environment="KUBELET_NETWORK_ARGS="
Then reload the config and restart kubelet:
sudo systemctl daemon-reload
sudo systemctl restart kubelet
However, a later summary in that same issue (https://github.com/kubernetes/kubernetes/issues/43815#issuecomment-317501985) says you need to install a pod network... which I did using Flannel, but it seems that might not be working on the Pi like it did in my local Vagrant test rig :(
Testing with a 'hello world' worked!
kubectl run pi --image=perl --restart=OnFailure -- perl -Mbignum=bpi -wle 'print bpi(2)'
Then after a couple of minutes (and a lot of Pi CPU usage and iowait! I'm guessing running things off something other than a microSD card would be waaaaay faster):
# kubectl describe jobs/pi
Name: pi
Namespace: default
Selector: controller-uid=8e3ca95c-5f6b-11e8-b8e4-b827ebcd0930
Labels: run=pi
Annotations: <none>
Parallelism: 1
Completions: 1
Start Time: Thu, 24 May 2018 16:00:21 +0000
Pods Statuses: 0 Running / 1 Succeeded / 0 Failed
Pod Template:
Labels: controller-uid=8e3ca95c-5f6b-11e8-b8e4-b827ebcd0930
job-name=pi
run=pi
Containers:
pi:
Image: perl
Port: <none>
Host Port: <none>
Args:
perl
-Mbignum=bpi
-wle
print bpi(2)
Environment: <none>
Mounts: <none>
Volumes: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 3m job-controller Created pod: pi-vzf8k
# kubectl logs pi-vzf8k
3.1
Also getting a lot of:
May 24 16:09:42 kube1.pidramble.com kernel: Under-voltage detected! (0x00050005)
So I might want to switch to a dedicated 2.4A power supply (right now I'm using my multi-port PowerAdd supply... which has been pretty stable normally, but might not be able to deliver the full current needed to run the Pi 3 B+ under heavy CPU and I/O load!)
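To confirm whether the power supply is really the culprit, the firmware exposes a throttled-state flag that can be checked directly (assuming the stock vcgencmd tool, which ships with Raspbian):

```
# Non-zero bits mean trouble: bit 0 = under-voltage right now, bit 16 = under-voltage has occurred since boot.
vcgencmd get_throttled

# The kernel log records each event too:
dmesg | grep -i voltage
```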
Things seem stable after a restart as well; though every time I reboot, it takes a long time (~5 minutes) before the master node reports Ready status, and while that's happening, all the kubelet requests (as well as things like kubectl get nodes) fail with:
Unable to connect to the server: net/http: TLS handshake timeout
See related: https://github.com/Azure/AKS/issues/112 (though that could be something entirely different... it seems to happen after major changes or upgrades, so basically it seems Kubernetes' plumbing has to rejigger all the TLS stuff on any reboot or upgrade, maybe).
Yay!
Single node cluster for now (one master)... I'll do a few quick benchmarks, then later today or this week I'll look into adding a few more of my nodes. Right now I just wanted to get it all reproducible and runnable!
Dug up an upstream issue: https://github.com/ansible/ansible/issues/40684
Using benchmarks from this page: http://www.pidramble.com/wiki/benchmarks/drupal
(Compare this to the Single Pi benchmarks, which I just ran with the old Drupal Pi stack a couple months ago—149 and 15.32 req/s, respectively.)
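(For context, these requests-per-second numbers come from simple ab runs against the site; roughly something like the following, though the exact flags and URL are on the benchmarks page, and the 31746 NodePort is just the one my drupal8 service happened to get:)

```
# Rough sketch of an anonymous page-load benchmark: 1000 requests, 10 concurrent, against the Drupal NodePort.
ab -n 1000 -c 10 http://kube1.pidramble.com:31746/
```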
So not bad... it's almost 50% overhead to run on a single node with the Kubernetes master on that node. I want to see what happens with a 2nd node, 3rd node, etc., along with some horizontal pod autoscaling. I'll be working on that next, but I'm going to pause for a bit since I have a 100% functional setup at this point (see the kubernetes branch in this repo, or PR #102, for current progress).
Update: Tested with 5 node cluster, Drupal pod on kube2, MySQL pod on kube5. Note that I have MySQL's PV affinity set to stick to kube5, but technically I could scale Drupal using a replicaset... however, the official library drupal container is not currently configured for running a live site correctly. I'll need to work on building a proper Drupal image + codebase as part of the overall playbook first...
Here are the results with kube1 being master (no pods), kube2 running Drupal, kube5 running MySQL:
Weird that anonymous was so much lower. I wonder if Flannel networking could be causing a bit of an issue somehow with latency? I'm hitting the server IP directly and accessing using the NodePort, so it's not even communicating across hosts...
Ha, now I'm getting errors because one of Raspbian's mirrors is down:
TASK [Ensure dependencies are installed.] ******************************************************************************
failed: [10.0.100.44] (item=[u'sudo', u'openssh-server']) => changed=false
cmd: apt-get install python-apt -y -q
item:
- sudo
- openssh-server
msg: |-
E: Failed to fetch http://mirror.glennmcgurrin.com/raspbian/pool/main/g/gnupg2/dirmngr_2.1.18-8~deb9u1_armhf.deb Something wicked happened resolving 'mirror.glennmcgurrin.com:http' (-5 - No address associated with hostname)
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
rc: 100
stderr: |-
E: Failed to fetch http://mirror.glennmcgurrin.com/raspbian/pool/main/g/gnupg2/dirmngr_2.1.18-8~deb9u1_armhf.deb Something wicked happened resolving 'mirror.glennmcgurrin.com:http' (-5 - No address associated with hostname)
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
stderr_lines:
- 'E: Failed to fetch http://mirror.glennmcgurrin.com/raspbian/pool/main/g/gnupg2/dirmngr_2.1.18-8~deb9u1_armhf.deb Something wicked happened resolving ''mirror.glennmcgurrin.com:http'' (-5 - No address associated with hostname)'
- 'E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?'
stdout: |-
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
dirmngr
Suggested packages:
dbus-user-session pinentry-gnome3 tor python-apt-dbg python-apt-doc
The following NEW packages will be installed:
dirmngr python-apt
0 upgraded, 2 newly installed, 0 to remove and 8 not upgraded.
Need to get 703 kB of archives.
After this operation, 1526 kB of additional disk space will be used.
Err:1 http://mirror.glennmcgurrin.com/raspbian stretch/main armhf dirmngr armhf 2.1.18-8~deb9u1
Something wicked happened resolving 'mirror.glennmcgurrin.com:http' (-5 - No address associated with hostname)
Get:2 http://mirror.glennmcgurrin.com/raspbian stretch/main armhf python-apt armhf 1.1.0~beta5 [157 kB]
Fetched 157 kB in 1s (91.5 kB/s)
stdout_lines: <omitted>
Had to sudo nano /etc/apt/sources.list, paste in a different mirror from the Raspbian mirrors list (I chose deb http://reflection.oss.ou.edu/raspbian/raspbian/ stretch main contrib non-free rpi), then run sudo apt-get update on the Pi to get the new mirror to be used.
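If this happens again, the same swap could be scripted (a sketch; it assumes the default raspbian.raspberrypi.org entry is what's currently in sources.list):

```
# Point apt at a specific Raspbian mirror instead of the default, then refresh the package lists.
sudo sed -i 's|http://raspbian.raspberrypi.org/raspbian|http://reflection.oss.ou.edu/raspbian/raspbian|g' /etc/apt/sources.list
sudo apt-get update
```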
Aha! To get Flannel working... I had to:
curl -sSL "https://github.com/coreos/flannel/blob/master/Documentation/kube-flannel.yml?raw=true" | sed "s/amd64/arm/g" | kubectl create -f -
(Basically, download the flannel.yml file, replace all occurrences of amd64 with arm, then apply that.)
See: https://github.com/coreos/flannel/issues/663#issuecomment-299593569
Adding a task for now:
# TODO: See https://github.com/coreos/flannel/issues/663
- name: Apply Flannel CNI configuration to the cluster.
  shell: >
    curl -sSL "https://github.com/coreos/flannel/blob/master/Documentation/kube-flannel.yml?raw=true" |
    sed "s/amd64/arm/g" |
    kubectl apply -f -
  register: flannel_result
  changed_when: "'created' in flannel_result.stdout"
  run_once: True
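To verify the DaemonSet actually comes up on the ARM nodes afterward (assuming Flannel lands in kube-system, which is where that manifest put things at the time):

```
# Each node should show a running kube-flannel pod once the CNI config is written out.
kubectl get pods --namespace=kube-system -o wide | grep flannel

# Nodes should flip to Ready shortly after /etc/cni/net.d is populated.
kubectl get nodes
```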
So, some things to continue working on: kubectl exec.
Another project to refer people to, by the wonderful @chris-short: https://rak8s.io / https://github.com/rak8s/rak8s
Well that was easy enough...
root@kube1:/home/pi# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kube1.pidramble.com Ready master 21h v1.10.2
kube2.pidramble.com Ready <none> 10m v1.10.2
And to test that it's working, I ran the perl pi job again:
root@kube1:/home/pi# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
drupal8-5cbd76cb5b-k8kn8 1/1 Running 1 21h 10.244.0.5 kube1.pidramble.com
drupal8-mysql-788d8dd84b-75kdt 1/1 Running 1 21h 10.244.0.6 kube1.pidramble.com
pi-lp26j 0/1 ContainerCreating 0 8s <none> kube2.pidramble.com
I also killed the kube2 node and restarted it... worked well:
root@kube1:/mnt/nfs# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kube1.pidramble.com Ready master 22h v1.10.2
kube2.pidramble.com NotReady <none> 23m v1.10.2
root@kube1:/mnt/nfs# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kube1.pidramble.com Ready master 22h v1.10.2
kube2.pidramble.com Ready <none> 25m v1.10.2
Note that the Pi kept re-enabling swap even after I disabled the swap service and turned swap off multiple times. It seems the Pi is determined to keep some swap space available (99M in this case) after a reboot no matter what!
So, first hurdle when pods get spread out...
root@kube1:~# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
drupal8-5cbd76cb5b-lz7dx 1/1 Running 0 18m 10.244.1.3 kube2.pidramble.com
drupal8-mysql-788d8dd84b-hrfln 1/1 Running 0 18m 10.244.0.8 kube1.pidramble.com
When I try installing Drupal, I get the error from the Drupal installer:
SQLSTATE[HY000] [2002] php_network_getaddresses: getaddrinfo failed: Temporary failure in name resolution.
I thought the hostname drupal8-mysql would still work even if the pod is on a different node. Also, it's slightly annoying that NodePorts are kind of dynamic: you have to use the node's external IP with the NodePort, and there's no magical routing from any other node to the node where the service's NodePort resides... I'm going to have to take a deep dive into some Kubernetes/Flannel networking and DNS docs!
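A quick way to check whether cluster DNS is answering at all from another node is a throwaway busybox pod (a standard debugging approach, nothing specific to this setup; busybox:1.28 because newer busybox images have a flaky nslookup):

```
# Resolve the service name from inside the cluster, then clean up the pod automatically.
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup drupal8-mysql

# Also worth making sure kube-dns itself is running:
kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
```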
Testing on the full stack now... Just ran out to Micro Center to grab 5 Pi model 3 B+'s to replace the model 3's in my existing cluster. We'll see how it goes—the playbook's running now!
Woot, first try!
root@kube1:/home/pi# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kube1.pidramble.com Ready master 28m v1.10.2
kube2.pidramble.com Ready <none> 28m v1.10.2
kube3.pidramble.com Ready <none> 28m v1.10.2
kube4.pidramble.com Ready <none> 28m v1.10.2
kube5.pidramble.com Ready <none> 28m v1.10.2
root@kube1:/home/pi# kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
drupal8 NodePort 10.105.7.230 <none> 80:31746/TCP 27m
drupal8-mysql ClusterIP None <none> 3306/TCP 27m
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 29m
NAME READY STATUS RESTARTS AGE IP NODE
drupal8-5cbd76cb5b-g5bw9 1/1 Running 0 27m 10.244.3.2 kube5.pidramble.com
drupal8-mysql-788d8dd84b-xs975 0/1 Pending 0 27m <none> <none>
Looks like MySQL isn't too happy, unfortunately. But I've also switched the master to not run pods for now, so it preserves performance for Kubernetes/kubelet.
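(For reference, the 'master doesn't run pods' behavior is just the NoSchedule taint kubeadm puts on the master node; a quick sketch of checking it, and of removing it if I ever want kube1 to take workloads again:)

```
# Show the taint kubeadm applied to the master.
kubectl describe node kube1.pidramble.com | grep -A1 Taints

# Remove it so normal pods can schedule on the master (the trailing '-' deletes the taint).
kubectl taint nodes kube1.pidramble.com node-role.kubernetes.io/master-
```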
Some more on the MySQL issue:
root@kube1:/home/pi# kubectl describe pod drupal8-mysql-788d8dd84b-xs975
Name: drupal8-mysql-788d8dd84b-xs975
Namespace: default
Node: <none>
Labels: app=drupal8
pod-template-hash=3448488406
tier=mysql
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/drupal8-mysql-788d8dd84b
Containers:
mysql:
Image: hypriot/rpi-mysql:5.5
Port: 3306/TCP
Host Port: 0/TCP
Environment:
MYSQL_DATABASE: drupal
MYSQL_USER: drupal
MYSQL_PASSWORD: <set to the key 'password' in secret 'drupal8-mysql-pass'> Optional: false
MYSQL_ROOT_PASSWORD: <set to the key 'password' in secret 'drupal8-mysql-root-pass'> Optional: false
Mounts:
/var/lib/mysql from mysql-persistent-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-s9mvh (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
mysql-persistent-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: drupal8-mysql
ReadOnly: false
default-token-s9mvh:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-s9mvh
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 28m default-scheduler pod has unbound PersistentVolumeClaims (repeated 3 times)
Warning FailedScheduling 28m (x3 over 28m) default-scheduler 0/5 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 1 node(s) were not ready, 3 node(s) had volume node affinity conflict.
Warning FailedScheduling 28m (x3 over 28m) default-scheduler 0/5 nodes are available: 5 node(s) were not ready.
Warning FailedScheduling 3m (x88 over 28m) default-scheduler 0/5 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 4 node(s) had volume node affinity conflict.
See:
But this one looks a little... interesting. Maybe I need to find a way to integrate the NFS storage with K8s instead of hacking around using local-storage, because I'll definitely get burned in the latter case.
Or maybe add some sort of affinity towards kube5 or something for MySQL? Would that do the trick? Isn't Kubernetes just supposed to be magic? ;)
It was the node affinity, for sure; updated to point it at kube5.pidramble.com and that worked.
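For the record, 'point it at kube5' amounts to node affinity on the local PersistentVolume, roughly like the sketch below (illustrative names and sizes, not necessarily the exact manifest in the playbook):

```
# Sketch: a local-storage PV pinned to kube5 via nodeAffinity (values here are placeholders).
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: drupal8-mysql-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  local:
    path: /var/lib/mysql-data
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - kube5.pidramble.com
EOF
```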
Also looking into NFS-based volumes, and it looks like if I:
Then I can use PVCs to mount things in containers for persistence (e.g. if I need multiple Drupal site pods hitting one shared files dir, or multiple Drupal sites (multisite or otherwise) hitting different shared files dirs). See more: Using Persistent Volumes on bare metal.
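A minimal sketch of what that could look like, assuming an NFS export at 10.0.100.60:/mnt/nfs (both placeholders for whatever the NAS actually exposes):

```
# Sketch: an NFS-backed PV plus a claim that pods can mount read-write from any node.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: drupal8-files-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.0.100.60
    path: /mnt/nfs/drupal8-files
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: drupal8-files
spec:
  # Empty storageClassName so the claim binds to the pre-created PV above rather than a dynamic provisioner.
  storageClassName: ""
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
EOF
```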
Also annoying... every reboot I have to disable swap again before kubelet will start happily... and in desperation, I am now asking on the RPi Stack Exchange site: How to permanently disable swap on Raspbian Stretch Lite.
Got the swap thing sorted; I needed to use the shell module instead of command because I had &&s in there. Forgot about that, oops!
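For reference, the chained commands amount to something like this (Raspbian manages swap with dphys-swapfile, so it has to be stopped and kept from coming back at boot; a sketch of the kind of thing the shell task ends up running):

```
# Turn swap off, delete the swapfile, and keep dphys-swapfile from re-creating it on the next boot.
sudo dphys-swapfile swapoff && \
  sudo dphys-swapfile uninstall && \
  sudo update-rc.d dphys-swapfile remove && \
  sudo systemctl disable dphys-swapfile
```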
I think I might bless the 'official' K8s branch, so I can file multiple issues instead of this one giant one. Closing this out as the PoC is working and I'm pretty pleased with how it turned out. Next step is adding a few issues to work on things like ingress controller and a proper Drupal installation.
I've been wanting to get some real-world experience with Kubernetes (not just minikube locally, or a hosted solution like GCE or EKS), and what better way than to move my Raspberry Pi Dramble over to using k8s? This could be a fool's errand... or it could work great. I'll have to see; only one way to find out, though!