Project31 / ansible-kubernetes-openshift-pi3

Ansible playbooks for setting up a Kubernetes Raspberry Pi 3 cluster

Not sure what's going on... #7

Closed calphool closed 7 years ago

calphool commented 7 years ago

I followed the instructions, and I've definitely got docker up and running on the nodes. However, the Kubernetes install seems incomplete or something (no kubernetes master binaries?)

When I run kubectl cluster-info I get this:

The connection to the server master:8080 was refused - did you specify the right host or port?

However I can ping master just fine.

When I run:

netstat -a | grep 8080 on the master node, I get nothing back, so it seems that something was supposed to be installed, but it didn't get installed. When I look at the ansible scripts however I don't see where anything like the API listener was included.

What am I missing here?
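For anyone hitting the same symptom, here is a minimal sketch (not part of the playbooks) that separates "the host name resolves but nothing listens on the port" from a name-resolution or routing problem. The function name `port_status` and the 2-second timeout are illustrative assumptions.

```python
import socket


def port_status(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify one TCP connection attempt to host:port."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host reachable, but no process is listening on that port.
        return "refused"
    except socket.gaierror:
        # The host name could not be resolved at all.
        return "unresolved"
    except OSError:
        # Timeouts, missing routes, and other network-level failures.
        return "unreachable"
```

Calling `port_status("master", 8080)` on a node would be expected to report `"refused"` in the situation described above, where ping works but nothing is bound to the port.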

rhuss commented 7 years ago

Thanks for trying out this setup. Things can go wrong, since it's still a complex scenario.

So what happens here is that the Kubernetes API server didn't start up. You should examine the following:

Does this help?

BTW, I'm about to switch to Hypriot OS. The modifications are in the hypriot branch, which I will merge to master soon.

calphool commented 7 years ago

ssh to the master and check with ps whether kubelet is running.

pi@n0:~ $ ps -aux | grep kubelet
root       902  3.3  4.7 937408 45376 ?        Ssl  20:08   0:28 /usr/bin/kubelet --api-servers=http://master:8080 --allow-privileged=true --pod_infra_container_image=gcr.io/google_containers/pause-arm:3.0 --config=/etc/kubernetes/manifests --cluster-dns=10.200.100.10 --cluster-domain=cluster.local --v=2

Check that you can reach the outside world from within your Pis by doing a 'sudo ping ...'. It's mandatory that you have internet access from your cluster, so check the README for how to set this up.

pi@n0:~ $ ping cisco.com
PING cisco.com (72.163.4.161) 56(84) bytes of data.
64 bytes from www1.cisco.com (72.163.4.161): icmp_seq=1 ttl=245 time=51.2 ms

I think something is supposed to be running on the master listening on port 8080... but:

pi@n0:~ $ netstat -a | grep 8080

returns nothing...

I'm not sure which ansible script was supposed to install things like the kubernetes-api listener, but I don't think it's there.

calphool commented 7 years ago

Here's why I think there's something wrong with the startup. This is the /var/log/syslog on the master node (n0):

Log file

```
Oct 15 08:17:51 n0 rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="491" x-info="http://www.rsyslog.com"] start
Oct 15 08:17:51 n0 systemd-modules-load[116]: Inserted module 'overlay'
Oct 15 08:17:51 n0 systemd[1]: Mounted Configuration File System.
Oct 15 08:17:51 n0 systemd[1]: Started Apply Kernel Variables.
Oct 15 08:17:51 n0 systemd[1]: Started Create Static Device Nodes in /dev.
Oct 15 08:17:51 n0 systemd[1]: Starting udev Kernel Device Manager...
Oct 15 08:17:51 n0 systemd-fsck[101]: e2fsck 1.42.12 (29-Aug-2014)
Oct 15 08:17:51 n0 systemd-fsck[101]: /dev/mmcblk0p2: clean, 279776/1919232 files, 1729066/7798016 blocks
Oct 15 08:17:51 n0 fake-hwclock[104]: Sat Oct 15 13:17:49 UTC 2016
Oct 15 08:17:51 n0 systemd[1]: Started udev Kernel Device Manager.
Oct 15 08:17:51 n0 systemd[1]: Starting Copy rules generated while the root was ro...
Oct 15 08:17:51 n0 systemd[1]: Starting LSB: Set preliminary keymap...
Oct 15 08:17:51 n0 systemd[1]: Starting LSB: Tune IDE hard disks...
Oct 15 08:17:51 n0 systemd[1]: Started Copy rules generated while the root was ro.
Oct 15 08:17:51 n0 hdparm[138]: Setting parameters of disc: (none).
Oct 15 08:17:51 n0 systemd[1]: Started LSB: Tune IDE hard disks.
Oct 15 08:17:51 n0 systemd[1]: Starting Sound Card.
Oct 15 08:17:51 n0 systemd[1]: Reached target Sound Card.
Oct 15 08:17:51 n0 systemd[1]: Found device /dev/mmcblk0p1.
Oct 15 08:17:51 n0 systemd[1]: Starting File System Check on /dev/mmcblk0p1...
Oct 15 08:17:51 n0 keyboard-setup[137]: Setting preliminary keymap...done.
Oct 15 08:17:51 n0 systemd[1]: Started LSB: Set preliminary keymap.
Oct 15 08:17:51 n0 systemd[1]: Starting Show Plymouth Boot Screen...
Oct 15 08:17:51 n0 systemd[1]: Starting Remount Root and Kernel File Systems...
Oct 15 08:17:51 n0 systemd-fsck[212]: fsck.fat 3.0.27 (2014-11-12)
Oct 15 08:17:51 n0 systemd-fsck[212]: /dev/mmcblk0p1: 122 files, 2653/8057 clusters
Oct 15 08:17:51 n0 systemd[1]: Started File System Check on /dev/mmcblk0p1.
Oct 15 08:17:51 n0 systemd[1]: Started Remount Root and Kernel File Systems.
Oct 15 08:17:51 n0 systemd[1]: Received SIGRTMIN+20 from PID 229 (plymouthd).
Oct 15 08:17:51 n0 systemd[1]: Started Show Plymouth Boot Screen.
Oct 15 08:17:51 n0 systemd[1]: Starting Forward Password Requests to Plymouth Directory Watch.
Oct 15 08:17:51 n0 systemd[1]: Started Forward Password Requests to Plymouth Directory Watch.
Oct 15 08:17:51 n0 systemd[1]: Started Dispatch Password Requests to Console Directory Watch.
Oct 15 08:17:51 n0 systemd[1]: Starting Paths.
Oct 15 08:17:51 n0 systemd[1]: Reached target Paths.
Oct 15 08:17:51 n0 systemd[1]: Starting system-systemd\x2drfkill.slice.
Oct 15 08:17:51 n0 systemd[1]: Created slice system-systemd\x2drfkill.slice.
Oct 15 08:17:51 n0 systemd[1]: Starting Load/Save RF Kill Switch Status of rfkill0...
Oct 15 08:17:51 n0 systemd[1]: Activating swap /swapfile...
Oct 15 08:17:51 n0 systemd[1]: Started Various fixups to make systemd work better on Debian.
Oct 15 08:17:51 n0 systemd[1]: Starting Load/Save Random Seed...
Oct 15 08:17:51 n0 systemd[1]: Starting Local File Systems (Pre).
Oct 15 08:17:51 n0 systemd[1]: Reached target Local File Systems (Pre).
Oct 15 08:17:51 n0 systemd[1]: Mounting /boot...
Oct 15 08:17:51 n0 systemd[1]: Started Load/Save RF Kill Switch Status of rfkill0.
Oct 15 08:17:51 n0 systemd[1]: Started Load/Save Random Seed.
Oct 15 08:17:51 n0 systemd[1]: Mounted /boot.
Oct 15 08:17:51 n0 systemd[1]: Starting system-ifup.slice.
Oct 15 08:17:51 n0 systemd[1]: Created slice system-ifup.slice.
Oct 15 08:17:51 n0 systemd[1]: Starting Local File Systems.
Oct 15 08:17:51 n0 systemd[1]: Reached target Local File Systems.
Oct 15 08:17:51 n0 systemd[1]: Starting Tell Plymouth To Write Out Runtime Data...
Oct 15 08:17:51 n0 systemd[1]: Starting Create Volatile Files and Directories...
Oct 15 08:17:51 n0 systemd[1]: Starting Remote File Systems.
Oct 15 08:17:51 n0 systemd[1]: Reached target Remote File Systems.
Oct 15 08:17:51 n0 systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
Oct 15 08:17:51 n0 systemd[1]: Starting LSB: Prepare console...
Oct 15 08:17:51 n0 systemd[1]: Starting LSB: Switch to ondemand cpu governor (unless shift key is pressed)...
Oct 15 08:17:51 n0 systemd[1]: Starting LSB: Raise network interfaces....
Oct 15 08:17:51 n0 systemd[1]: Started Tell Plymouth To Write Out Runtime Data.
Oct 15 08:17:51 n0 systemd[1]: Started Create Volatile Files and Directories.
Oct 15 08:17:51 n0 raspi-config[254]: Checking if shift key is held down:Error opening '/dev/input/event*': No such file or directory
Oct 15 08:17:51 n0 raspi-config[254]: No. Switching to ondemand scaling governor.
Oct 15 08:17:51 n0 systemd[1]: Started LSB: Switch to ondemand cpu governor (unless shift key is pressed).
Oct 15 08:17:51 n0 kbd[253]: Setting console screen modes.
Oct 15 08:17:51 n0 systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
Oct 15 08:17:51 n0 systemd[1]: Activated swap /swapfile.
Oct 15 08:17:51 n0 systemd[1]: Starting Swap.
Oct 15 08:17:51 n0 systemd[1]: Reached target Swap.
Oct 15 08:17:51 n0 systemd[1]: Starting Update UTMP about System Boot/Shutdown...
Oct 15 08:17:51 n0 systemd[1]: Started Update UTMP about System Boot/Shutdown.
Oct 15 08:17:51 n0 kbd[253]: setterm: $TERM is not defined.
Oct 15 08:17:51 n0 systemd[1]: Started LSB: Prepare console.
Oct 15 08:17:51 n0 systemd[1]: Starting LSB: Set console font and keymap...
Oct 15 08:17:51 n0 console-setup[306]: Setting up console font and keymap...done.
Oct 15 08:17:51 n0 systemd[1]: Started LSB: Set console font and keymap.
Oct 15 08:17:51 n0 networking[255]: Configuring network interfaces...done.
Oct 15 08:17:51 n0 systemd[1]: Started LSB: Raise network interfaces..
Oct 15 08:17:51 n0 systemd[1]: Starting ifup for wlan0...
Oct 15 08:17:51 n0 systemd[1]: Started ifup for wlan0.
Oct 15 08:17:51 n0 systemd[1]: Starting System Initialization.
Oct 15 08:17:51 n0 systemd[1]: Reached target System Initialization.
Oct 15 08:17:51 n0 systemd[1]: Starting Docker Socket for the API.
Oct 15 08:17:51 n0 systemd[1]: Starting Avahi mDNS/DNS-SD Stack Activation Socket.
Oct 15 08:17:51 n0 systemd[1]: Listening on Avahi mDNS/DNS-SD Stack Activation Socket.
Oct 15 08:17:51 n0 systemd[1]: Starting D-Bus System Message Bus Socket.
Oct 15 08:17:51 n0 systemd[1]: Listening on D-Bus System Message Bus Socket.
Oct 15 08:17:51 n0 systemd[1]: Starting Daily Cleanup of Temporary Directories.
Oct 15 08:17:51 n0 systemd[1]: Started Daily Cleanup of Temporary Directories.
Oct 15 08:17:51 n0 systemd[1]: Starting Timers.
Oct 15 08:17:51 n0 systemd[1]: Reached target Timers.
Oct 15 08:17:51 n0 systemd[1]: Started Manage Sound Card State (restore and store).
Oct 15 08:17:51 n0 systemd[1]: Starting Restore Sound Card State...
Oct 15 08:17:51 n0 systemd[1]: Listening on Docker Socket for the API.
Oct 15 08:17:51 n0 systemd[1]: Starting Sockets.
Oct 15 08:17:51 n0 systemd[1]: Reached target Sockets.
Oct 15 08:17:51 n0 systemd[1]: Starting Basic System.
Oct 15 08:17:51 n0 systemd[1]: Reached target Basic System.
Oct 15 08:17:51 n0 systemd[1]: Starting dhcpcd on all interfaces...
Oct 15 08:17:51 n0 systemd[1]: Starting Regular background program processing daemon...
Oct 15 08:17:51 n0 systemd[1]: Started Regular background program processing daemon.
Oct 15 08:17:51 n0 systemd[1]: Starting Configure Bluetooth Modems connected by UART...
Oct 15 08:17:51 n0 systemd[1]: Starting Login Service...
Oct 15 08:17:51 n0 systemd[1]: Started getty on tty2-tty6 if dbus and logind are not available.
Oct 15 08:17:51 n0 systemd[1]: Starting LSB: Set up cgroupfs mounts....
Oct 15 08:17:51 n0 systemd[1]: Starting LSB: Autogenerate and use a swap file...
Oct 15 08:17:51 n0 systemd[1]: Starting LSB: triggerhappy hotkey daemon...
Oct 15 08:17:51 n0 cron[414]: (CRON) INFO (pidfile fd = 3)
Oct 15 08:17:51 n0 systemd[1]: Starting Avahi mDNS/DNS-SD Stack...
Oct 15 08:17:51 n0 systemd[1]: Starting D-Bus System Message Bus...
Oct 15 08:17:51 n0 dhcpcd[412]: version 6.7.1 starting
Oct 15 08:17:51 n0 systemd[1]: Started D-Bus System Message Bus.
Oct 15 08:17:51 n0 dhcpcd[412]: dev: loaded udev
Oct 15 08:17:51 n0 dhcpcd[412]: eth0: adding address fe80::bf79:a9e3:858b:a28e
Oct 15 08:17:51 n0 dhcpcd[412]: wlan0: adding address fe80::7a24:4194:f582:9f08
Oct 15 08:17:51 n0 dphys-swapfile[419]: Starting dphys-swapfile swapfile setup ...
Oct 15 08:17:51 n0 cron[414]: (CRON) INFO (Running @reboot jobs)
Oct 15 08:17:51 n0 triggerhappy[422]: Error opening '/dev/input/event*': No such file or directory
Oct 15 08:17:51 n0 dphys-swapfile[419]: want /var/swap=100MByte, checking existing: keeping it
Oct 15 08:17:51 n0 avahi-daemon[424]: Found user 'avahi' (UID 105) and group 'avahi' (GID 110).
Oct 15 08:17:51 n0 avahi-daemon[424]: Successfully dropped root privileges.
Oct 15 08:17:51 n0 avahi-daemon[424]: avahi-daemon 0.6.31 starting up.
Oct 15 08:17:51 n0 wpa_supplicant[423]: Successfully initialized wpa_supplicant
Oct 15 08:17:51 n0 dphys-swapfile[419]: done.
Oct 15 08:17:51 n0 avahi-daemon[424]: Successfully called chroot().
Oct 15 08:17:51 n0 avahi-daemon[424]: Successfully dropped remaining capabilities.
Oct 15 08:17:51 n0 avahi-daemon[424]: Loading service file /services/master.service.
Oct 15 08:17:51 n0 avahi-daemon[424]: Joining mDNS multicast group on interface wlan0.IPv6 with address fe80::7a24:4194:f582:9f08.
Oct 15 08:17:51 n0 avahi-daemon[424]: New relevant interface wlan0.IPv6 for mDNS.
Oct 15 08:17:51 n0 avahi-daemon[424]: Network interface enumeration completed.
Oct 15 08:17:51 n0 avahi-daemon[424]: Registering new address record for fe80::7a24:4194:f582:9f08 on wlan0.*.
Oct 15 08:17:51 n0 avahi-daemon[424]: Registering HINFO record with values 'ARMV7L'/'LINUX'.
Oct 15 08:17:51 n0 dbus[426]: [system] Successfully activated service 'org.freedesktop.systemd1'
Oct 15 08:17:51 n0 systemd[1]: Started Avahi mDNS/DNS-SD Stack.
Oct 15 08:17:51 n0 systemd[1]: Starting System Logging Service...
Oct 15 08:17:51 n0 systemd[1]: Started Restore Sound Card State.
Oct 15 08:17:51 n0 systemd[1]: Started LSB: Set up cgroupfs mounts..
Oct 15 08:17:51 n0 systemd[1]: Started LSB: Autogenerate and use a swap file.
Oct 15 08:17:51 n0 systemd[1]: Started LSB: triggerhappy hotkey daemon.
Oct 15 08:17:51 n0 systemd[1]: Started Login Service.
Oct 15 08:17:51 n0 systemd[1]: Started System Logging Service.
Oct 15 08:17:51 n0 wpa_supplicant[517]: wlan0: CTRL-EVENT-REGDOM-CHANGE init=USER type=COUNTRY alpha2=GB
Oct 15 08:17:52 n0 avahi-daemon[424]: Joining mDNS multicast group on interface eth0.IPv6 with address fe80::bf79:a9e3:858b:a28e.
Oct 15 08:17:52 n0 avahi-daemon[424]: New relevant interface eth0.IPv6 for mDNS.
Oct 15 08:17:52 n0 avahi-daemon[424]: Registering new address record for fe80::bf79:a9e3:858b:a28e on eth0.*.
Oct 15 08:17:52 n0 dhcpcd[412]: eth0: waiting for carrier
Oct 15 08:17:52 n0 dhcpcd[412]: wlan0: waiting for carrier
Oct 15 08:17:52 n0 wpa_supplicant[517]: wlan0: Trying to associate with 40:8b:07:8a:36:35 (SSID='WirelessC' freq=2422 MHz)
Oct 15 08:17:52 n0 wpa_supplicant[517]: wlan0: Associated with 40:8b:07:8a:36:35
Oct 15 08:17:52 n0 wpa_supplicant[517]: wlan0: WPA: Key negotiation completed with 40:8b:07:8a:36:35 [PTK=CCMP GTK=TKIP]
Oct 15 08:17:52 n0 wpa_supplicant[517]: wlan0: CTRL-EVENT-CONNECTED - Connection to 40:8b:07:8a:36:35 completed [id=0 id_str=]
Oct 15 08:17:52 n0 dhcpcd[412]: wlan0: carrier acquired
Oct 15 08:17:52 n0 dhcpcd[412]: DUID 00:01:00:01:1f:77:64:36:b8:27:eb:06:42:57
Oct 15 08:17:52 n0 dhcpcd[412]: wlan0: IAID eb:06:42:57
Oct 15 08:17:52 n0 avahi-daemon[424]: Server startup complete. Host name is n0.local. Local service cookie is 3552771086.
Oct 15 08:17:52 n0 dhcpcd[412]: wlan0: soliciting an IPv6 router
Oct 15 08:17:53 n0 dhcpcd[412]: wlan0: rebinding lease of 192.168.1.134
Oct 15 08:17:53 n0 avahi-daemon[424]: Service "master [b8:27:eb:06:42:57]" (/services/master.service) successfully established.
Oct 15 08:17:55 n0 hciattach[416]: bcm43xx_init
Oct 15 08:17:55 n0 hciattach[416]: Flash firmware /lib/firmware/BCM43430A1.hcd
Oct 15 08:17:55 n0 hciattach[416]: Set Controller UART speed to 921600 bit/s
Oct 15 08:17:55 n0 hciattach[416]: Device setup complete
Oct 15 08:17:55 n0 systemd[1]: Started Configure Bluetooth Modems connected by UART.
Oct 15 08:17:55 n0 systemd[1]: Starting Bluetooth service...
Oct 15 08:17:55 n0 systemd[1]: Starting Load/Save RF Kill Switch Status of rfkill1...
Oct 15 08:17:55 n0 systemd[1]: Started Load/Save RF Kill Switch Status of rfkill1.
Oct 15 08:17:56 n0 bluetoothd[604]: Bluetooth daemon 5.23
Oct 15 08:17:56 n0 systemd[1]: Started Bluetooth service.
Oct 15 08:17:56 n0 systemd[1]: Starting Bluetooth.
Oct 15 08:17:56 n0 systemd[1]: Reached target Bluetooth.
Oct 15 08:17:56 n0 bluetoothd[604]: Starting SDP server
Oct 15 08:17:56 n0 dbus[426]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service'
Oct 15 08:17:56 n0 bluetoothd[604]: Bluetooth management interface 1.10 initialized
Oct 15 08:17:56 n0 bluetoothd[604]: Sap driver initialization failed.
Oct 15 08:17:56 n0 bluetoothd[604]: sap-server: Operation not permitted (1)
Oct 15 08:17:56 n0 systemd[1]: Starting Hostname Service...
Oct 15 08:17:56 n0 systemd-hostnamed[608]: Warning: nss-myhostname is not installed. Changing the local hostname might make it unresolveable. Please install nss-myhostname!
Oct 15 08:17:56 n0 dbus[426]: [system] Successfully activated service 'org.freedesktop.hostname1'
Oct 15 08:17:56 n0 systemd[1]: Started Hostname Service.
Oct 15 08:17:58 n0 dhcpcd[412]: wlan0: leased 192.168.1.134 for 432000 seconds
Oct 15 08:17:58 n0 dhcpcd[412]: wlan0: adding route to 192.168.1.0/24
Oct 15 08:17:58 n0 avahi-daemon[424]: Joining mDNS multicast group on interface wlan0.IPv4 with address 192.168.1.134.
Oct 15 08:17:58 n0 dhcpcd[412]: wlan0: adding default route via 192.168.1.1
Oct 15 08:17:58 n0 avahi-daemon[424]: New relevant interface wlan0.IPv4 for mDNS.
Oct 15 08:17:58 n0 avahi-daemon[424]: Registering new address record for 192.168.1.134 on wlan0.IPv4.
Oct 15 08:17:59 n0 dhcpcd[412]: forked to background, child pid 703
Oct 15 08:17:59 n0 systemd[1]: Started dhcpcd on all interfaces.
Oct 15 08:17:59 n0 systemd[1]: Starting Network.
Oct 15 08:17:59 n0 systemd[1]: Reached target Network.
Oct 15 08:17:59 n0 systemd[1]: Starting OpenBSD Secure Shell server...
Oct 15 08:17:59 n0 systemd[1]: Started OpenBSD Secure Shell server.
Oct 15 08:17:59 n0 systemd[1]: Starting /etc/rc.local Compatibility...
Oct 15 08:17:59 n0 systemd[1]: Starting Network is Online.
Oct 15 08:17:59 n0 systemd[1]: Reached target Network is Online.
Oct 15 08:17:59 n0 systemd[1]: Starting LSB: Start NTP daemon...
Oct 15 08:17:59 n0 systemd[1]: Starting etcd Config Store...
Oct 15 08:17:59 n0 systemd[1]: Started etcd Config Store.
Oct 15 08:17:59 n0 systemd[1]: Starting Flannel Overlay Network for Kubernetes...
Oct 15 08:17:59 n0 systemd[1]: Starting Permit User Sessions...
Oct 15 08:17:59 n0 systemd[1]: Started /etc/rc.local Compatibility.
Oct 15 08:17:59 n0 flannel_init.sh[716]: /etc/kubernetes/flannel_init.sh: 2: /etc/kubernetes/flannel_init.sh: [[: not found
Oct 15 08:17:59 n0 systemd[1]: Started Permit User Sessions.
Oct 15 08:17:59 n0 systemd[1]: Starting Hold until boot process finishes up...
Oct 15 08:17:59 n0 systemd[1]: Starting Terminate Plymouth Boot Screen...
Oct 15 08:17:59 n0 systemd[1]: Received SIGRTMIN+21 from PID 229 (plymouthd).
Oct 15 08:17:59 n0 ntpd[728]: ntpd 4.2.6p5@1.2349-o Mon Jul 25 22:35:28 UTC 2016 (1)
Oct 15 08:17:59 n0 ntpd[734]: proto: precision = 0.521 usec
Oct 15 08:17:59 n0 ntpd[734]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
Oct 15 08:17:59 n0 ntp[708]: Starting NTP server: ntpd.
Oct 15 08:17:59 n0 systemd[1]: Started LSB: Start NTP daemon.
Oct 15 08:17:59 n0 ntpd[734]: Listen and drop on 1 v6wildcard :: UDP 123
Oct 15 08:17:59 n0 ntpd[734]: Listen normally on 2 lo 127.0.0.1 UDP 123
Oct 15 08:17:59 n0 ntpd[734]: Listen normally on 3 wlan0 192.168.1.134 UDP 123
Oct 15 08:17:59 n0 ntpd[734]: Listen normally on 4 lo ::1 UDP 123
Oct 15 08:17:59 n0 ntpd[734]: Listen normally on 5 wlan0 fe80::7a24:4194:f582:9f08 UDP 123
Oct 15 08:17:59 n0 ntpd[734]: peers refreshed
Oct 15 08:17:59 n0 ntpd[734]: Listening on routing socket on fd #22 for interface updates
Oct 15 08:17:59 n0 systemd[1]: Started Hold until boot process finishes up.
Oct 15 08:17:59 n0 systemd[1]: Started Terminate Plymouth Boot Screen.
Oct 15 08:17:59 n0 systemd[1]: Starting Getty on tty1...
Oct 15 08:17:59 n0 systemd[1]: Started Getty on tty1.
Oct 15 08:17:59 n0 systemd[1]: Starting Login Prompts.
Oct 15 08:17:59 n0 systemd[1]: Reached target Login Prompts.
Oct 15 08:18:00 n0 etcd[713]: running etcd on unsupported architecture "arm" since ETCD_UNSUPPORTED_ARCH is set
Oct 15 08:18:00 n0 etcd[713]: unrecognized environment variable ETCD_UNSUPPORTED_ARCH=arm
Oct 15 08:18:00 n0 etcd[713]: setting maximum number of CPUs to 4, total number of available CPUs is 4
Oct 15 08:18:00 n0 etcd[713]: the server is already initialized as member before, starting as etcd member...
Oct 15 08:18:00 n0 etcd[713]: listening for peers on http://master:2380
Oct 15 08:18:00 n0 etcd[713]: listening for client requests on http://master:2379
Oct 15 08:18:00 n0 etcd[713]: recovered store from snapshot at index 170017
Oct 15 08:18:00 n0 etcd[713]: name = etcd
Oct 15 08:18:00 n0 etcd[713]: data dir = /var/lib/etcd
Oct 15 08:18:00 n0 etcd[713]: member dir = /var/lib/etcd/member
Oct 15 08:18:00 n0 etcd[713]: heartbeat = 100ms
Oct 15 08:18:00 n0 etcd[713]: election = 1000ms
Oct 15 08:18:00 n0 etcd[713]: snapshot count = 10000
Oct 15 08:18:00 n0 etcd[713]: advertise client URLs = http://master:2379
Oct 15 08:18:02 n0 etcd[713]: restarting member ce2a822cea30bfca in cluster 7e27652122e8b2ae at commit index 171874
Oct 15 08:18:02 n0 etcd[713]: ce2a822cea30bfca became follower at term 25
Oct 15 08:18:02 n0 etcd[713]: newRaft ce2a822cea30bfca [peers: [ce2a822cea30bfca], term: 25, commit: 171874, applied: 170017, lastindex: 171874, lastterm: 25]
Oct 15 08:18:02 n0 etcd[713]: added member ce2a822cea30bfca [http://localhost:2380 http://localhost:7001] to cluster 7e27652122e8b2ae from store
Oct 15 08:18:02 n0 etcd[713]: set the cluster version to 2.3 from store
Oct 15 08:18:02 n0 etcd[713]: starting server... [version: 2.3.7, cluster version: 2.3]
Oct 15 08:18:02 n0 etcd[713]: failed to notify systemd for readiness: No socket
Oct 15 08:18:02 n0 etcd[713]: forgot to set Type=notify in systemd service file?
Oct 15 08:18:02 n0 flannel_init.sh[716]: Setting network config for flannel:
Oct 15 08:18:03 n0 etcd[713]: ce2a822cea30bfca is starting a new election at term 25
Oct 15 08:18:03 n0 etcd[713]: ce2a822cea30bfca became candidate at term 26
Oct 15 08:18:03 n0 etcd[713]: ce2a822cea30bfca received vote from ce2a822cea30bfca at term 26
Oct 15 08:18:03 n0 etcd[713]: ce2a822cea30bfca became leader at term 26
Oct 15 08:18:03 n0 etcd[713]: raft.node: ce2a822cea30bfca elected leader ce2a822cea30bfca at term 26
Oct 15 08:18:03 n0 etcd[713]: published {Name:etcd ClientURLs:[http://master:2379]} to cluster 7e27652122e8b2ae
Oct 15 08:18:03 n0 flannel_init.sh[716]: {
Oct 15 08:18:03 n0 flannel_init.sh[716]: "Network": "10.1.0.0/16",
Oct 15 08:18:03 n0 flannel_init.sh[716]: "Backend": {
Oct 15 08:18:03 n0 flannel_init.sh[716]: "Type": "host-gw"
Oct 15 08:18:03 n0 flannel_init.sh[716]: }
Oct 15 08:18:03 n0 flannel_init.sh[716]: }
Oct 15 08:18:03 n0 systemd[1]: Started Flannel Overlay Network for Kubernetes.
Oct 15 08:18:03 n0 systemd[1]: Starting Docker Application Container Engine...
Oct 15 08:18:03 n0 ifconfig[761]: docker0: ERROR while getting interface flags: No such device
Oct 15 08:18:03 n0 brctl[767]: bridge docker0 doesn't exist; can't delete it
Oct 15 08:18:03 n0 flanneld[757]: I1015 08:18:03.820930 00757 main.go:275] Installing signal handlers
Oct 15 08:18:03 n0 flanneld[757]: I1015 08:18:03.821372 00757 main.go:130] Determining IP address of default interface
Oct 15 08:18:03 n0 flanneld[757]: I1015 08:18:03.822088 00757 main.go:188] Using 192.168.1.134 as external interface
Oct 15 08:18:03 n0 flanneld[757]: I1015 08:18:03.822183 00757 main.go:189] Using 192.168.1.134 as external endpoint
Oct 15 08:18:03 n0 flanneld[757]: I1015 08:18:03.832601 00757 etcd.go:129] Found lease (10.1.48.0/24) for current IP (192.168.1.134), reusing
Oct 15 08:18:03 n0 flanneld[757]: I1015 08:18:03.835158 00757 etcd.go:84] Subnet lease acquired: 10.1.48.0/24
Oct 15 08:18:03 n0 flanneld[757]: I1015 08:18:03.840099 00757 hostgw.go:100] Watching for new subnet leases
Oct 15 08:18:03 n0 flanneld[757]: I1015 08:18:03.843034 00757 hostgw.go:140] Subnet added: 10.1.100.0/24 via 192.168.1.136
Oct 15 08:18:03 n0 flanneld[757]: I1015 08:18:03.843596 00757 hostgw.go:140] Subnet added: 10.1.36.0/24 via 192.168.1.135
Oct 15 08:18:03 n0 flanneld[757]: I1015 08:18:03.843945 00757 hostgw.go:140] Subnet added: 10.1.98.0/24 via 192.168.1.133
Oct 15 08:18:05 n0 dhcpcd[703]: wlan0: no IPv6 Routers available
Oct 15 08:18:13 n0 systemd[1]: Time has been changed
Oct 15 08:18:13 n0 docker[782]: time="2016-10-15T08:18:13.735629301-05:00" level=warning msg="/!\\ DON'T BIND ON ANY IP ADDRESS WITHOUT setting -tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING /!\\"
Oct 15 08:18:13 n0 docker[782]: time="2016-10-15T08:18:13.748621195-05:00" level=info msg="New containerd process, pid: 788\n"
Oct 15 08:18:15 n0 docker[782]: time="2016-10-15T08:18:15.366869110-05:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Oct 15 08:18:15 n0 docker[782]: time="2016-10-15T08:18:15.440492866-05:00" level=info msg="Firewalld running: false"
Oct 15 08:18:15 n0 dhcpcd[703]: docker0: adding address fe80::a33e:619b:b73a:be35
Oct 15 08:18:16 n0 avahi-daemon[424]: Joining mDNS multicast group on interface docker0.IPv4 with address 10.1.48.1.
Oct 15 08:18:16 n0 avahi-daemon[424]: New relevant interface docker0.IPv4 for mDNS.
Oct 15 08:18:16 n0 dhcpcd[703]: docker0: waiting for carrier
Oct 15 08:18:16 n0 avahi-daemon[424]: Registering new address record for 10.1.48.1 on docker0.IPv4.
Oct 15 08:18:16 n0 docker[782]: time="2016-10-15T08:18:16.057723609-05:00" level=warning msg="Your kernel does not support swap memory limit."
Oct 15 08:18:16 n0 docker[782]: time="2016-10-15T08:18:16.057956732-05:00" level=warning msg="Your kernel does not support kernel memory limit."
Oct 15 08:18:16 n0 docker[782]: time="2016-10-15T08:18:16.058069856-05:00" level=warning msg="Your kernel does not support cgroup cfs period"
Oct 15 08:18:16 n0 docker[782]: time="2016-10-15T08:18:16.058153084-05:00" level=warning msg="Your kernel does not support cgroup cfs quotas"
Oct 15 08:18:16 n0 docker[782]: time="2016-10-15T08:18:16.058352092-05:00" level=warning msg="Unable to find cpuset cgroup in mounts"
Oct 15 08:18:16 n0 docker[782]: time="2016-10-15T08:18:16.058840473-05:00" level=warning msg="mountpoint for pids not found"
Oct 15 08:18:16 n0 docker[782]: time="2016-10-15T08:18:16.059805983-05:00" level=info msg="Loading containers: start."
Oct 15 08:18:16 n0 docker[782]: ........
Oct 15 08:18:16 n0 docker[782]: time="2016-10-15T08:18:16.096222944-05:00" level=info msg="Loading containers: done."
Oct 15 08:18:16 n0 docker[782]: time="2016-10-15T08:18:16.096346953-05:00" level=info msg="Daemon has completed initialization"
Oct 15 08:18:16 n0 docker[782]: time="2016-10-15T08:18:16.096433775-05:00" level=info msg="Docker daemon" commit=5604cbe graphdriver=overlay version=1.11.1
Oct 15 08:18:16 n0 docker[782]: time="2016-10-15T08:18:16.139907953-05:00" level=info msg="API listen on 192.168.1.134:2375"
Oct 15 08:18:16 n0 systemd[1]: Started Docker Application Container Engine.
Oct 15 08:18:16 n0 systemd[1]: Starting Kubernetes Kubelet...
Oct 15 08:18:16 n0 docker[782]: time="2016-10-15T08:18:16.140098263-05:00" level=info msg="API listen on /var/run/docker.sock"
Oct 15 08:18:16 n0 systemd[1]: Started Kubernetes Kubelet.
Oct 15 08:18:16 n0 systemd[1]: Starting Multi-User System.
Oct 15 08:18:16 n0 systemd[1]: Reached target Multi-User System.
Oct 15 08:18:16 n0 systemd[1]: Starting Graphical Interface.
Oct 15 08:18:16 n0 systemd[1]: Reached target Graphical Interface.
Oct 15 08:18:16 n0 systemd[1]: Starting Update UTMP about System Runlevel Changes...
Oct 15 08:18:16 n0 systemd[1]: Started Update UTMP about System Runlevel Changes.
Oct 15 08:18:16 n0 systemd[1]: Startup finished in 2.122s (kernel) + 21.192s (userspace) = 23.314s.
Oct 15 08:18:17 n0 ntpd[734]: Listen normally on 6 docker0 10.1.48.1 UDP 123
Oct 15 08:18:17 n0 ntpd[734]: peers refreshed
Oct 15 08:18:25 n0 kubelet[897]: I1015 08:18:25.108649 897 docker.go:327] Start docker client with request timeout=2m0s
Oct 15 08:18:25 n0 kubelet[897]: W1015 08:18:25.141449 897 server.go:487] Could not load kubeconfig file /var/lib/kubelet/kubeconfig: stat /var/lib/kubelet/kubeconfig: no such file or directory. Trying auth path instead.
Oct 15 08:18:25 n0 kubelet[897]: W1015 08:18:25.142582 897 server.go:448] Could not load kubernetes auth path /var/lib/kubelet/kubernetes_auth: stat /var/lib/kubelet/kubernetes_auth: no such file or directory. Continuing with defaults.
Oct 15 08:18:25 n0 kubelet[897]: I1015 08:18:25.148494 897 manager.go:138] cAdvisor running in container: "/"
Oct 15 08:18:25 n0 kubelet[897]: W1015 08:18:25.185589 897 manager.go:146] unable to connect to Rkt api service: rkt: cannot tcp Dial rkt api service: dial tcp [::1]:15441: getsockopt: connection refused
Oct 15 08:18:25 n0 kubelet[897]: I1015 08:18:25.210702 897 fs.go:139] Filesystem partitions: map[/dev/root:{mountpoint:/ major:179 minor:2 fsType:ext4 blockSize:0}]
Oct 15 08:18:25 n0 kubelet[897]: E1015 08:18:25.214824 897 machine.go:193] failed to get cache information for node 0: open /sys/devices/system/cpu/cpu0/cache: no such file or directory
Oct 15 08:18:25 n0 kubelet[897]: I1015 08:18:25.270781 897 manager.go:192] Machine: {NumCores:4 CpuFrequency:0 MemoryCapacity:970485760 MachineID:fe1d4a4d0ac7441e817d011e70da2471 SystemUUID:fe1d4a4d0ac7441e817d011e70da2471 BootID:4e8d347f-1054-4f62-9b71-ef5bc4fc183d Filesystems:[{Device:/dev/root Capacity:31413710848 Type:vfs Inodes:1919232}] DiskMap:map[179:0:{Name:mmcblk0 Major:179 Minor:0 Size:32010928128 Scheduler:deadline}] NetworkDevices:[{Name:eth0 MacAddress:b8:27:eb:53:17:02 Speed:10 Mtu:1500} {Name:wlan0 MacAddress:b8:27:eb:06:42:57 Speed:0 Mtu:1500}] Topology:[{Id:0 Memory:0 Cores:[{Id:0 Threads:[0] Caches:[]} {Id:1 Threads:[1] Caches:[]} {Id:2 Threads:[2] Caches:[]} {Id:3 Threads:[3] Caches:[]}] Caches:[]}] CloudProvider:Unknown InstanceType:Unknown InstanceID:None}
Oct 15 08:18:25 n0 kubelet[897]: I1015 08:18:25.274024 897 manager.go:198] Version: {KernelVersion:4.4.21-v7+ ContainerOsVersion:Raspbian GNU/Linux 8 (jessie) DockerVersion:1.11.1 CadvisorVersion: CadvisorRevision:}
Oct 15 08:18:25 n0 kubelet[897]: I1015 08:18:25.277915 897 server.go:382] Using root directory: /var/lib/kubelet
Oct 15 08:18:25 n0 kubelet[897]: I1015 08:18:25.282786 897 server.go:758] Adding manifest file: /etc/kubernetes/manifests
Oct 15 08:18:25 n0 kubelet[897]: I1015 08:18:25.284028 897 file.go:47] Watching path "/etc/kubernetes/manifests"
Oct 15 08:18:25 n0 kubelet[897]: I1015 08:18:25.284118 897 server.go:768] Watching apiserver
Oct 15 08:18:25 n0 kubelet[897]: W1015 08:18:25.288442 897 kubelet.go:561] Hairpin mode set to "promiscuous-bridge" but configureCBR0 is false, falling back to "hairpin-veth"
Oct 15 08:18:25 n0 kubelet[897]: I1015 08:18:25.288558 897 kubelet.go:384] Hairpin mode set to "hairpin-veth"
Oct 15 08:18:25 n0 kubelet[897]: E1015 08:18:25.301360 897 reflector.go:205] pkg/kubelet/kubelet.go:281: Failed to list *api.Node: Get http://master:8080/api/v1/nodes?fieldSelector=metadata.name%3Dn0&resourceVersion=0: dial tcp 192.168.1.134:8080: getsockopt: connection refused
Oct 15 08:18:25 n0 kubelet[897]: E1015 08:18:25.313455 897 reflector.go:205] pkg/kubelet/kubelet.go:262: Failed to list *api.Service: Get http://master:8080/api/v1/services?resourceVersion=0: dial tcp 192.168.1.134:8080: getsockopt: connection refused
Oct 15 08:18:25 n0 rsyslogd-2007: action 'action 17' suspended, next retry is Sat Oct 15 08:18:55 2016 [try http://www.rsyslog.com/e/2007 ]
Oct 15 08:18:25 n0 kubelet[897]: E1015 08:18:25.313857 897 reflector.go:205] pkg/kubelet/config/apiserver.go:43: Failed to list *api.Pod: Get http://master:8080/api/v1/pods?fieldSelector=spec.nodeName%3Dn0&resourceVersion=0: dial tcp 192.168.1.134:8080: getsockopt: connection refused
...hundreds of additional "connection refused" messages....
Oct 15 08:22:01 n0 kubelet[897]: E1015 08:22:01.867519 897 event.go:142] Unable to write event '&api.Event{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"n0.147db709c7639078", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*unversioned.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]api.OwnerReference(nil), Finalizers:[]string(nil)}, InvolvedObject:api.ObjectReference{Kind:"Node", Namespace:"", Name:"n0", UID:"n0", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"KubeletSetupFailed", Message:"Failed to start ContainerManager system validation failed - Following Cgroup subsystem not mounted: [cpuset]", Source:api.EventSource{Component:"kubelet", Host:"n0"}, FirstTimestamp:unversioned.Time{Time:time.Time{sec:63612134305, nsec:475170424, loc:(*time.Location)(0x3126cb0)}}, LastTimestamp:unversioned.Time{Time:time.Time{sec:63612134305, nsec:475170424, loc:(*time.Location)(0x3126cb0)}}, Count:1, Type:"Warning"}' (retry limit exceeded!)
```

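The last entry in the log reports `Following Cgroup subsystem not mounted: [cpuset]`, and Docker warns earlier that it cannot find the cpuset cgroup in mounts. A rough sketch for confirming that from the contents of `/proc/mounts`, assuming the cgroup v1 layout where each controller appears as a mount of type `cgroup` with the controller name among its options. The function names are made up for illustration, and the check says nothing about why a subsystem is missing.

```python
def mounted_cgroup_subsystems(mounts_text):
    """Collect the option names of every cgroup (v1) mount in /proc/mounts text."""
    options = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] == "cgroup":
            # Generic options such as "rw" land in the set too; that is
            # harmless because we only test membership of controller names.
            options.update(fields[3].split(","))
    return options


def missing_subsystems(mounts_text, required=("cpuset",)):
    """Return the required controllers that do not appear in any cgroup mount."""
    mounted = mounted_cgroup_subsystems(mounts_text)
    return [name for name in required if name not in mounted]
```

On a node this could be fed with `open("/proc/mounts").read()`; a non-empty result for `cpuset` would match the kubelet message above.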
rhuss commented 7 years ago

Yes, the connection refused messages are annoying (so I typically check the logs with | grep -v refused)

Could you please also run sudo journalctl -au kubelet | grep -v refused on the master (n0)?

Also interesting: "docker ps". You should see some containers from the hyperkube image. Could the image be downloaded? Check with "docker images | grep hyperkube". (Run all of these commands on the Pis.)
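For anyone following along, the log check above can be wrapped in a small helper. This is only a sketch: filter_kubelet_log is a made-up name, and the pattern assumes the kubelet line format seen in the log excerpt earlier in this thread (kubelet[PID]: E1015 ...).

```shell
# Hypothetical helper: read kubelet log lines on stdin, drop the noisy
# "connection refused" entries, and keep only error (E) and warning (W) lines.
# The pattern assumes the line format shown in this thread's log excerpt.
filter_kubelet_log() {
  grep -v 'connection refused' | grep -E 'kubelet\[[0-9]+\]: [EW][0-9]{4} '
}

# On a node, run it like this:
#   sudo journalctl -au kubelet | filter_kubelet_log
```

Whatever is left after filtering is usually the interesting part, for example a setup failure that is otherwise buried under the refused connections.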

Some background info:

The kubelet is a process which works a little bit like a small systemd. It starts and manages the containers which are defined in /etc/kubernetes/manifests. If you look into this directory, you see that several containers are started, among them the apiserver (this is the one listening on 8080).

kubernetes.yaml

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-master
  namespace: kube-system
spec:
  hostNetwork: true
  volumes:
    - name: "certs-volume"
      hostPath:
        path: "/etc/kubernetes/certs"
  containers:
    - name: "kube-apiserver"
      image: "gcr.io/google_containers/hyperkube-arm:v1.3.2"
      args:
        - "/hyperkube"
        - "apiserver"
        - "--allow-privileged=true"
        - "--etcd-servers=http://master:2379"
        - "--insecure-bind-address=0.0.0.0"
        - "--service-cluster-ip-range=10.200.100.0/24"
        - "--service-node-port-range=30000-37000"
        - "--v=2"
    - name: "kube-controller-manager"
      image: "gcr.io/google_containers/hyperkube-arm:v1.3.2"
      volumeMounts:
        - name: "certs-volume"
          mountPath: "/etc/kubernetes/certs"
      args:
        - "/hyperkube"
        - "controller-manager"
        - "--master=http://127.0.0.1:8080"
        - "--root-ca-file=/etc/kubernetes/certs/ca.crt"
        - "--service-account-private-key-file=/etc/kubernetes/certs/server.key"
        - "--pod-eviction-timeout=5s"
        - "--node-monitor-grace-period=10s"
        - "--v=2"
    - name: "kube-scheduler"
      image: "gcr.io/google_containers/hyperkube-arm:v1.3.2"
      args:
        - "/hyperkube"
        - "scheduler"
        - "--master=http://127.0.0.1:8080"
        - "--v=2"
    - name: "kube-proxy"
      image: "gcr.io/google_containers/hyperkube-arm:v1.3.2"
      args:
        - "/hyperkube"
        - "proxy"
        - "--master=http://127.0.0.1:8080"
        - "--v=2"
      securityContext:
        privileged: true
```

So we must ensure that the hyperkube containers are running on the master.
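A rough way to compare the manifest against what Docker is actually running could look like the sketch below. Both function names are made up for illustration, the full path /etc/kubernetes/manifests/kubernetes.yaml is an assumption, and the match relies on the kubelet embedding the manifest container name in the Docker container name, so treat the result as a hint rather than proof.

```shell
# Hypothetical helper: print the kube-* container names declared in a kubelet
# manifest. Assumes names are quoted as in the manifest above; the
# "certs-volume" entries are skipped because they lack the kube- prefix.
manifest_containers() {
  sed -n 's/^ *- name: *"\(kube-[^"]*\)".*$/\1/p' "$1"
}

# Hypothetical helper: given `docker ps` output on stdin, print every manifest
# container whose name does not appear in it. Assumes the kubelet embeds the
# manifest container name in the Docker container name.
missing_containers() {
  running=$(cat)
  for name in $(manifest_containers "$1"); do
    case "$running" in
      *"$name"*) ;;            # found among the running containers
      *) echo "$name" ;;       # declared in the manifest but not running
    esac
  done
}

# On the master (the manifest path is an assumption):
#   docker ps | missing_containers /etc/kubernetes/manifests/kubernetes.yaml
```

If this prints kube-apiserver, nothing is listening on port 8080 and the "connection refused" messages are expected.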

rhuss commented 7 years ago

Another thing: Since I'm currently actively working on the hypriot branch, which will be the default quite soon, would it be possible to switch to Hypriot as a base image?

For this, check out the "hypriot" branch, check the README (the initial setup is much easier) and possibly reflash your SD cards.

rhuss commented 7 years ago

BTW, I'm on the road next week so it might take a bit to answer.

calphool commented 7 years ago

It looks like it might have been caused because I didn't have this:

cgroup_enable=cpuset

in /boot/cmdline.txt

The sudo journalctl -au kubelet | grep -v refused showed that it was repeatedly failing because the cgroup "cpuset" didn't exist. Now I see it downloading images, and docker info shows stuff appearing... let's see what happens here in a few minutes... ;-)

rhuss commented 7 years ago

Yes, that's mandatory. However, the setup playbook should take care of that. Of course, you need a restart after this (as said in the README).
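To confirm after the reboot that the kernel really enabled the subsystem, /proc/cgroups can be checked. A minimal sketch, with cgroup_enabled being a made-up helper name:

```shell
# Hypothetical helper: succeed if a cgroup subsystem is listed as enabled.
# $1: subsystem name, e.g. cpuset
# $2: cgroup table to read (defaults to /proc/cgroups, whose columns are
#     subsys_name, hierarchy, num_cgroups, enabled)
cgroup_enabled() {
  awk -v name="$1" '$1 == name && $4 == 1 { found = 1 } END { exit !found }' "${2:-/proc/cgroups}"
}

# After the reboot:
#   cgroup_enabled cpuset && echo "cpuset is available" || echo "cpuset is missing"
```

The optional second argument only exists so the check can be tried against a copy of the table; on a node the default is what you want.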

rhuss commented 7 years ago

Good luck ;-) !

calphool commented 7 years ago

Hmmm... maybe something wrong with:

    - name: Add cgroup for Memory limits to bootparams
      lineinfile:
        dest: /boot/cmdline.txt
        regexp: '^(.*?)(\s*cgroup_enable.*swapaccount=\d+)?$'
        line: '\1 cgroup_enable=memory swapaccount=1'
        backrefs: true
        state: present

this is in /roles/base/tasks/system.yml

Is this where you thought it should be setting that cgroup? I don't see "cpuset" in there.

cgroup_enable=cpuset
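Until the playbook handles it, a manual workaround on a node could look roughly like this. It is a sketch, not the playbook's own logic: ensure_cpuset is a made-up name, and it only adds the one parameter that was missing here.

```shell
# Hypothetical helper: append cgroup_enable=cpuset to the kernel command line
# if it is not there yet. $1 defaults to /boot/cmdline.txt.
ensure_cpuset() {
  file="${1:-/boot/cmdline.txt}"
  if grep -q 'cgroup_enable=cpuset' "$file"; then
    return 0                      # already present, nothing to do
  fi
  # cmdline.txt has to stay a single line, so extend line 1 instead of
  # appending a new line. Write to a temp file first, then swap it in.
  sed '1 s/$/ cgroup_enable=cpuset/' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# On a node:
#   ensure_cpuset            # run as root, then reboot
```

Running it twice is harmless, since the second call finds the parameter and leaves the file alone.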

calphool commented 7 years ago

I could probably fix that and send a pull request if you like.

rhuss commented 7 years ago

Sorry, as said, I'm currently on the hypriot branch, and it might be that some fixes didn't make it into master.

Here's the relevant code on hypriot --> https://github.com/Project31/ansible-kubernetes-openshift-pi3/blob/hypriot/roles/base/tasks/system.yml#L1-L7

rhuss commented 7 years ago

I always love PRs ;-). The plan is to move the current master later on to a raspbian branch and merge the hypriot branch to master. The main reason for choosing hypriot is that it allows a headless setup.

calphool commented 7 years ago

Okay... so now that I'm running... what do I do with this crazy contraption?

:stuck_out_tongue_winking_eye::stuck_out_tongue_winking_eye::stuck_out_tongue_winking_eye::stuck_out_tongue_winking_eye:

(Oh, FYI, if anybody else runs into this little gotcha, you do need to update /boot/cmdline.txt on ALL the nodes, not just the master, and then reboot the cluster).

rhuss commented 7 years ago

Awesome !!! Have fun ...