Open cmonty14 opened 2 years ago
Hi, originally I was trying to repeat GKE on-prem architecture:
First you need to bring up your master Kubernetes cluster, or Admin cluster, which consists of three control-plane nodes. They are control-plane nodes for this Admin cluster, as they run control-plane services like etcd, kube-apiserver, the scheduler and the controller-manager.
They also run the containerized control planes for the user-defined clusters (child clusters). I'm not sure about the terminology, but if you find it confusing, please send me a pull request to fix the documentation.
For a PXE bootable server I would need
- PXE server
- DHCP service
- NFS service

Are these services deployed on the HA control-plane nodes?
Yes, they are.
And what storage type is used? Is it local storage, meaning each control-plane node offers an NFS service?
Etcd is the only storage consumer. Etcd provides HA at the application layer, so you don't need highly available storage for it. I suggest using local-path-provisioner as the simplest solution.
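The point about etcd providing HA at the application layer can be made concrete: an etcd cluster stays writable as long as a majority (quorum) of its members is up, so a three-node Admin cluster survives the loss of one node even with purely local storage. A minimal sketch of the arithmetic (function names are mine, for illustration only):

```python
def etcd_quorum(members: int) -> int:
    """Majority of members needed for an etcd cluster to accept writes."""
    return members // 2 + 1

def etcd_fault_tolerance(members: int) -> int:
    """How many members can fail while the cluster stays available."""
    return members - etcd_quorum(members)

# A three-node admin cluster keeps working with one node down:
print(etcd_quorum(3))           # 2
print(etcd_fault_tolerance(3))  # 1
```

This is also why three control-plane nodes is the usual minimum: with two nodes, quorum is still two, so a single failure already stops the cluster.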
Hi, many thanks for your reply.
Things are getting clearer now.
There's just this follow-up question regarding NFS / netboot servers. My understanding is that there must be a netboot server that provides shared storage via NFS. If this is located on a single node, it would be a SPOF and break HA. Therefore I concluded that it is provided by the master Kubernetes cluster (= Admin cluster).
In my homelab this Admin cluster is built on three Raspi4 nodes, each with 4 GB RAM and an SSD connected. How would this work with local-path-provisioner? Would the shared storage be provided by every single SSD, meaning two SSDs hold replicated data?
NFS is not used, as the whole rootfs image is loaded directly into RAM. This image is served by the LTSP server, which is separate for each user cluster.
> In my homelab this Admin cluster is built on three Raspi4 nodes, each with 4 GB RAM and an SSD connected. How would this work with local-path-provisioner?
I haven't checked that yet, but I guess you'd need to rebuild everything for ARM.
> NFS is not used, as the whole rootfs image is loaded directly into RAM. This image is served by the LTSP server, which is separate for each user cluster.
Understood... no NFS server, but LTSP. Actually, LTSP includes several services, e.g. DNS, TFTP, NFS, etc. And LTSP is deployed on the Admin cluster, too.
But LTSP requires storage for the images that the clients boot. Is this storage provided by every single control-plane node, then? If yes, I would have n-1 replicas of the images for a cluster with n nodes. If no, this image storage would be a single point of failure.
The rootfs image is built using a Dockerfile, so the rootfs image for booting is part of the LTSP-server image. Of course, you can run it in multiple replicas.
And what is your (original) software design for storing this rootfs image? A dedicated LTSP server, or control-plane node storage?
@kvaps, have you got a Docker Compose file of the services needed to get a node up? All that would be needed is the join command (for an existing cluster, for example), and the booted nodes would join it? I tried decomposing the whole thing, but HTTP booting with GRUB is not working for me.
It was intended that you use standard tools like kubeadm and Kubespray to bootstrap the Kubernetes cluster. All the needed components can be installed in HA mode inside it.
Would you have any idea why I get the menu, but then, when it wants to load vmlinuz from nginx (inside Docker), it fails? The nginx logs only show that the file was partially downloaded, while curl works (to me it seems like it times out), e.g.:
```
192.168.42.133 - - [29/Apr/2022:13:07:10 +0000] "GET /ltsp/x86_64/vmlinuz HTTP/1.1" 200 13668608 "-" "curl/7.68.0"
192.168.42.151 - - [29/Apr/2022:13:10:27 +0000] "GET /ltsp/x86_64/vmlinuz HTTP/1.1" 200 155170 "-" "GRUB 2.04-1ubuntu44.2"
192.168.42.151 - - [29/Apr/2022:13:12:39 +0000] "GET /ltsp/x86_64/vmlinuz HTTP/1.1" 200 128906 "-" "GRUB 2.04-1ubuntu44.2"
192.168.42.151 - - [29/Apr/2022:13:18:18 +0000] "GET /ltsp/x86_64/vmlinuz HTTP/1.1" 200 117650 "-" "GRUB 2.04-1ubuntu44.2"
192.168.42.151 - - [29/Apr/2022:13:22:12 +0000] "GET /ltsp/x86_64/vmlinuz HTTP/1.1" 200 53866 "-" "GRUB 2.04-1ubuntu44.2"
```
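The byte counts in those access-log entries already show the problem: the curl request transferred the full ~13 MB kernel, while every GRUB request stopped well short of that. A small illustrative parser (the field positions assume nginx's standard combined log format; the function name is mine):

```python
import re

# Matches status code, bytes sent, and user agent at the end of a
# combined-format nginx access-log line.
LOG_RE = re.compile(r'" (\d{3}) (\d+) "[^"]*" "([^"]*)"$')

def bytes_by_agent(lines):
    """Return (user_agent, bytes_sent) for each parseable log line."""
    result = []
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            status, sent, agent = m.groups()
            result.append((agent, int(sent)))
    return result

logs = [
    '192.168.42.133 - - [29/Apr/2022:13:07:10 +0000] "GET /ltsp/x86_64/vmlinuz HTTP/1.1" 200 13668608 "-" "curl/7.68.0"',
    '192.168.42.151 - - [29/Apr/2022:13:10:27 +0000] "GET /ltsp/x86_64/vmlinuz HTTP/1.1" 200 155170 "-" "GRUB 2.04-1ubuntu44.2"',
]
full = dict(bytes_by_agent(logs))['curl/7.68.0']
for agent, sent in bytes_by_agent(logs):
    if agent.startswith('GRUB'):
        print(f"{agent}: {sent} of {full} bytes ({sent / full:.1%})")
```

Note that nginx logs status 200 even for an aborted transfer, because the status was already sent before the connection dropped; only the byte count reveals the truncation.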
Unfortunately, I have no idea. Do you use your own DHCP server?
So what I did is run the container twice: once for dnsmasq-tftp + dnsmasq-dhcp (with data from dhcp-controller) + the images, and once for serving via nginx.
I took the configuration files from the Kubernetes deployment (/etc/ltsp/ and /etc/dnsmasq.d).
The image was built from https://github.com/kvaps/kubefarm/blob/master/build/ltsp/Dockerfile.
dnsmasq-dhcp returns the IP:

```
dnsmasq-tftp: TFTP root is /srv/tftp single port mode
dnsmasq-dhcp: read /etc/dnsmasq.d/dhcp-hosts/kubefarm-cluster1-cluster1-ltsp-clients
dnsmasq-dhcp: read /etc/dnsmasq.d/dhcp-opts/kubefarm-cluster1-cluster1-ltsp-ip
dnsmasq-dhcp: read /etc/dnsmasq.d/dhcp-opts/kubefarm-cluster1-cluster1-ltsp-options
dnsmasq-dhcp: read /etc/dnsmasq.d/dhcp-opts/kubefarm-cluster1-cluster1-ltsp-tags
dnsmasq-dhcp: DHCPDISCOVER(ens33) 00:0c:29:73:12:15
dnsmasq-dhcp: DHCPOFFER(ens33) 192.168.42.151 00:0c:29:73:12:15
dnsmasq-dhcp: DHCPREQUEST(ens33) 192.168.42.151 00:0c:29:73:12:15
dnsmasq-dhcp: DHCPACK(ens33) 192.168.42.151 00:0c:29:73:12:15 moj2
```
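The dnsmasq output above is a complete DHCP handshake, i.e. the classic DISCOVER → OFFER → REQUEST → ACK sequence, so address assignment itself is working. A small sketch that checks a log for exactly this ordering (the function name and log fixture are mine, for illustration):

```python
import re

DORA = ["DHCPDISCOVER", "DHCPOFFER", "DHCPREQUEST", "DHCPACK"]

def handshake_complete(log_lines, mac):
    """True if the log shows a full DORA sequence for the given MAC."""
    seen = [re.search(r"(DHCP[A-Z]+)", line).group(1)
            for line in log_lines if mac in line]
    return seen == DORA

log = [
    "dnsmasq-dhcp: DHCPDISCOVER(ens33) 00:0c:29:73:12:15",
    "dnsmasq-dhcp: DHCPOFFER(ens33) 192.168.42.151 00:0c:29:73:12:15",
    "dnsmasq-dhcp: DHCPREQUEST(ens33) 192.168.42.151 00:0c:29:73:12:15",
    "dnsmasq-dhcp: DHCPACK(ens33) 192.168.42.151 00:0c:29:73:12:15 moj2",
]
print(handshake_complete(log, "00:0c:29:73:12:15"))  # True
```

Since the client reaches DHCPACK and then proceeds to TFTP, the fault in this thread has to lie further along the boot chain, not in DHCP.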
then dnsmasq-tftp serves the files:

```
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/core.efi to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/core.efi to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/normal.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/extcmd.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/verifiers.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/crypto.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/gettext.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/terminal.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/gzio.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/gcry_crc.mod to 192.168.42.151
dnsmasq-tftp: file /srv/tftp/ltsp/grub/grub.cfg-01-00-0c-29-73-12-15 not found for 192.168.42.151
dnsmasq-tftp: file /srv/tftp/ltsp/grub/grub.cfg-C0A82A97 not found for 192.168.42.151
dnsmasq-tftp: file /srv/tftp/ltsp/grub/grub.cfg-C0A82A9 not found for 192.168.42.151
dnsmasq-tftp: file /srv/tftp/ltsp/grub/grub.cfg-C0A82A not found for 192.168.42.151
dnsmasq-tftp: file /srv/tftp/ltsp/grub/grub.cfg-C0A82 not found for 192.168.42.151
dnsmasq-tftp: file /srv/tftp/ltsp/grub/grub.cfg-C0A8 not found for 192.168.42.151
dnsmasq-tftp: file /srv/tftp/ltsp/grub/grub.cfg-C0A not found for 192.168.42.151
dnsmasq-tftp: file /srv/tftp/ltsp/grub/grub.cfg-C0 not found for 192.168.42.151
dnsmasq-tftp: file /srv/tftp/ltsp/grub/grub.cfg-C not found for 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/command.lst to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/fs.lst to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/crypto.lst to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/terminal.lst to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/grub.cfg to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/test.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/efi_gop.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/video_fb.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/video.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/efi_uga.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/cpuid.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/regexp.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/echo.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/linux.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/relocator.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/mmap.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/linuxefi.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/http.mod to 192.168.42.151
```
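The run of "not found" lines in that log is normal, not an error: when GRUB boots over the network it first looks for a config file named after the client's MAC address (prefixed with the ARP hardware type `01`), then for the client IP in uppercase hex (C0A82A97 is 192.168.42.151) shortened one digit at a time, before falling back to plain `grub.cfg`, which is the file that is actually served here. A sketch that reproduces this search list (function name is mine):

```python
def grub_config_candidates(mac: str, ip: str):
    """Config names GRUB tries over the network, most specific first."""
    # MAC-based name, with the ARP hardware type 01 prefixed.
    names = ["grub.cfg-01-" + mac.replace(":", "-")]
    # Client IP as uppercase hex, truncated one digit at a time.
    hex_ip = "%02X%02X%02X%02X" % tuple(int(o) for o in ip.split("."))
    names += ["grub.cfg-" + hex_ip[:n] for n in range(len(hex_ip), 0, -1)]
    # Final fallback: the plain config file.
    names.append("grub.cfg")
    return names

print(grub_config_candidates("00:0c:29:73:12:15", "192.168.42.151")[:2])
# ['grub.cfg-01-00-0c-29-73-12-15', 'grub.cfg-C0A82A97']
```

The generated list matches the TFTP log above line for line, which confirms the TFTP stage completed cleanly; the failure happens later, at the HTTP kernel download.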
The menu opens, and then the request comes to nginx, where only a partial vmlinuz is downloaded; but if I curl the URL (which is also printed in grub.cfg), I get the whole file:
```
192.168.42.133 - - [29/Apr/2022:13:07:10 +0000] "GET /ltsp/x86_64/vmlinuz HTTP/1.1" 200 13668608 "-" "curl/7.68.0"
192.168.42.151 - - [29/Apr/2022:13:10:27 +0000] "GET /ltsp/x86_64/vmlinuz HTTP/1.1" 200 155170 "-" "GRUB 2.04-1ubuntu44.2"
```
If I just use /etc/ltsp and create a default LTSP setup with NFS and dnsmasq, the node boots up and joins the cluster. But the root is mounted via NFS...
Ah, got it. Did you run these commands to regenerate the LTSP initrd image and GRUB config?
Yes, I did. I have now tried it on a "real" server, not on a VM on my computer, and it works. So I guess there was something in my VMware setup causing this all along. I would really like to thank you for everything you've done in this project; it's a great idea for how to provision nodes/clusters.
Hi, I want to rebuild your solution in my homelab. However, I don't fully understand the architecture.
First, I would ask you to clarify the different terms:
For a PXE-bootable server I would need
And what storage type is used? Is it local storage, meaning each control-plane node offers an NFS service? Or is it cluster storage, meaning all storage attached to the control-plane nodes serves this cluster storage?
Regards, Thomas