Closed: GuillaumeDorschner closed this issue 5 months ago
Hi Guillaume, the script is designed for a single master, since HA needs a load balancer or control of internal DNS. It gets MUCH more complicated with HA. Here is a video I did on it: https://youtu.be/Um_GVIL71xQ
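For reference, if you do go the HA route outside of this script, additional server nodes join the first one through /etc/rancher/rke2/config.yaml, roughly like this (the IPs and names below are placeholders, and none of this is generated by the script):
# on the first server, grab the join token
cat /var/lib/rancher/rke2/server/node-token
# on each additional server, point it at the first server (or at your load balancer / internal DNS name)
mkdir -p /etc/rancher/rke2
cat << EOF > /etc/rancher/rke2/config.yaml
server: https://192.168.x.x:9345
token: <node-token from the first server>
tls-san:
  - <load balancer or internal DNS name>
EOF
systemctl enable --now rke2-server.service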
Personally, I recommend disabling firewalld.
When you run the worker piece it only runs a few things. https://github.com/clemenko/rke_airgap_install/blob/main/hauler_all_the_things.sh#L410
Does the service start? What does kubectl get node show on the master?
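Concretely, something like this (assuming the default unit names):
# on the worker
systemctl status rke2-agent.service
journalctl -u rke2-agent --no-pager | tail -n 50
# on the master
kubectl get nodes -o wide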
Thanks for the quick answer!
The master does start; I connected using the kubeconfig:
➜ ansible git:(main) ✗ kubectl get nodes
NAME STATUS ROLES AGE VERSION
nuc1 Ready control-plane,etcd,master 176m v1.28.9+rke2r1
I'm looking at the logs; could you help me? I don't know where or what to look for.
May 17 14:11:02 server3 sh[1997866]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
May 17 14:11:02 server3 sh[1997867]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
May 17 14:11:02 server3 rke2[1997873]: time="2024-05-17T14:11:02+02:00" level=warning msg="not running in CIS mode"
May 17 14:11:02 server3 rke2[1997873]: time="2024-05-17T14:11:02+02:00" level=info msg="Applying Pod Security Admission Configuration"
May 17 14:11:02 server3 rke2[1997873]: time="2024-05-17T14:11:02+02:00" level=info msg="Starting rke2 v1.28.9+rke2r1 (07bf87f9118c1386fa73f660142cc28b5bef1886)"
May 17 14:11:02 server3 rke2[1997873]: time="2024-05-17T14:11:02+02:00" level=warning msg="Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash. Use t>
May 17 14:11:02 server3 rke2[1997873]: time="2024-05-17T14:11:02+02:00" level=info msg="Managed etcd cluster not yet initialized"
May 17 14:11:02 server3 rke2[1997873]: time="2024-05-17T14:11:02+02:00" level=warning msg="Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash. Use t>
May 17 14:11:02 server3 rke2[1997873]: time="2024-05-17T14:11:02+02:00" level=fatal msg="starting kubernetes: preparing server: failed to validate server configuration: not authorized"
May 17 14:11:02 server3 systemd[1]: rke2-server.service: Main process exited, code=exited, status=1/FAILURE
May 17 14:11:02 server3 systemd[1]: rke2-server.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- The unit rke2-server.service has entered the 'failed' state with result 'exit-code'.
May 17 14:11:02 server3 systemd[1]: Failed to start Rancher Kubernetes Engine v2 (server).
-- Subject: Unit rke2-server.service has failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- Unit rke2-server.service has failed.
--
-- The result is failed.
EDIT: I retried after a while and now I get this:
[root@server3 ~]# curl -sfL http://192.168.x.x:8080/./hauler_all_the_things.sh | bash -s -- worker 192.168.x.x
- deploy worker
[info] updating kernel settings
[info] firewalld not installed
[info] installing base packages
[error] iptables container-selinux iptables libnetfilter_conntrack libnfnetlink libnftnl policycoreutils-python-utils cryptsetup iscsi-initiator-utils packages didn't install
[root@server3 ~]# yum install iptables container-selinux iptables libnetfilter_conntrack libnfnetlink libnftnl policycoreutils-python-utils cryptsetup iscsi-initiator-utils -y
Rancher RKE2 Common (stable) 0.0 B/s | 0 B 00:00
Errors during downloading metadata for repository 'rancher-rke2-common-stable':
- Curl error (6): Couldn't resolve host name for https://rpm.rancher.io/rke2/stable/common/centos/8/noarch/repodata/repomd.xml [Could not resolve host: rpm.rancher.io]
Error: Failed to download metadata for repo 'rancher-rke2-common-stable': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
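(Side note: that last yum error is expected on an air-gapped box; rpm.rancher.io will never resolve there, so the packages have to come from the repo the script adds against the local fileserver. A retry with the remote Rancher repos disabled would look roughly like this:)
# see which repos the node actually has
yum repolist
# retry using only reachable repos; the glob disables the remote rancher-rke2-* repos that fail above
yum --disablerepo='rancher-*' install -y iptables container-selinux libnetfilter_conntrack libnfnetlink libnftnl policycoreutils-python-utils cryptsetup iscsi-initiator-utils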
Is server3 a worker node? Looks like you are running rke2-server on it?
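A quick way to check what is actually on that node (assuming a systemd host):
# any leftover rke2-server install will conflict with a fresh agent join
systemctl list-unit-files 'rke2*' 'rancher*'
# and show the join config the node is currently using
cat /etc/rancher/rke2/config.yaml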
Yes, I ran curl -sfL http://192.168.x.x:8080/./hauler_all_the_things.sh | bash -s -- worker 192.168.x.x
Maybe I didn't fully uninstall the server that was there before.
You can run rke2-uninstall.sh on that node and re-run the curl command.
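I.e. something along these lines:
/usr/bin/rke2-uninstall.sh
curl -sfL http://192.168.x.x:8080/hauler_all_the_things.sh | bash -s -- worker 192.168.x.x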
It turns out the problem was due to not properly uninstalling the RKE/Rancher software I tried to use earlier. Now it's working; it's great.
If needed, here is the full cleanup:
/usr/bin/rke2-killall.sh
/usr/bin/rke2-uninstall.sh
/usr/bin/rancher-system-agent-uninstall.sh
/usr/local/bin/k3s-uninstall.sh
/usr/local/bin/k3s-agent-uninstall.sh
/usr/local/bin/rancher-system-agent-uninstall.sh
yum remove rancher-system-agent
rm -rf /etc/rancher
rm -rf /var/lib/rancher
reboot
awesome!
Hello Clemenko,
Before anything else, I want to tell you that this repo is amazing; I was looking for something like this for a long time. I didn't know about Hauler. I need to set up a cluster offline (so air-gapped). I want to know if it's possible to have multiple masters. We can do
curl -sfL http://192.168.x.x:8080/hauler_all_the_things.sh | bash -s -- worker 192.168.x.x
for a worker, but I need 3 masters, so how do we do that? I also tried adding a worker but got an error; I think it's due to the config of an RKE2 install that I forgot to remove. I removed it afterwards, but I think the script didn't actually rerun, because I can still see the message from the first run (or maybe it did and I just can't tell). Also, I'm getting stuck right after the [info] adding yum repo message.
So how can I rerun the script to add a worker, and how can I add multiple masters?
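(If the script hangs right after [info] adding yum repo, it is worth first confirming that the hauler fileserver is reachable from that node, e.g.:)
# can the node reach the fileserver the script pulls everything from?
curl -I http://192.168.x.x:8080/
# and does yum see the newly added repo?
yum repolist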