garutilorenzo / k3s-aws-terraform-cluster

Deploy an high available K3s cluster on Amazon AWS
GNU General Public License v3.0

The connection to the server localhost:8080 was refused on Master Node 2 & 3 #4

Closed muralidigi closed 2 years ago

muralidigi commented 2 years ago

Hi Lorenzo Garuti,

Thank you for sharing the source code for building K3s on AWS with Terraform. I followed the code exactly and built 3 master and 3 worker nodes, but I can only see master node 1; k3s failed on master nodes 2 and 3.

Any pointers or help would be much appreciated.

Master Node 1:

root@i-04eb4456e0a425e92:~# kubectl get nodes
NAME                  STATUS   ROLES                       AGE   VERSION
i-04eb4456e0a425e92   Ready    control-plane,etcd,master   66m   v1.24.3+k3s1
root@i-04eb4456e0a425e92:~# ls -al /etc/rancher/k3s/k3s.yaml
-rw------- 1 root root 2969 Aug 8 18:51 /etc/rancher/k3s/k3s.yaml

Master Node 2 & 3:

root@i-04b860d791d7a9ebd:~# kubectl get nodes
The connection to the server localhost:8080 was refused - did you specify the right host or port?

cat /var/log/cloud-init-output.log

Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO] systemd: Starting k3s
Job for k3s.service failed because the control process exited with error code.
See "systemctl status k3s.service" and "journalctl -xeu k3s.service" for details.
k3s did not install correctly

journalctl -xeu k3s:

Aug 08 21:08:09 i-04b860d791d7a9ebd sh[40898]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.se>
Aug 08 21:08:09 i-04b860d791d7a9ebd sh[40899]: Failed to get unit file state for nm-cloud-setup.service:>
Aug 08 21:08:09 i-04b860d791d7a9ebd k3s[40902]: time="2022-08-08T21:08:09Z" level=info msg="Starting k3s>

garutilorenzo commented 2 years ago

Hi @muralidigi,

The issue is now resolved; I have created a new release with the fix. I was able to reproduce your error: after about 25 minutes of debugging I saw the cluster fully online. I think the problem was an old Ubuntu AMI: apt-get upgrade takes a long time on it, and this delayed the cluster init. I have also changed the hostname settings; the hostname is now defined directly in the launch template configuration instead of in the bash script. With these changes the cluster will be up and running in a few minutes. Please use an up-to-date AMI for your region.
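
Since a slow cluster init can look like a failed node while the bootstrap is still running, it can help to poll instead of checking once. The snippet below is only a minimal sketch, assuming a POSIX shell on the node; `wait_for` is a hypothetical helper written for this illustration and is not part of the repository or of k3s.

```shell
# Hypothetical helper (not part of this repository): retry a command until it
# succeeds or until a timeout, in seconds, has elapsed.
# Usage: wait_for <timeout_seconds> <interval_seconds> <command> [args...]
wait_for() {
  wf_timeout=$1
  wf_interval=$2
  shift 2
  # Absolute deadline, in seconds since the epoch.
  wf_deadline=$(( $(date +%s) + wf_timeout ))
  # Re-run the command, discarding its output, until it exits with status 0.
  until "$@" >/dev/null 2>&1; do
    if [ "$(date +%s)" -ge "$wf_deadline" ]; then
      return 1  # Timed out; the command never succeeded.
    fi
    sleep "$wf_interval"
  done
  return 0
}
```

On a server node this could be called as `wait_for 1800 15 kubectl get nodes`, which retries every 15 seconds for up to 30 minutes; both values are arbitrary assumptions. If the call times out, the cloud-init and journalctl logs shown earlier in this thread are the next place to look.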