ebesson / ansible-role-k3s

Ansible Role - k3s Lightweight Kubernetes - https://galaxy.ansible.com/ebesson/k3s
Apache License 2.0

Cluster with only itself. #17

Open hvdkooij opened 3 years ago

hvdkooij commented 3 years ago

I used the playbook with this inventory:

[k3s-master]
node1

[k3s-agent]
node2
node3

And I see node1 become the master, but node2 and node3 don't seem to be doing anything.

Any suggestions on how to troubleshoot?

hvdkooij commented 3 years ago

Can I adjust the playbook to uninstall nodes?

ebesson commented 3 years ago

By default a node is a master; this is configured with the parameter k3s_type: master. To define node2 and node3 as agents, you must set k3s_type: agent on them. Here's a sample playbook:

---
- hosts: all
  vars:
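    # NOTE: 'master' is the inventory hostname of the master node;
    # with the inventory above it would be 'node1'.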
    k3s_master_node_address: "{{ hostvars['master'].ansible_default_ipv4.address }}"
    k3s_cluster_token: "{{ hostvars['master']['k3s_cluster_token'] }}"

- hosts: k3s-master
  become: True
  vars:
    k3s_type: master
  roles:
    - ebesson.k3s

- hosts: k3s-agent
  become: True
  vars:
    k3s_type: agent
  roles:
    - ebesson.k3s

I've published a full example with Vagrant: https://github.com/ebesson/ansible-k3s-playbook/

ebesson commented 3 years ago

At this time you're not able to uninstall a node. I've created issue #19; contributions are welcome ;-)

hvdkooij commented 3 years ago

I have changed the playbook and rerun it, but it seems there is no change to the setup. I did notice new items in the logs:

Dec 12 20:36:01 node3 k3s[26443]: time="2020-12-12T20:36:01.322483105+01:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:44692/cacerts\": read tcp 127.0.0.1:48806->127.0.0.1:44692: read: connection reset by peer"

So there is more to it to get this working. Unfortunately this is a fight where I am neither an Ansible nor a K3s expert.

This is a test on a clean CentOS 7 cluster, and I was hoping it would work without too many headaches ;-)

hvdkooij commented 3 years ago

When looking at the output of k3s check-config on the master, there is one fail:

(RHEL7/CentOS7: User namespaces disabled; add 'user_namespace.enable=1' to boot command line) (fail)

It is listed as an optional feature and should not cause a fail. When run on both node1 (master) and node2 (agent), neither shows anything else wrong in that check.

Digging around, I noticed a test that tries to reach port 6443 on all machines, and only the master could connect to itself with curl -ks https://node1:6443/static/charts/. A quick workaround was to shut down the firewall with systemctl stop firewalld. While that is acceptable in an isolated lab, it is an issue in a more real-life setup.
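
For what it's worth, a quick way to repeat that connectivity check from every node is a small diagnostic playbook; this is only a sketch, assuming node1 is the master host as in the inventory above:

---
- hosts: all
  tasks:
    - name: Verify the k3s API port on the master is reachable from this node
      ansible.builtin.wait_for:
        host: node1
        port: 6443
        timeout: 5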

Getting the proper firewall rules in there might be a good thing to investigate. I guess using https://docs.ansible.com/ansible/latest/collections/ansible/posix/firewalld_module.html would be a good option.
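
Something along these lines, as an untested sketch: the ansible.posix.firewalld module can open 6443/tcp on the master (the API port the agents failed to reach above) and, assuming the default flannel backend, 8472/udp on every node for the VXLAN overlay:

---
- hosts: k3s-master
  become: true
  tasks:
    - name: Open the k3s API server port for the agents
      ansible.posix.firewalld:
        port: 6443/tcp
        permanent: true
        immediate: true
        state: enabled

- hosts: all
  become: true
  tasks:
    - name: Open the flannel VXLAN port for pod-to-pod traffic
      ansible.posix.firewalld:
        port: 8472/udp
        permanent: true
        immediate: true
        state: enabled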